People often get caught up in the more-is-better mentality when it comes to data. Of course, larger quantities are usually better, and time-series and panel data are superior to cross-sectional data. But that’s not what I’m talking about here.
When I refer to data wants versus data needs, I mean the actual variables you’re collecting.
After you’ve established a set of business objectives and gone to the trouble of quantifying those objectives in a set of goals, the next step is to figure out which variables you need in order to assess how you’re doing against those goals. I basically see two approaches at this juncture:
Throwing the kitchen sink at the problem usually masks one of two things:
It’s easy to become greedy and want to collect everything under the sun. This also reflects laziness.
The better approach is to be deliberate. Taking the time to think through your business objectives and goals, and how the metrics you choose relate to them, will help you provide real, useful insights. It will also guard against garbage in, garbage out.
Of course, to be able to do this, you need to have a real understanding of the problem at hand.
After establishing a good understanding of the problem domain, we need to select the appropriate metrics. Keeping in mind that less is (usually) more, it’s worth spending some time figuring out which metrics truly matter.
Is it possible that some of the things we want can be derived from simpler variables? Being surgical about the things we collect doesn’t preclude us from running expansive and nuanced analyses. It also helps us remain friends with the guys actually implementing the process, and it reduces costs.
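To make the point about derived variables concrete, here is a minimal sketch in Python. The scenario is hypothetical: I'm assuming an online shop that collects only three raw daily counts (visitors, orders, revenue). The names `DailyRecord` and `derive_metrics` are made up for illustration and don't come from any particular library.

```python
from dataclasses import dataclass


@dataclass
class DailyRecord:
    """The only raw variables collected (hypothetical example)."""
    visitors: int
    orders: int
    revenue: float


def derive_metrics(rec: DailyRecord) -> dict:
    """Compute the 'wanted' metrics from the three collected variables.

    Ratios fall back to 0.0 when the denominator is zero, which is one
    possible convention; you might prefer None for 'undefined'.
    """
    conversion_rate = rec.orders / rec.visitors if rec.visitors else 0.0
    average_order_value = rec.revenue / rec.orders if rec.orders else 0.0
    revenue_per_visitor = rec.revenue / rec.visitors if rec.visitors else 0.0
    return {
        "conversion_rate": conversion_rate,
        "average_order_value": average_order_value,
        "revenue_per_visitor": revenue_per_visitor,
    }


if __name__ == "__main__":
    day = DailyRecord(visitors=200, orders=10, revenue=500.0)
    for name, value in derive_metrics(day).items():
        print(f"{name}: {value:.2f}")
```

Three collected variables yield three more metrics at analysis time, and none of the derived ones has to be tracked, stored, or maintained by the people implementing the collection.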
Fundamentally, it comes down to letting ourselves be guided by Occam’s Razor, despite the temptations to go hog wild in the candy store that can be Big Data.