Friday, July 10, 2009

Truth And Hypothesis

A July 2008 Wired cover story titled The End of Science resonates with my own experience with data and analytics - especially in light of the battles that one has to wage with those who are convinced that a statistical model must exist a priori before any analytics can proceed. Apparently, analysis means nothing without a hypothesis. I believe that with the right tools, the data should speak for itself - the models are not necessary.

This is, however, a hard case to make for someone (like myself) who approaches the problem from a common-sense and industry-experience standpoint, without a string of accreditations and doctorates appended to their name to lend gravitas to their perspective. Folks like us don't earn the respect of the pure math nerds very easily. They tend to view us as slackers who want to bypass the rigors of the scientific method. So we soldier on, hobbled by models and hypotheses that are not even directionally consistent with what the data says about itself.

It was heartening to see this idea echoed by Google's research director Peter Norvig, who says of statistical models, "All models are wrong, and increasingly you can succeed without them". The skills and talents of statisticians are much better spent on finding the dominant patterns in sets of correlated data. They can help whittle the field from hundreds of X's that result in a Y down to a handful of the most high-value ones. Forcing a model on data can actually be detrimental, in that you can spend the bulk of your time disproving the hypothesis and searching for one that will stick. What is more, you could miss out on the real story the data has to tell.
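
To make the "whittle hundreds of X's down to a handful" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is synthetic, the column names (x0 to x99) and the helper names pearson and top_drivers are made up, and ranking by plain Pearson correlation is just one simple screening choice that only picks up roughly linear relationships. It is not how any particular team or tool does it.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def top_drivers(columns, y, k=5):
    """Rank candidate X columns by |correlation with y| and keep the top k."""
    scored = [(name, pearson(values, y)) for name, values in columns.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:k]


# Synthetic data, made up for this sketch: 100 candidate X's, of which
# only x7 and x42 actually drive Y. The rest are pure noise.
rng = random.Random(0)
n_rows = 500
columns = {
    f"x{i}": [rng.gauss(0, 1) for _ in range(n_rows)] for i in range(100)
}
y = [
    3 * a - 2 * b + rng.gauss(0, 1)
    for a, b in zip(columns["x7"], columns["x42"])
]

# No hypothesis about which X's matter is supplied up front;
# the shortlist comes from the data alone.
shortlist = top_drivers(columns, y, k=5)
for name, r in shortlist:
    print(f"{name}: r = {r:+.2f}")
```

The two real drivers should come out well ahead of the noise columns, and the shortlist is where a statistician's judgment then earns its keep: telling the handful of genuine signals apart from the columns that only look interesting by chance.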
