Wednesday, March 24, 2010

Data Wrangling


Until a few months ago, a big part of my day job involved data acquisition for analytics and statistical modeling. The math nerds got to have all the fun, while folks like me toiled for weeks and months to gather the virtually un-gatherable and implement "repeatable" processes that could introduce some method to the madness. After six months of trying to stabilize the process by which data would be sourced, cleaned and prepared for use, we had to give up on attaining the nirvana state of data availability: near real time, all the time, with guaranteed quality.

I can see why there may be a viable marketplace for buying and selling data. Data acquisition, when done in-house, can become an all-consuming job. When you cannot trust the interim repositories where data is aggregated, you need to return to the source for each of them. What starts off looking like a reasonable amount of work soon blossoms into a hydra-headed monster that simply cannot be reined in. I would gladly trade all the pain and pointlessness of attempting to get it right, and keeping it that way over time, for a pay-per-use model for buying good, clean, ready-to-use data.
