
Data Analytics with Open Source Tools

As a long-time data wrangler serving many masters, as one must in this role, I have been looking for a book that talks about the real-life challenges of the job. I would love some practical advice on how to do my job better without driving myself completely crazy.
I found at least some of that in Philipp K. Janert's book Data Analytics with Open Source Tools. I am not the right audience for the math in the book, and in my experience, translating something that technical for executive management would be extremely challenging, if not impossible. Often there are no serious math nerds on the team who understand the concerns of the business well enough to bring their numerical and computational skills to bear on them effectively (e.g. three action items to improve customer engagement by 15% in the next 90 days).
More often than not, it falls to the rest of us, who straddle the technical and business worlds, to divine (or help divine) something of value from the many cesspools of enterprise data. To be successful, we need to know how to make the most of what little we have in terms of clean data, repeatable processes and a common understanding of data across the enterprise, all while contending with the inertia that resists improving any of them.
In the preface and introduction of his book, Janert advocates using as little statistics as possible: take the most commonsense approach to the data set and get a feel for it just by looking at it. Slice and dice it many ways, and run some charts and numbers to see whether there is an interesting story buried in there somewhere. This has been my approach almost 90% of the time, and I was excited to see the author endorse it. I have used whatever the math yielded as a way to prove or disprove my story. While far from perfect, the method has helped point clients in the right direction and remedy issues that would otherwise have gone undiscovered.
Later in the book, the author brings up a very important point. Getting data to be good enough is often feasible, but getting it to be truly high quality may be an impossible task. If the success of a project hinges completely on the data being better than good enough, it may be wiser not to take on the project at all. This is excellent advice that I will remember to pass on to clients who are bent on cleaning the Augean stables in their quest for business intelligence nirvana.
I would definitely refer to this book again if my job ever required me to do the math on data instead of analyzing it with the far less rigorous techniques that most shops are content to use. However, I will keep looking for a cookbook for the analyst who has to work within the constraints of time, poor data quality and a lack of cohesive processes around the sources of data. Ideally, such a book would have case studies, problem scenarios and real-life solutions that folks like me can relate to and apply in our own jobs.
