Predictive Analytics as Science & Art Fusion
I have worked in the analytics domain for many years, and in this blog I am sharing my experience of learning Predictive Analytics. Over my career I have built 100+ models and, frankly, I have found that approaching analytics as a fusion of science and art has worked for me in delivering robust models with very good lifts.
From the day I stepped into the shoes of an analyst, I realized that analytics is a fusion of science and art. The science element is well encapsulated in mathematical algorithms, made accessible by the various commercial and open-source statistical tools. Moreover, for many modeling techniques, such as Logistic Regression and Decision Trees, there are well-defined processes and well-defined statistics to be observed. The key question was: where do I start? To begin with, I decided to understand the science part: the statistics and their interpretation. I was very clear that I would not try to memorize the formulae, since you can always google them; what one should carry is the conceptual understanding and interpretation of each statistic (see my blog on “Information Value Concept in Scorecard Development” to understand what I mean by conceptual understanding and interpretation of stats).
Early on, I asked myself: if the statistical calculations, the permissible ranges for various stats, and the modeling process are all well defined and templatized, then what differentiates a good modeler? Why are good analysts paid a premium?
The answer to me was – It is their art of modeling. It is their ability to interpret the numbers.
To reinforce the above, let us do some analogical reasoning:
What makes a craftsman a good craftsman?
What makes a pot-maker a good pot-maker?
What makes a painter a good painter?
The point I learned was: people in any profession, using similar tools and techniques, can differentiate themselves through their art, dexterity, and skill in using those tools and techniques.
If modeling is an art, then the question is “How do I develop this art?”
The answer lies in the modeling process itself: Hypothesis Testing.
Create Hypothesis: One of the key steps in the modeling process is Hypothesis Testing. Unfortunately, I have seen many analysts treat this most important step as a ritual to be performed. I was, and still am, particular about writing down my hypotheses. Without getting tangled in the jargon of Null Hypothesis / Alternate Hypothesis, I simply consider a hypothesis to be a subjective opinion of what you feel the trend or pattern will be.
Validate Hypothesis: The next step is obviously to validate the hypothesis using a statistical technique and, more importantly, by creating a visualization of the data for your hypothesis; simply put, the bad-rate / good-rate graph or the cross-tabs that are created in most modeling processes.
Interpret Hypothesis: Interpret the output of the visualization and draw your own takeaways from the charts and graphs. Do not leave this job to others, at least in the initial years of your career.
Interpretation will build your knowledge; some of these hypotheses will get repeatedly validated and they will become part of your memory stack as wisdom (domain knowledge).
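To make the create / validate / interpret loop concrete, here is a minimal sketch in Python of a bad-rate cross-tab for one written-down hypothesis. Everything in it is an assumption made for illustration: the variable (an age band), the hypothesis ("bad rate falls as the age band rises"), the toy records, and the helper names are made up and do not come from any real portfolio or tool.

```python
from collections import defaultdict

# Hypothesis (written down first): bad rate falls as the age band rises.
# Illustrative, made-up records: (age_band, is_bad), where is_bad = 1 is a "bad".
records = [
    ("18-25", 1), ("18-25", 1), ("18-25", 0), ("18-25", 0),
    ("26-40", 1), ("26-40", 0), ("26-40", 0), ("26-40", 0),
    ("41-60", 1), ("41-60", 0), ("41-60", 0), ("41-60", 0), ("41-60", 0),
]

# Bands listed in their natural order, so the trend can be read top to bottom.
BAND_ORDER = ["18-25", "26-40", "41-60"]


def bad_rate_by_band(rows):
    """Return {band: bads / total} for each band present in rows."""
    totals = defaultdict(int)
    bads = defaultdict(int)
    for band, is_bad in rows:
        totals[band] += 1
        bads[band] += is_bad
    return {band: bads[band] / totals[band] for band in totals}


def is_monotonic_decreasing(values):
    """True when each value is less than or equal to the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


rates = bad_rate_by_band(records)
hypothesis_holds = is_monotonic_decreasing([rates[b] for b in BAND_ORDER])

# A crude text version of the bad-rate graph: one bar per band.
for band in BAND_ORDER:
    bar = "#" * round(rates[band] * 20)
    print(f"{band:>6} | {bar} {rates[band]:.0%}")
print("Hypothesis supported on this toy sample:", hypothesis_holds)
```

The check only tells you whether the trend goes the way you expected. Working out why it does, or does not, is the interpretation step, and that is the part that builds domain knowledge.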
With this domain knowledge, I developed my own approach to structuring my hypotheses and the variables that are likely to come into the model. I call it the “NIL” (Need-Involvement-Lifestage (or Lifestyle)) approach, and I have used it successfully in the Banking domain. Let me explain this with a cross-sell example from Banking:
For each predictor variable, you then write down your hypothesis and try to rank the variables relative to one another using your intuition. Over time, my “NIL” approach has not left me nil; it has helped me develop a good amount of domain knowledge.
It is important that, with experience, you develop your own intuitive approach for clustering the variables and can say which variable brings in which dimension. With time you should be able to say which variables are more likely to be multi-collinear, which are more likely to come out as strong predictors, what binning and transformations you may have to apply, etc. All of the above will improve your domain knowledge (art), and you can reach a stage where you can confidently name the top 5 variables with which you could build the final model. Having said that, never do away with the statistical processes and iterations that need to be followed. The statistics part will always remain important, as it helps you prove your hypothesis quantitatively and at times throws up newer insights.
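As one way of letting the statistics confirm or challenge an intuition about multi-collinearity, the sketch below flags pairs of variables whose pairwise Pearson correlation is high. It is an illustration under stated assumptions, not a prescribed method: the variable names, the toy values, and the 0.8 cut-off are all made up, and in practice variance inflation factors are a common alternative check.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_collinear_pairs(variables, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The 0.8 default is an assumed cut-off for illustration only.
    """
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(variables.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up toy data. Intuition says balance and credits move together,
# while tenure brings in a different dimension.
variables = {
    "avg_balance": [10, 20, 30, 40, 50, 60],
    "monthly_credits": [12, 19, 33, 41, 48, 62],
    "tenure_months": [36, 5, 48, 12, 60, 7],
}

pairs = flag_collinear_pairs(variables)
for name_a, name_b, r in pairs:
    print(f"{name_a} and {name_b} look collinear (r = {r})")
```

Writing down which pairs you expect to be flagged before running the check, and then comparing, is the same hypothesis habit applied to variable selection.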
This constant fusion of science and art is how I learned predictive analytics, however little that may be, and I continue to remain in the learning phase.
Sign-Off Note: The science part of predictive modeling is a well-established fact. As analysts, we need to prove our art!