The value of statistical models cannot be overstated. In public health, the ability to predict specific health outcomes with a quantified level of confidence is invaluable. From informing better resource allocation to developing relevant programmatic interventions to ensuring patients receive the differentiated care they require, models are at the heart of data analysis.
Now, depending on whether you ask a traditional statistician or a data scientist how to go about building a model, you may get contrasting views. While both may agree with the popular aphorism “all models are wrong, but some are useful,” they will often differ on the practical steps for delivering a final predictive model. It is by better understanding these contrasting approaches that the so-called “old school” and “new school” of modelling can be bridged, rather than remaining on opposite sides of the fence.
In this article, we’ll compare aspects of traditional statistical modelling to the capabilities of machine learning. We’ll also explore how machine learning techniques can be harnessed to build more useful models.
Model building fundamentals
A model, by definition, is a mathematical expression relating a set of explanatory variables (‘features’) of interest to a specific outcome, given certain statistical assumptions and parameters. For example, a model predicting viral load in HIV+ patients may have several explanatory variables of interest, including age, sex, adherence to treatment, and duration of the treatment regimen. These variables, or inputs, are then used to predict a specific outcome, or output, in this case viral load.
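To make this concrete, here is a minimal sketch of that viral load example in Python. The data are simulated, and the variable names, coefficients, and choice of an ordinary least squares model are illustrative assumptions rather than details from any real study:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500

# Simulate hypothetical patient data (all values are made up for illustration)
data = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "sex": rng.choice(["F", "M"], n),
    "adherence": rng.uniform(0.4, 1.0, n),       # proportion of doses taken
    "duration_months": rng.integers(1, 60, n),   # time on treatment
})

# Assumed data-generating process: log viral load falls with better
# adherence and longer time on treatment, plus random noise
data["log_viral_load"] = (
    8.0
    - 3.0 * data["adherence"]
    - 0.03 * data["duration_months"]
    + rng.normal(0, 0.5, n)
)

# Fit a linear model: the formula maps the explanatory variables (features)
# to the outcome, mirroring the definition above
model = smf.ols(
    "log_viral_load ~ age + sex + adherence + duration_months",
    data=data,
).fit()
print(model.summary())
```

The formula string makes the structure of the model explicit: everything to the right of the `~` is an input, everything to the left is the output, and the fitted parameters quantify how each input relates to the predicted outcome.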