As I’ve mentioned in previous posts, many of the references one will encounter when looking
up methods for dealing with missing values will be oriented towards statistical inference
and obtaining unbiased estimates of population parameters, such as means, variances, and
covariances. The most frequently mentioned of these techniques is multiple imputation. I saw value in
digging deeper into this area, even though that literature is not aimed at
developing predictive models, especially those that might run in real time in an app or at a
clinic of some sort. The reason is that, in tandem with developing a great predictive model,
I generally like to develop corresponding models that focus on interpretability. This allows me
to learn from both inferential and predictive approaches, and to deploy the predictive model while
using the interpretable model to help explain and understand the predictions. However, once one
becomes interested in interpretability, one becomes interested in inference, and therefore in
unbiased estimates of population parameters. That is, I’m actually very interested in
unbiased estimates of means, variances, and covariances, but in parallel to prediction, not
in place of it.