Beware the Overfit Trap in Data Analysis

Here is another valuable Management Tip of the Day from Harvard Business Review.

* * *

It can be exciting when your data analysis suggests a surprising or counterintuitive prediction. But the result might be due to overfitting, which occurs when a statistical model describes random noise rather than the underlying relationship you need to capture. You can guard against this trap by keeping your analysis simple.

o Be on guard against spurious correlations, and look for relationships that measure important effects related to clear, logical hypotheses.

o Test for overfitting by randomly dividing the data into a training set, with which you’ll estimate the model, and a validation set, with which you’ll test the accuracy of the model’s predictions.

o An overfit model might predict well within the training set but raise warning flags by performing poorly on the validation set.

o You might also consider alternative narratives: Is there another story you could tell with the same data? If so, you cannot be confident that the relationship you have uncovered is the right one, or the only one.
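
The training/validation check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method from the HBR guide: the data are synthetic (a straight line plus random noise), the 70/30 split is an arbitrary choice, and the helper names train_validation_split, mean_squared_error, and compare_models are hypothetical. It fits a simple model (degree 1) and a deliberately flexible one (degree 8) on the training rows only, then reports each model's error on both sets.

```python
import numpy as np
from numpy.polynomial import Polynomial


def train_validation_split(n_rows, validation_fraction=0.3, seed=0):
    """Randomly assign row indices to a training set and a validation set."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    n_val = int(round(n_rows * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (train indices, validation indices)


def mean_squared_error(model, x, y):
    """Average squared gap between the model's predictions and the observed values."""
    return float(np.mean((model(x) - y) ** 2))


def compare_models(x, y, degrees=(1, 8), validation_fraction=0.3, seed=0):
    """Fit one polynomial per degree on the training rows; score on both sets."""
    train_idx, val_idx = train_validation_split(len(x), validation_fraction, seed)
    results = {}
    for degree in degrees:
        # The model is estimated from the training rows only.
        model = Polynomial.fit(x[train_idx], y[train_idx], degree)
        results[degree] = {
            "train_mse": mean_squared_error(model, x[train_idx], y[train_idx]),
            "validation_mse": mean_squared_error(model, x[val_idx], y[val_idx]),
        }
    return results


# Synthetic data (an assumption for illustration): a linear relationship plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, size=40)
y = 2.0 * x + rng.normal(scale=0.5, size=40)

for degree, scores in compare_models(x, y).items():
    print(
        f"degree {degree}: "
        f"train MSE = {scores['train_mse']:.3f}, "
        f"validation MSE = {scores['validation_mse']:.3f}"
    )
```

The flexible model will fit the training rows at least as closely as the simple one, because it has more freedom to chase noise. The warning flag to look for is a validation error that is noticeably worse than the training error, or worse than the simple model's validation error. Rerunning with a different seed gives a different random split, which helps show whether the gap is stable.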

This Tip was adapted from the HBR Guide to Data Analytics Basics for Managers.

