The Best Data Scientists Know How to Tell Stories


Here is an excerpt from an article written by Michael Li for Harvard Business Review and the HBR Blog Network.

* * *

When hiring data scientists, people tend to focus primarily on technical qualifications. It’s hard to find candidates who have the right mix of computational and statistical skills. But what’s even harder is finding people who have those skills and are also good at communicating the story behind the data.

At The Data Incubator, we run a fellowship that identifies the top 2% of STEM PhDs looking to work with our partner companies, which range from larger firms like Goldman Sachs or Genentech to smaller companies like Betterment or Yelp. Here are three attributes our partners look for in data scientists, and specific questions they use to identify those attributes:

Ability to articulate the business value of their work. It's important to look for people who strive to benchmark their work against metrics. Ask a prospective candidate about a project they did and whether or not it was successful. During an interview, see if the candidate is talking about data science metrics or business metrics. A data science metric is one that measures the quality of a model: What was my R-squared? What was the root mean squared error? What was the model's accuracy? While those are important questions for data science, they do not necessarily get at whether a project was successful. Project success is more often defined in terms of business metrics: How much did I decrease customer attrition? How much did we increase marketing effectiveness?
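To make the distinction concrete, here is a minimal Python sketch contrasting the model-quality metrics named above with one possible business metric. The function names, the sample numbers, and the choice to express attrition improvement as a relative drop in churn rate are all hypothetical illustrations, not figures or definitions from the article.

```python
import math


def r_squared(y_true, y_pred):
    """Data science metric: coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Data science metric: root mean squared error of the predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(labels, predictions):
    """Data science metric: share of classifications that were correct."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def relative_attrition_reduction(churn_before, churn_after):
    """Hypothetical business metric: relative drop in the customer churn rate."""
    return (churn_before - churn_after) / churn_before


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print("R-squared:", r_squared([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0]))
    print("RMSE:", rmse([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0]))
    print("Accuracy:", accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
    # A churn model is only a business success if churn actually falls.
    print("Attrition reduced by:", relative_attrition_reduction(0.20, 0.15))
```

The first three numbers describe how well a model fits; only the last one answers the question a manager is likely to ask about whether the project paid off.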

Good data scientists might lead with a data science metric, but give a business metric when prompted. The best data scientists immediately speak in terms of business metrics because they understand that their work has to have value for the organization, not just be interesting to data scientists. Managers should be wary of prospective hires who don’t know the business impact of their work (“I have to look that up and get back to you”)—that suggests a fundamental disconnect between management’s priorities and those of the potential new hires.

The right level of technical detail. When interviewing data scientists, we are tempted to grill them on technical intricacies like asymptotic bias or the functioning of Hadoop’s distributed cache. It’s important to remember that your potential new hire’s ability to communicate the right level of detail and to effectively tell the story behind the data is not often probed in technical interviews—but this is a huge part of the job. Ask prospective candidates to talk about previous data science work. Do they jump into the gory technical details or do they stay at an appropriately high level so that you can understand the message without being overwhelmed by buzzwords?

For example, any analysis is only as good as its assumptions. But does the candidate drone on about L2-integrability, or does she tell you that she needed to assume customer flow is independent from day to day? Using the right technique is also important in analysis. Does the candidate opine on the virtues of random forests, or can she concisely articulate the reason for choosing that model? Frequently, we encounter candidates who rely on dogma rather than science. Be wary of those who know only the ins and outs of a favorite model but cannot articulate why they chose to use it beyond the fact that it is the "industry standard." When identifying good candidates, you need to find those who understand the technical underpinnings but who can also translate them for non-analysts.

* * *


Dr. Michael Li is a data scientist who has worked at Google, Foursquare, and Andreessen Horowitz. He is the founder and executive director of The Data Incubator.
