Why numerical intuition is the most important data science skill I screen for

I recently went on the Minds in the Machines podcast to talk about data science careers. One of the things I emphasized was the importance of numerical intuition as a skill for aspiring data scientists. In fact, when I interview data scientists for jobs, numerical intuition is the #1 skill I screen for.

What is numerical intuition?

By numerical intuition I basically mean:

An ability to look at a graph or a number, interpret it, and express a point of view about what it means for the business.

Underpinning this kind of numerical intuition, in a data science context, is the ability to take a messy dataset and generate a plot, visualization, or statistic that you can apply your numerical intuition to and draw a meaningful conclusion from. But that is a subject for a different post.

Looking at a line graph and saying what it means sounds easy ("It's up and to the right…that's good, right?"). But in practice, the mechanics of your underlying data can be quite subtle. Even when the mechanics are simple, you need to be thinking about what's going on under the hood of a given statistic at all times. You may be the only person in the room thinking that way, and in doing so, you might save your company from making a multimillion-dollar mistake.

A Pebble tale

Here’s a very simple example of the impact numerical intuition can have in practice.

At Pebble, we partnered with retailers (think big-box stores) that stocked their shelves with our watches. Four weeks after we launched a new version of the watch, one of our retailers started complaining that our return rates were going up. The increasing return rate had so alarmed them that they were pushing to renegotiate their contract with us (to our disadvantage, obviously).

The management team asked me to look into the source of the return rate increase. Specifically, they wanted me to dig through the data for evidence of a bug that was causing people to return the product. We hadn't heard any increased chatter on our customer support channels, and we weren't noticing any issues as we used the watches ourselves, but they theorized that there must be some problem that buyers at this specific retailer were less tolerant of than the rest of our user base. I mean, what else could it be?

Well, it wasn’t a bug. The problem was fractions.

Return rates

Return rates are usually calculated on a weekly basis, like this:

  • Return rate = # returns this week / # sales this week

So four weeks after a product launches, what’s going on in this ratio?

First, the sales graph probably looks like this:

Artist’s rendering: Number of sales per week.

Sales explode in the first week, thanks to the enthusiasm that accompanies a new release, then decline gradually in the following weeks. That means the denominator of the return rate (the number of sales each week) is shrinking.

Meanwhile, there is always some lag associated with returns. People don’t walk out of a store with a product and immediately walk right back in to return it. A product purchased in Week 1 might be returned in Week 2, or Week 3, or Week 4, or even later.

Eventually, the number of returns will stabilize as the number of sales stabilizes, but until that happens, your sales and returns trends probably look like this:

Artist’s rendering: Number of sales, number of returns per week.

So without even looking at any specific data, I can tell you that the return rate is always going to blow up a few weeks after a new product launches. The numerator (returns) is growing, and the denominator (sales) is shrinking. Of course the ratio is going to increase.

Reaching this conclusion does not require SQL queries, or Python code, or machine learning, or even any complex statistics. All that is needed is for someone to think carefully about the underlying dynamics that are packaged up in this particular data point. That’s numerical intuition.
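
None of this requires code, but if you want to watch the mechanics play out, a toy simulation makes them easy to poke at. This is a minimal sketch with made-up numbers: LAUNCH_SALES, DECAY, BASELINE_SALES, RETURN_PROB, and LAG_SHARES are all assumptions for illustration, not Pebble figures. Every watch has the same fixed chance of being returned, returns trickle in over the four weeks after purchase, and sales decay from a launch spike to a steady baseline.

```python
# Toy simulation with made-up numbers; none of these values come from real data.
LAUNCH_SALES = 10_000   # units sold in launch week
DECAY = 0.6             # each later week sells 60% of the week before...
BASELINE_SALES = 1_000  # ...until sales level off at this steady weekly rate
RETURN_PROB = 0.05      # every unit has the same 5% chance of being returned
# Share of a week's eventual returns that arrive 1, 2, 3, or 4 weeks after purchase.
LAG_SHARES = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}


def weekly_sales(n_weeks):
    """Sales spike at launch, decay, then stabilize at the baseline."""
    return [max(LAUNCH_SALES * DECAY**week, BASELINE_SALES) for week in range(n_weeks)]


def weekly_returns(sales):
    """Spread each week's expected returns over the following weeks using LAG_SHARES."""
    returns = [0.0] * len(sales)
    for week, units in enumerate(sales):
        for lag, share in LAG_SHARES.items():
            if week + lag < len(sales):
                returns[week + lag] += units * RETURN_PROB * share
    return returns


def weekly_return_rate(sales, returns):
    """The metric in question: returns this week / sales this week."""
    return [r / s for r, s in zip(returns, sales)]


sales = weekly_sales(12)
returns = weekly_returns(sales)
for week, rate in enumerate(weekly_return_rate(sales, returns), start=1):
    print(f"Week {week:2d}: {rate:.1%}")
```

In this toy setup the printed rate climbs to roughly 16% in week 5, more than three times the 5% return probability that every unit actually carries, and then settles back to 5% once sales level off. Nothing about the product changed; only the relationship between the numerator and the denominator did.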

Honing your numerical intuition

I credit my numerical intuition to the years I spent studying physics, which involved a lot of thinking about how an equation changes if one quantity increases, or decreases, or is taken to infinity. But I think this is a skill anyone can learn, or at least get better at. Here are a few ways to sharpen it:

  1. Spend time getting to know your data before you dive into statistical analysis. Make basic line graphs of the data over time. Make histograms and plots that show how different quantities are distributed. Make pie charts if you want! Do the basic stuff. Keep it simple. Feel free to use Excel. If you rolled your eyes when I said “Excel,” examine that impulse. You need to do basic analyses before you do advanced ones. Otherwise, you’ll miss crucial truths about the data you’re working with.
  2. Make a habit of drawing out back-of-the-envelope visualizations as you work on data-related problems. Before you make a visualization of your data, draw a picture of what you *think* it should look like. When you plot the real data, compare your output to your hypothesis. Is it different? Why?
  3. As you start to discover the important quantities or trends in your dataset, draw these out on paper. Draw what *would* happen if things were to change, both in mundane cases (e.g., how does the number of active users vary by day of week? how does the number of active users change when an ad we run is unusually effective?) and in extreme cases (e.g., what happens to the number of active users when, say, the site crashes following a spike in activity?). Be creative and think like a physicist: What happens when x increases? What happens when x decreases? What happens when x goes to infinity?
  4. Beware the data scientist’s impulse to look down on data analysts. (I have heard analytics work referred to as “grunt work” by way too many people who should know better.) Simple analytical work is the foundation of all data-related intuition. Don’t sell it short.
  5. Question everything. Question E V E R Y T H I N G. Fractions mask truth. So do certain analyses that aren't properly split by cohort. Be skeptical about what the data tells you (and what other people tell you the data tells you), and check it and double-check it and check it again. Learn how to turn your data inside out, to look at it from different angles so that you can confirm your hypotheses in multiple ways. Be relentless and uncompromising when it comes to finding the full truth.
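
To make the cohort point concrete, here is a minimal sketch with hypothetical numbers (SALES, RETURNS, and both function names are illustrative, not from any real dataset). It computes two views of the same returns: the calendar-week ratio from the Pebble example, and a cohort rate that credits each return to the week the unit was bought.

```python
from collections import defaultdict

# Hypothetical data, for illustration only: units sold in each purchase week.
SALES = {1: 1000, 2: 600, 3: 400, 4: 200}

# Hypothetical returns, keyed by (purchase_week, return_week) -> units returned.
# Each cohort returns exactly 5% of its units, spread over the next four weeks.
RETURNS = {
    (1, 2): 20, (1, 3): 15, (1, 4): 10, (1, 5): 5,
    (2, 3): 12, (2, 4): 9, (2, 5): 6, (2, 6): 3,
    (3, 4): 8, (3, 5): 6, (3, 6): 4, (3, 7): 2,
    (4, 5): 4, (4, 6): 3, (4, 7): 2, (4, 8): 1,
}


def calendar_week_return_rate(sales, returns):
    """Units returned during a week divided by units sold in that same week."""
    returned_during = defaultdict(int)
    for (_, return_week), units in returns.items():
        returned_during[return_week] += units
    return {week: returned_during[week] / sold for week, sold in sales.items()}


def cohort_return_rate(sales, returns):
    """Units eventually returned from a purchase week divided by that week's sales."""
    returned_from = defaultdict(int)
    for (purchase_week, _), units in returns.items():
        returned_from[purchase_week] += units
    return {week: returned_from[week] / sold for week, sold in sales.items()}


calendar = calendar_week_return_rate(SALES, RETURNS)
cohort = cohort_return_rate(SALES, RETURNS)
for week in sorted(SALES):
    print(f"Week {week}: calendar-week ratio {calendar[week]:.1%}, cohort rate {cohort[week]:.1%}")
```

In this made-up data the calendar-week ratio climbs from 0% to 13.5% over the first four weeks, while every purchase cohort returns exactly 5%. One caveat: a cohort rate is only comparable once each cohort has had time for its returns to arrive; measured partway through, the most recent cohorts will look artificially good.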

Do all this, and you, too, can proudly say that some of the most impressive moments of your career have involved calmly explaining fractions to a group of executives!
