Want Your Company to Get Better at Experimentation?

Here is an excerpt from an article written by Iavor Bojinov, David Holtz, Ramesh Johari, Sven Schmit, and Martin Tingley for Harvard Business Review. To read the complete article, check out others, sign up for email alerts, and obtain subscription information, please click here.

Illustration Credit: Jamie Chung/Trunk Archive

* * *

For years online experimentation has fueled the innovations of leading tech companies such as Amazon, Alphabet, Meta, Microsoft, and Netflix, enabling them to rapidly test and refine new ideas, optimize product features, personalize user experiences, and maintain a competitive edge. Owing to the widespread availability and lower cost of experimentation tools today, most organizations—even those outside the technology sector—conduct online experiments.

However, many companies use online experimentation for just a handful of carefully selected projects. That’s because their data scientists are the only ones who can design, run, and analyze tests. It’s impossible to scale up that approach, and scaling matters. Research from Microsoft (replicated at other companies) reveals that teams and companies that run lots of tests outperform those that conduct just a few, for two reasons: Because most ideas have no positive impact, and it’s hard to predict which will succeed, companies must run lots of tests. And as the growth of AI—particularly generative AI—makes it cheaper and easier to create numerous digital product experiences, they must vastly increase the number of experiments they conduct—to hundreds or even thousands—to stay competitive.

Scaling up experimentation entails moving away from a data-scientist-centric approach to one that empowers everyone on product, marketing, engineering, and operations teams—product managers, software engineers, designers, marketing managers, and search-engine-optimization specialists—to run experiments. But that presents a challenge. Drawing on our experience working for and consulting with leading organizations such as Airbnb, LinkedIn, Eppo, Netflix, and Optimizely, we provide a road map for using experimentation to increase a company’s competitive edge by (1) transitioning to a self-service model that enables the testing of hundreds or even thousands of ideas a year and (2) focusing on hypothesis-driven innovation by both learning from individual experiments and learning across experiments to drive strategic choices on the basis of customer feedback. These two steps in tandem can prepare organizations to succeed in the age of AI by innovating and learning faster than their competitors do. (The opinions expressed in this article are ours and do not represent those of the companies we have mentioned.)

The Current State

The basics of experimentation are straightforward. Running an A/B test involves three main steps: creating a challenger (or variant) that deviates from the status quo; defining a target population (the subset of customers targeted for the test); and selecting a metric (such as product engagement or conversion rate) that will be used to assess the outcome. Here’s an example: In late 2019, when one of us (Martin) led its experimentation platform team, Netflix tested whether adding a Top 10 row (the challenger) on its user interface to show members (the target population) the most popular films and TV shows in their country would improve the user experience as measured by viewing engagement on Netflix (the outcome metric). The experiment revealed that the change did indeed improve the user experience without impairing other important business outcomes, such as the number of customer service tickets or user-interface load times. So the Top 10 row was released to all users in early 2020. As this example illustrates, experimentation enables organizations to make data-driven decisions on the basis of observed customer behavior.
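
To make the three steps concrete, here is a minimal sketch, in Python, of how a team might analyze such a test once members of the target population have been randomly split between the status quo and the challenger. The function name, the choice of a Welch's t-test, and the simulated engagement numbers are illustrative assumptions for this sketch, not the actual method or data behind the Netflix experiment described above.

import numpy as np
from scipy import stats

def analyze_ab_test(control, challenger, alpha=0.05):
    # Welch's two-sample t-test comparing the outcome metric between
    # the status quo (control) and the challenger variant.
    t_stat, p_value = stats.ttest_ind(challenger, control, equal_var=False)
    return {
        "control_mean": float(np.mean(control)),
        "challenger_mean": float(np.mean(challenger)),
        "lift": float(np.mean(challenger) - np.mean(control)),
        "p_value": float(p_value),
        "significant": p_value < alpha,
    }

# Simulated engagement data for the target population -- illustrative
# numbers only, not Netflix data. Each value might represent, say,
# weekly viewing hours for one member.
rng = np.random.default_rng(seed=7)
control = rng.normal(loc=10.0, scale=3.0, size=5000)      # status quo interface
challenger = rng.normal(loc=10.2, scale=3.0, size=5000)   # interface with a Top 10 row

print(analyze_ab_test(control, challenger))

In practice, an experimentation platform automates this kind of analysis and also monitors guardrail metrics, such as the number of customer service tickets and user-interface load times, before a change like the Top 10 row is released to all users.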

Barriers to Scaling Up Experimentation

Data science teams often lead the adoption of online experimentation. After initial success, organizations tend to fall into a rut, and the returns remain limited. A common pattern we see is this: The organization invests in a platform technically capable of designing, running, and analyzing experiments. Large technology companies build their own platforms in-house; others typically buy them from vendors. Although these tools are widely available, investing in them is costly. Building a platform can take more than a year and usually requires a team of five to 10 engineers. External platforms generally cost less and are faster to implement, but they still require dedicated resources to be integrated with the organization’s internal development processes and to gain approval from legal, finance, and cybersecurity departments.

After the initial investment, leaders who sponsored the platform (usually the heads of data science and product) face pressure to quickly demonstrate its value by scoring successes—experiments that yield statistically significant positive results in favor of the challenger. In an attempt to avoid negative results, they try to anticipate which ideas will have a big impact—something that is exceptionally difficult to predict. For example, in late 2012, when Airbnb launched its neighborhood travel guides (web pages listing things to do, best restaurants, and so on), the content was heavily viewed, but overall bookings declined. In contrast, when the company introduced a trivial modification—the ability to open an accommodation listing in a new browser tab rather than the existing one, which made it easier to compare multiple listings—bookings increased by 3% to 4%, making it one of the company’s most successful experiments.

Motivated to turn every experiment into a success, teams often overanalyze each one, with data scientists spending more than 10 hours per experiment. The results are disseminated in memos and discussed in product-development meetings, consuming many hours of employee time. Although the memos are broadly available in principle, the findings they contain are never synthesized to identify patterns and generalizable lessons; nor are they archived in a standardized fashion. As a result, it’s not uncommon for different teams (or even the same team after its members have turned over) to repeatedly test an unsuccessful idea.

Looking to increase the adoption of and returns from experimentation, data science and product leaders tend to focus on incremental changes: increasing the size of product teams so as to run more experiments and more easily prioritize which ideas to test; hiring additional data scientists to analyze the increased number of tests and reduce the time needed to execute on them; and instigating more knowledge-sharing meetings for the dissemination of results. In our experience, however, those tactics are unsuccessful. Managers struggle to identify which tests will lead to a meaningful impact; hiring more data scientists provides only a marginal increase in experimentation capacity; and knowledge-sharing meetings don’t create institutional knowledge. These tactics may appear sensible, but they end up limiting the adoption of experimentation because the processes they establish don’t scale up.

* * *

Here is a direct link to the complete article.

Iavor Bojinov is an assistant professor of business administration and the Richard Hodgson Fellow at Harvard Business School. He is also a faculty affiliate of Harvard’s statistics department and the Harvard Data Science Initiative.
David Holtz is an assistant professor in the management of organizations and entrepreneurship and innovation groups at the University of California, Berkeley’s Haas School of Business and a research affiliate at the MIT Initiative on the Digital Economy.
Ramesh Johari is a professor of management science and engineering at Stanford University and an associate director at Stanford Data Science.
Sven Schmit is the head of statistics engineering at Eppo, an experimentation platform vendor.
Martin Tingley is the head of the analysis team on the Netflix experimentation platform.