Book review & summary: “Experimentation Works: The Surprising Power of Business Experiments” by Stefan H. Thomke

This book is a follow-up to the author’s 2003 book “Experimentation Matters: Unlocking the Potential of New Technologies for Innovation”, in which he predicted the power of strategic business experiments. The new book solidifies those ideas and presents numerous examples of how companies have embraced experimentation and the benefits it has brought them.
Overall I found this to be a great guide and a must-read for any business or organization thinking about implementing experimentation. The author takes us through many examples of companies and organizations that have embraced an experimentation culture. Through those examples, he exposes readers to the hurdles and the benefits of becoming an experimentation organization. His vast experience in this area shines throughout the book.
While the book touches upon the math and statistics of experiments, that is not its main purpose. The book is structured around several companies and high-level executives who have transformed their organizations to embrace experimentation. As such, it looks at the big picture of experimentation rather than its technical details. In what follows I will attempt to summarize each chapter by discussing its key points, quotes and examples.

Introduction: The Business Experimentation Imperative

“All too often, we discover that ideas that are truly innovative go against our experience and assumptions, or the conventional wisdom.”

Example [Microsoft]: The author talks about an employee who proposed an idea that was dismissed for 6 months (it was not considered to be innovative enough) before a software developer decided to A/B test it. It turns out that the full-scale implementation of that idea increased yearly revenues by more than $100m.

The author introduces and advocates for the “everything is a test” mentality: “By combining the power of software and rigor of controlled experiments, companies can turn themselves into learning organizations.”

Example [Booking]: The author uses Booking.com as an example of a company where employees can launch an experiment on millions of users without permission from management.

The author warns against undisciplined experimentation (throwing many disconnected ideas against the wall and seeing what sticks) and advocates disciplined testing: a well-designed and well-implemented experiment that fails still teaches what doesn’t work and why, and offers insights into the design of the next experiment. He accordingly distinguishes between a “failure” (a well-designed experiment that does not lead to an increase in the key performance indicator) and a “mistake” (a badly designed experiment that leads to inconclusive results, i.e. one that does not produce any useful information).

Chapter 1: Why experimentation works

Two quotes that summarize this chapter: “Oftentimes, too, managers rely on their intuition – but ideas that are truly innovative go against experience. In fact most ideas don’t work.”, “Experience is often context-dependent and a success may result in hubris. Just because something worked for another company in another market doesn’t mean that it works here”.

Example [Microsoft]: On average, 1/3 of the experiments run at Microsoft produce positive results, 1/3 produce null results and 1/3 produce negative results (i.e. an effect in the opposite direction from the one expected).
Example [Google]: The company says that its experts’ predictions are wrong 96.1% of the time. However, its ability to test quickly and effectively what works, and thus to implement only the good ideas, is what has been driving its growth.

Quote from Amazon CEO Jeff Bezos: “Failure and Invention are inseparable twins”.

The author discusses several dimensions that lead to successful experimentation: Fast feedback; Experimentation capacity; Ability to run concurrent experiments; Using a control group.

I particularly liked this quote from Joan Fisher Box (daughter of Sir Ronald Fisher):

“The whole art and practice of scientific experimentation is comprised in the skillful interrogation of Nature. Observation has provided the scientist a picture of Nature in some aspect, which has all the imperfections of a voluntary statement. He wishes to check his interpretation of this statement by asking specific questions aimed at establishing causal relationships. His questions, in form of experimental operations, are necessarily particular, and he must rely on the consistency of Nature in making general deductions from her responses in a particular instance or in predicting the outcome to be anticipated from similar operations on other occasions.”

Chapter 2: What makes a good business experiment?

Several interesting quotes from Jeff Bezos: “Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day.”; “[In baseball] When you swing, no matter how well you connect with the ball, the most runs you can get is four. In business, every once in a while, when you step up to the plate, you can score 1,000 runs. This long-tailed distribution of returns is why it’s important to be bold”; “[We aim to do] the unnatural thing of trying to disconfirm our beliefs”.

The chapter revolves around seven important questions to ask when designing a good experiment:

Does the experiment have a testable hypothesis? – A good hypothesis is one that (a) defines quantifiable metrics, (b) identifies potential cause and effect, (c) can be shown to be false and (d) has clear impact on business outcomes.
Have stakeholders made a commitment to abide by the results? – Discusses the importance of letting experimental results dictate strategy in an unbiased manner and warns against the dangers of cherry-picking data which supports previous intuition.
Is the experiment doable? – Some environments are so rapidly evolving that insights are only short lived. Also discusses ethical concerns and sample size limitations.
Can we ensure reliable results? – Talks about the practical implementation problems of field experiments and discusses the advantages of blind and double-blind tests in tackling the “Hawthorne effect”.
Do we understand cause and effect? – Correlations can be good sources of causal hypotheses, but the author warns against reading too much into them. He discusses a case where adding oranges to sailors’ diets was found to reduce mortality. It was therefore believed that sailors needed more acidity in their diets, and they were given more acidic foods. However, many sailors kept dying until it was discovered that what their diets had been missing was the Vitamin C in the oranges.
Have we gotten the most value out of the experiment? – Discusses two dimensions: (a) The importance of checking for ancillary effects like network effects or product cannibalization, (b) [In an overall positive program], identifying potential parts of the program that may have had a negative effect or checking for potential treatment effect heterogeneity.
Are experiments really driving our decisions? – If management does not take experimental evidence seriously, then experiments are just wasting company resources.
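
To make the point about treatment effect heterogeneity concrete, here is a minimal sketch in Python. The segment names and all the numbers are hypothetical and chosen only for illustration (they do not come from the book). It shows how an experiment can look positive overall while one customer segment is actually hurt by the change. A real analysis would also check statistical significance and guard against slicing the data too many ways.

```python
# Hypothetical results per customer segment: (users, conversions) for each arm.
# All numbers are made up for illustration only.
results = {
    "new customers": {"control": (20_000, 1_000), "treatment": (20_000, 1_300)},
    "returning customers": {"control": (10_000, 1_500), "treatment": (10_000, 1_400)},
}


def rate(users, conversions):
    """Conversion rate of one experiment arm."""
    return conversions / users


def lift_by_segment(results):
    """Absolute difference in conversion rate (treatment - control) per segment."""
    return {
        segment: rate(*arms["treatment"]) - rate(*arms["control"])
        for segment, arms in results.items()
    }


def overall_lift(results):
    """Absolute difference in conversion rate after pooling all segments."""
    totals = {"control": [0, 0], "treatment": [0, 0]}
    for arms in results.values():
        for arm, (users, conversions) in arms.items():
            totals[arm][0] += users
            totals[arm][1] += conversions
    return rate(*totals["treatment"]) - rate(*totals["control"])


print(f"overall: {overall_lift(results):+.2%}")
for segment, lift in lift_by_segment(results).items():
    print(f"{segment}: {lift:+.2%}")
```

In this made-up data the pooled effect is positive, yet returning customers convert less often under the treatment, which is exactly the kind of hidden negative effect worth looking for.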

Chapter 3: How to experiment online

“Without experiments, many breakthroughs might not happen. And many bad ideas would be implemented, only to fail, wasting resources”.

Examples of companies conducting more than 10,000 online controlled experiments annually: Microsoft, Facebook, Booking, Amazon, Google. Their experiments engage millions of users.
Other examples of experimentation companies: Walmart, State Farm Insurance, Nike, FedEx, New York Times, BBC. These also run experiments regularly, albeit at a smaller scale.

Test all testable decisions – also called “Full-stack experimentation” – From the implementation of a new feature, to a change to the user interface (e.g. a color change), to a back-end change, to switching to a different business model.
Appreciate the value of small changes – The online world is massive with millions of customers generating billions of dollars of revenue. As such, even tiny treatment effects can scale to millions of dollars in increased revenue.
Invest in a large-scale experimentation system – To make the most of experimentation you need to ensure fast and cheap feedback. It is therefore vital to have the proper infrastructure: instrumentation, data pipelines, analysts and data scientists.
Build trust in the system – The author says that “the best data scientists are skeptics”. Some ways to build trust are 1. running A/A tests (control against control), 2. providing employees with proper statistical and software education, 3. checking your data for outliers (e.g. software bots when looking at website queries, or orders from large institutions when examining individual demand), 4. shuffling samples from one experiment to another to avoid carry-over effects.
Keep it simple – Limit the number of changing variables in order to improve interpretability and sharpen causality arguments.
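
The A/A test mentioned above lends itself to a small simulation. The sketch below is my own illustration, not code from the book: it assumes a conversion-rate metric, a pooled two-proportion z-test, and hypothetical traffic numbers. Both arms receive exactly the same experience, so a trustworthy system should flag a “significant” difference only about as often as the chosen significance level (here 5%).

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(runs=1000, users_per_arm=2000, rate=0.05, alpha=0.05, seed=0):
    """Share of A/A tests (two identical arms) that are flagged as significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        # Both arms draw conversions from the same underlying rate.
        conv_a = sum(rng.random() < rate for _ in range(users_per_arm))
        conv_b = sum(rng.random() < rate for _ in range(users_per_arm))
        p_value = two_proportion_p_value(conv_a, users_per_arm, conv_b, users_per_arm)
        if p_value < alpha:
            false_positives += 1
    return false_positives / runs


share = simulate_aa_tests()
print(f"Share of A/A tests flagged as significant: {share:.3f}")
```

If the share printed by a check like this were far above the significance level on real traffic, that would point to a problem in the assignment, logging or analysis pipeline rather than to a real effect.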

Chapter 4: Can your culture handle large-scale experimentation?

This chapter showcases the author’s vast experience with experimentation consulting at several organizations. He breaks down what makes up a successful experimentation organization into seven attributes:

Attribute 1: A learning mindset – Failed experiments are not only desirable but necessary for learning. They eliminate unfavorable options and re-focus the design of future experiments. For example a company can avoid a costly mistake by experimentally uncovering that a product does not appeal to consumers or that a web-interface discourages interaction. “An experiment is only truly a failure if it is poorly designed or executed and results in findings that are inconclusive”.

Attribute 2: Rewards are consistent with values and objectives – The author here discusses the importance of incentivizing employees to engage in experimentation that aligns with the company’s long-term goals.

Attribute 3: Humility trumps hubris – Minimize managers’ ego and pre-existing beliefs. The author cautions against several human biases that can compromise analysis: 1. Seeing relationships where there are none, 2. Confirmation bias, 3. Rejecting conclusive evidence. To illustrate the above points, the author discusses an example where experiments testing the effectiveness of acupuncture yielded better results in countries where acupuncture is more widely accepted, such as Asian countries.

Attribute 4: Experiments have integrity – The author emphasizes the importance of ethical experimentation but also says that experiments are oftentimes more scrutinized than they should be. He warns against the “A/B illusion: People tend to focus on the high-profile experiment in the foreground rather than the status quo in the background, regardless of how ineffective the current practice is”.

Attribute 5: The tools are trusted – Discusses the importance of manager oversight of and trust in experimentation systems and how this can be the biggest hurdle in an organization’s cultural shift.

Attribute 6: Exploration and exploitation are balanced – Discusses the need to balance exploration (experimentation to capture new value through new operations) and exploitation (maximizing revenue by standardizing operations).

Attribute 7: Ability to embrace a new leadership model – If most decisions are tested, then what is the role of managers? 1. They should set the grand challenge / vision / long-term goal that can be broken down to testable hypotheses; 2. They should put in place systems, resources and organizational designs and standards that allow large-scale, trustworthy experimentation; 3. They should be role-models for everyone else by having their own ideas subjected to tests and demanding experimental evidence in order to move forward.

Chapter 5: Inside an experimentation organization

This chapter dives into Booking.com and explores the cultural and operational features that make this company a great experimentation organization. The chapter begins with a quote by Isaac Asimov: “Experimentation is the least arrogant method of gaining knowledge”.

The author highlights Booking’s team-oriented culture which emphasizes autonomy and empowerment by democratizing experimentation at all levels of the organization. From its early days, Booking leadership believed that the best way to optimize their customers’ experience is through disciplined testing augmented with qualitative research. Some quotes from two of Booking’s high-ranked managers which summarize the company’s culture:

“We see evidence every day that people are terrible at guessing. Our predictions of how customers will behave are wrong nine out of ten times […] My team’s mission is to enable all of our employees to run experiments autonomously.” – Lukas Vermeer (Senior Product Owner of Experimentation)
“Our customers decide where to take the website, not our managers […] we ask everyone to experiment as much as they can. The only requirement is that all changes have to be tested. So you get the cumulative effect of many small changes that, over time, nobody can compete with anymore.” – David Vismans (Chief Product Officer)

To achieve the above, Booking had to invest heavily in standardizing experimentation tools across the organization as well as constantly checking data for errors and red flags. To encourage openness, all experiments had to be well documented and were made available for everyone to review.

The author discusses two potential problems in Booking’s approach and how the organization is set up to avoid them.

The decentralization and bottom-up approach could lead to inefficient use of resources (e.g. several teams tackling similar problems) and make it harder to trace the sources of problems (e.g. coding bugs and ethical issues).
The focus on micro-experimentation could leave the company vulnerable when bigger and untestable decisions have to be made (e.g. whether to invest in a new company or product).

Chapter 6: Becoming an experimentation organization

“Success is the ability to go from failure to failure without losing enthusiasm” – Anonymous.

The journey starts with the system: “.. just instructing an organization to run thousands of experiments annually, for instance, won’t yield fast innovation and may even backfire”. With this, the author emphasizes the importance of an experimentation platform, either built in-house or provided by a third-party application (Optimizely, Google Optimize, Adobe Target), that is trustworthy, flexible and harmonious. Such a platform will drive down the cost of running and analyzing experiments. The author describes seven system levers / metrics that indicate the health of a company’s experimentation system: 1. Scale (number of experiments per week), 2. Scope (how involved employees are in experiments), 3. Speed (time from hypothesis formulation to experiment completion), 4. Shared values (behavior and judgment that facilitate experiments), 5. Skills (competencies needed to design, run and analyze experiments), 6. Standards (the quality-check criteria that facilitate trust), 7. Support (how much technical help, training and managerial support is available).

Changing your organization: The author says that, in his experience, “Running experiments in most organizations is like riding a Jet Ski in a swimming pool,” emphasizing that the current structure of many companies is not expansive enough to fully leverage experimental methods. He then lays out the ABCDE framework, the stages most companies go through in becoming experimentation organizations: (A) Awareness (Understand that experimentation matters), (B) Belief (Adopt rigorous framework and tools), (C) Commitment (Allocate resources and change organization), (D) Diffusion (Widen scope and access to tools), (E) Embeddedness (Democratize experimentation).

Example [IBM]: The company ran 97 tests in 2015 and evolved to running 1,317 tests by 2018. The author says “.. it’s hard to find many princes if you’re kissing only 97 frogs per year.” Then he describes the steps that IBM executives took in order for this change to take place, for example, implementing scalable, easy-to-use testing tools, introducing a framework for disciplined experiments, and making online tests free for all business groups.

Tools in use: David Vismans – “If I had any advice for CEOs, it’s this: Large-scale testing is not a technical thing; it’s a cultural thing that you need to fully embrace.” In order for companies to unlock the potential of large-scale experimentation, they must find new ways to operate that embrace this new culture. Trust in the tools and the way they are used must also change to reflect this cultural shift. Finally, interference should be minimized by coordinating the efforts of the various specialized groups. This is where a standardized system would help tremendously.

Chapter 7: Seven myths of business experimentation

Myth 1: “Experimentation-driven innovation will kill intuition and judgment.” – The two are complements rather than substitutes. In fact, experimentation is a fast and cheap way to test one’s intuition.

Myth 2: “Online experiments will lead to incremental innovation but not breakthrough performance changes” – In fact, many breakthrough performance changes are motivated by findings from online experiments.

Myth 3: “We don’t have enough hypotheses for large-scale experimentation.” – The more you experiment, the more hypotheses you generate from your experiments’ findings. The largest experimenters started by running only a few experiments per year and most companies still only run very few experiments.

Myth 4: “Brick-and-mortar companies don’t have enough transactions to run experiments.” – While sample size is definitely a valid concern, the author advises such companies to focus on running bigger and riskier experiments (remember that the larger the treatment effect, the smaller the sample you need to detect a statistically significant difference).
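
The relationship between effect size and sample size can be sketched with the standard normal-approximation formula for comparing two conversion rates. This is my own illustration rather than anything from the book; the baseline rate of 10%, the 5% significance level and the 80% power are assumptions picked for the example, and `users_per_arm` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def users_per_arm(baseline, lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect an absolute lift in a conversion rate.

    Uses the usual normal approximation for a two-sided test of two proportions.
    """
    treated = baseline + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = baseline * (1 - baseline) + treated * (1 - treated)
    return math.ceil((z_alpha + z_power) ** 2 * variance / lift ** 2)


# Assumed baseline conversion rate of 10%; lifts are absolute percentage points.
for lift in (0.005, 0.01, 0.02, 0.05):
    print(f"lift of {lift:.1%}: about {users_per_arm(0.10, lift):,} users per arm")
```

Because the required sample shrinks roughly with the square of the effect, a company with few transactions gets much more out of testing bold changes than out of testing small tweaks.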

Myth 5: “We tried A/B testing, but it had a modest impact on our business performance.” – Experimentation is a long-term strategy and many failures are to be expected. In addition, one should make sure to properly account for interaction effects (e.g. if two changes each increase performance by 1% on their own, together they may increase performance by 3%).
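
The interaction effect in this example can be made explicit with a small 2x2 (factorial) calculation. The revenue figures below are hypothetical and simply mirror the 1% / 1% / 3% numbers above. The interaction is whatever the combined lift adds beyond the sum of the two individual lifts.

```python
# Hypothetical mean revenue per user in a 2x2 factorial test (illustrative numbers only).
# Keys are (change A applied, change B applied).
revenue = {
    (False, False): 100.0,  # neither change
    (True, False): 101.0,   # change A only: +1%
    (False, True): 101.0,   # change B only: +1%
    (True, True): 103.0,    # both changes: +3%
}

baseline = revenue[(False, False)]
lift_a = revenue[(True, False)] / baseline - 1
lift_b = revenue[(False, True)] / baseline - 1
lift_both = revenue[(True, True)] / baseline - 1

# Interaction: what the combination adds beyond the sum of the individual lifts.
interaction = lift_both - (lift_a + lift_b)

print(f"A alone: {lift_a:.1%}, B alone: {lift_b:.1%}, both: {lift_both:.1%}")
print(f"interaction: {interaction:.1%}")
```

Judging the two changes only by their separate 1% lifts would miss the extra percentage point that appears when they are shipped together.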

Myth 6: “Understanding causality is no longer needed in the age of big data and business analytics. Why waste time on experiments?” – Correlation is not causation, regardless of whether you have big data or not. “A superficial understanding of why things happen can be costly, or in the case of medicine, even dangerous”.

Myth 7: “Running experiments on customers without advance consent is always unethical.” – Informing users that they are participating in an experiment may alter behavior. The author restates the “A/B illusion” from chapter 4. Finally he says that “People seem unconcerned with the current practice of being emotionally manipulated through advertising and other means, although the harmful effects of these media may have never been rigorously tested.”

Epilogue: A brief look at the future

“The future is already here — It’s just not very evenly distributed.” – William Gibson.

One of the most important questions for companies is “What does your customer value?”. The author argues that most of the research methods that attempt to understand consumer behavior are indirect, i.e. they look at what customers think they want instead of how they actually behave in the relevant setting. This is where large-scale experimentation can bring massive value, by enabling companies to figure out what their customers value and how they behave, cheaply and with scientific rigor.

In looking into the future, the author contemplates the following:

“What if AI-based methods could analyze your data (customer support information, market research, and so on) and generate thousands of evidence-based hypotheses? Now imagine that these algorithms could also design, run, and analyze experiments with no management involvement at all.”

Andreas Aristidou

Copyright © 2020. All Rights Reserved.