This is a thoughtful and concise book about an in-demand topic, written by leading academics in the fields of behavioral economics and experimentation. Luca and Bazerman illustrate their points with examples from various angles without reducing the book to a collection of bullet points. They dive into the historical roots of behavioral science and experiments, focusing on the fields of Psychology and Economics. The authors cleverly explain the technicalities of experiments and behavioral insights by drawing simple distinctions between similar and often confused concepts, which spares the reader more technical detail. I particularly enjoyed the structured and organized dive into tech industry experiments. Each chapter of Part II describes one such experiment / tech company, accompanied by a lesson that illustrates the “take-away” point. Occasionally, I found their background stories somewhat myopically focused on researchers and organizations affiliated with their own institution, Harvard University. As someone who is very familiar with the literatures they touch upon, I would have liked a slightly broader perspective on how other institutions helped shape the state of behavioral experimentation. Overall, however, I would highly recommend this book to anyone, not only to people interested in experiments. It is such an important, widespread and rising topic, presented in such an accessible and illustrative manner, that it is probably worth anyone’s time. Below I summarize the key points and takeaways from each chapter.
Part I: Breaking Out of the Lab
Part I takes readers on a journey through the history of experiments, focusing on the fields of Psychology and Economics. It also introduces some core ideas and terminology and provides some motivating examples.
Chapter 1: The Power of Experiments
The chapter begins with a motivating example from one of the earliest success stories of behavioral experimentation: the Behavioral Insights Team (BIT), led by David Halpern. In an effort to increase taxpayers’ compliance rate, the UK government had been sending out generic letters reminding people to pay their taxes. The BIT thought it could improve the compliance rate by changing the wording of the letter. “You might be wondering, how much could the wording of the letter matter? The genius of Halpern’s idea wasn’t the decision to rewrite the letter, but rather the recognition that they didn’t need to wonder whether the wording matters – they could just find out [by experimenting].” The BIT defined a control group (the status-quo letter) and four treatment groups (variations of the letter), which varied in their appeal to people’s reference groups (for example, country-level) and in whether they were framed in terms of gains or losses. The results were striking: one of the variations could increase compliance enough to raise an additional 11.3 million pounds in tax revenue.
The rest of the chapter takes readers through the history of experimental thinking: from the Book of Daniel (yes, the Daniel from the Bible), to Ambroise Paré, a “barber surgeon” in the French kingdom some 1,500 years later, to the navy surgeon James Lind experimenting with treatments for scurvy among sailors in 1747, to biologist Louis Pasteur launching the concept of controlled trials in 1882, and finally to the British statistician Ronald Fisher, who developed tests for the significance of treatment effects.
Chapter 2: The Rise of Experiments in Psychology and Economics
I found this chapter to be an exceptionally well-written modern history of behavioral economics. In less than 30 pages, the authors take the reader on a chronological journey through the groundbreaking experiments in Psychology and Economics that have stood the test of time, showing how each one changed the course of the literature up to today.
A Brief History of Experimental Psychology: The discussion begins with the first psychology labs: Wilhelm Wundt opened the first in 1879 at the University of Leipzig, while G. Stanley Hall opened the first in America at Johns Hopkins University. The authors then turn to Ivan Pavlov (of Pavlovian conditioning) and subsequently to John Watson’s theory of behaviorism, which called for psychology to become “a purely objective experimental branch of natural science”. Social psychology then rises to prominence with two infamous experiments motivated by the desire to explain the atrocities committed by the Nazis and, above all, obedience to authority: (1) Milgram’s experiments, in which participants administered (fake) electric shocks to another participant (actually a confederate), and (2) Philip Zimbardo’s Stanford Prison Experiment (which was later made into a movie). With that, the authors discuss the frequent use of deception in psychology experiments and the rise of Institutional Review Boards (IRBs) to protect the well-being of experimental participants.
Experiments in Economics: When we talk about experiments in economics, we are generally referring to one of two types: (1) large-scale field experiments, or Randomized Controlled Trials (RCTs), and (2) behavioral lab experiments. While findings from both types are used in most areas of economics, RCTs are mostly used in development, health, industrial organization, and marketing, whereas lab experiments have spawned their own field: behavioral and experimental economics. Famous examples of RCTs include the New Jersey Income Maintenance Experiment and Newhouse’s Health Insurance Experiment, while notable organizations running RCTs include Mathematica Policy Research, the RAND Corporation and J-PAL.
Behavioral and Experimental Economics: The early economic lab experiments aimed to test the equilibrium predictions of economic theories, such as price convergence in markets, auctions, and game-theoretic models. 2002 Nobel laureate Vernon Smith is credited most with bringing lab experiments into mainstream economics, as well as with creating and documenting the rules of experimental economics. Two of these rules differ markedly from psychology: monetary incentives that vary with the subject’s performance on the task, and the avoidance of deception. After Smith, the literature became less focused on market rationality and more focused on individual behavior, with researchers like George Loewenstein, Colin Camerer and Al Roth (among others). At the pinnacle of this new wave of behavioral and experimental economics sits the collaboration between Daniel Kahneman and Amos Tversky, whose work revolutionized behavioral economics, and economics in general, with two key contributions: (1) Cognitive heuristics: shortcuts or rules of thumb that people use to make decisions quickly and efficiently (instead of engaging in rational thinking); (2) Prospect theory: a theory of decision-making under risk, offered as an alternative to traditional expected-utility theory. Among other things, the theory incorporates (a) the importance of a reference point, such as one’s current wealth, and (b) loss aversion: the idea that losses loom larger than equivalent gains.
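To make loss aversion concrete, below is the value function from Tversky and Kahneman’s 1992 paper on cumulative prospect theory, with their median parameter estimates. The book, as summarized here, gives no formula, so treat this as an illustrative sketch: x is a gain or loss relative to the reference point, α and β capture diminishing sensitivity, and λ is the loss-aversion coefficient. The second line works through a 50/50 bet to win or lose $100, ignoring the probability weighting that the full theory adds:

```latex
v(x) = \begin{cases} x^{\alpha}, & x \ge 0 \\ -\lambda\,(-x)^{\beta}, & x < 0 \end{cases}
\qquad \alpha = \beta \approx 0.88, \quad \lambda \approx 2.25

V = \tfrac{1}{2}\,v(100) + \tfrac{1}{2}\,v(-100) = \tfrac{1}{2}\,100^{0.88}\,(1 - 2.25) \approx -36 < 0
```

Even though the bet has an expected monetary value of zero, the potential loss looms larger than the equivalent gain, so the bet is rejected.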
Chapter 3: The Rise of Behavioral Experiments in Policymaking
The chapter begins with a famous study, which I recall learning about in my very first behavioral economics class in my second year of undergrad, that demonstrates the power of defaults through opt-in and opt-out mechanisms. The study examined organ donation rates in 11 European countries and found that the four countries with opt-in systems (where the default was not to be an organ donor) had donation rates between 4% and 28%, while the remaining seven countries, with opt-out systems, had rates between 86% and 100%. The authors then discuss the pushback against opt-out systems as a potentially unethical means of inducing compliance. This leads to the concept of “active choice”: having no default and requiring the user to make a decision, a method considered both ethical and potentially more effective than opt-in systems. However, contrary to the common belief that active choice would outperform opt-in, experiments have shown otherwise… That’s why it’s important to always test ideas experimentally!
The chapter then turns to Richard Thaler’s concept of nudging and the related concept of choice architecture (the careful design of which choices are included and how they are presented to the decision maker). “Whereas many behavioral economists and psychologists had been focusing on helping individuals de-bias themselves …. the core idea of nudging focuses on changing the choice environment.” The authors also discuss the dual-system model: we make most decisions in life almost automatically, effortlessly and implicitly (Type 1, the intuitive system), but occasionally we slow down and think a decision through consciously, carefully and with reason (Type 2, the deliberate system).
Finally, the authors stress that these insights matter qualitatively rather than quantitatively, and that context matters. Something that works in one setting (whether a lab or a real-world setting) will not necessarily work in another, especially at the quantitative level. For this reason, context-specific experimentation (i.e. experimenting for our particular purpose in our particular setting) is critical.
Part II: Experiments in the Tech Sector
Part II of the book takes the reader through a series of notable experiments in the tech sector. The authors drive home the following main ideas: (1) Experimentation complements intuition, (2) It is often better to think in terms of a series of experiments rather than draw conclusions from standalone experiments, (3) It is critical not to lose track of a company’s long-term goals by focusing on short-term performance metrics that are easily captured in experiments.
Chapter 4: From the Behavioral Insights Team to Booking.com
“Looking at the broader landscape, however, the experimental revolution is still in its infancy.”
In this introductory chapter, the authors discuss some traditional barriers to experimentation and how certain features of the tech sector have allowed it to become one of the earliest adopters of experiments.
- Barrier 1: Not enough participants – “Even a seemingly large difference between two randomly assigned groups can be chalked up to noise if the experiment’s sample is too small.”
- Barrier 2: Randomization can be hard to implement – Think about spillover effects, ethical considerations, and situations where random allocation is simply not possible.
- Barrier 3: Experiments require data to measure their impact – Even if the above problems are overcome, it is often hard to measure the outcome variables of interest (e.g., customer loyalty, satisfaction, time spent reading an offline article).
- Barrier 4: Underappreciation of decision makers’ unpredictability – “.. one of the great contributions of psychology, has been an enhanced understanding of how fragile, context-specific, and sensitive to framing decision making can be.”
- Barrier 5: Overconfidence in our ability to guess the effect of an intervention – A recurring theme: even the intuition of the most experienced managers is often wrong.
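Barrier 1 can be made concrete with a standard power calculation. The Python sketch below applies the common normal-approximation formula for comparing two proportions with a two-sided test; the function name and the 10% baseline conversion rate are hypothetical, chosen only for illustration. It shows how quickly the required sample grows as the effect you want to detect shrinks, which is one reason high-traffic tech platforms have an advantage:

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group to detect a difference
    between two proportions with a two-sided z-test (normal approximation)."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical example: a 10% baseline conversion rate.
print(sample_size_per_group(0.10, 0.11))  # one-point lift: roughly 15,000 per group
print(sample_size_per_group(0.10, 0.15))  # five-point lift: a few hundred per group
```

Halving the detectable effect roughly quadruples the required sample, since the effect size enters the formula squared.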
The authors dive into the cases of Google and Booking.com, two tech firms whose nature all but removes the first three barriers and whose executives and managers invested heavily in creating a culture of experimentation to overcome barriers 4 and 5. As a result, they have become two of the most successful experimenting organizations. Finally, the authors mention several experimental examples, such as (1) the menu effects of prices ending in 9 (e.g., $29), (2) the impact of the word “sale” on actual sales, (3) the physical size of ads on the screen, (4) moving credit card offers from the home page to the shopping cart page, and (5) the timing at which fees are presented to customers.
Chapter 5: #AirbnbWhileBlack
One powerful feature of the internet is that anonymity can help reduce racial bias. For example, on sites like Expedia, property managers simply list room availability and anyone can book with a credit card, so property owners cannot discriminate against potential renters on the basis of race. Airbnb’s entry into the market changed everything: the platform shows guests’ names, pictures and other identifying characteristics to hosts, who can then decide whether to accept or reject (potentially on discriminatory grounds) the guest. Luca’s experiment proceeded as follows: his team created fake guest profiles with either white- or black-sounding names and randomly sent out inquiries to hosts. The result? Profiles with black-sounding names were 16% less likely to get a yes from the host. Could this result be statistical discrimination (hosts rejecting African Americans because, in their experience, they had been bad guests)? It turns out that most of the discrimination observed came from hosts who had never hosted a black guest before (and thus had no previous experience to draw from). Upon publication of the study, all hell broke loose at Airbnb. The company vowed to take the matter seriously and, through a series of experiments with website design tweaks, now claims to have significantly reduced racial discrimination. Some of the implemented changes include incentivizing hosts to accept instant bookings and making it harder to see the guest’s picture. Airbnb is now a fairer marketplace… thanks to experiments!
Chapter 6: eBay’s $50 Million Advertising Mistake
This chapter discusses an experiment which revealed that eBay’s $50 million per year of advertising on Google was bearing little fruit, and why this may not be the case for other companies.
“In 2017 alone, the company [Google] made roughly $100 billion in advertising revenues.”
In a bid to attract customers, eBay was buying ad space for Google search terms that included the word “eBay”. The metrics showed tremendous success: lots of users searching for “eBay” would click on the ad link and then become potential customers, so this looked like a successful advertising strategy to eBay. However, economist Steve Tadelis pointed out that self-selection could be undermining this supposed success story: users who googled “eBay” would likely have clicked on the organic eBay link, and thus ended up on eBay, regardless. Tadelis and the eBay team experimented by turning the Google ads on and off across markets. Indeed, while eBay naturally lost all traffic from ad links when the ads were turned off, the company saw a spike in organic traffic, i.e., users who googled “eBay” went to eBay regardless of the presence of the ad. eBay learned a costly lesson: correlation is not always causation!
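The logic of the on/off test can be sketched as a difference-in-differences comparison: the change in sales in markets where ads were switched off, minus the change over the same period in markets where ads kept running. The review does not describe the exact estimator used in the eBay study, so this Python sketch only illustrates the idea; the function name and the sales figures are hypothetical.

```python
from statistics import mean


def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated markets minus the change in the control markets."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical weekly sales (in thousands) per market.
ads_off_pre, ads_off_post = [100, 120], [99, 118]  # ads switched off ("treatment")
ads_on_pre, ads_on_post = [105, 115], [106, 116]   # ads left running ("control")

effect = difference_in_differences(ads_off_pre, ads_off_post, ads_on_pre, ads_on_post)
print(effect)  # -2.5
```

With these made-up numbers, switching the ads off is associated with a drop of 2.5 (about 2% of a baseline of 110). That is a far cleaner measure of what the ads actually bought than the number of ad clicks, which counts every visitor who would have arrived organically anyway.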
Chapter 7: Deep Discounts at Alibaba
This short chapter exposes an often overlooked yet critical aspect of rapid tech experiments: the long-term impact of policy changes. While companies are usually most interested in customer satisfaction, customer retention and, more broadly, the long-term effects of changes, short-term Key Performance Indicators (KPIs) are easier to measure (e.g., clicks and purchases within a certain period of time).
Alibaba wanted to measure the impact of offering discounts on products that customers had left in their carts for more than 24 hours. While the (well-designed) experiment showed that users who received discounts were more likely to purchase the discounted products, those customers did not spend more money on the platform overall in the long term. In fact, after examining user behavior more thoroughly, the researchers at Alibaba discovered that users who had received the discounts were adding more products to their shopping carts and leaving them there! This suggests that users figured out the discount scheme and attempted to game it. The company abandoned the program.
While the authors applaud the company for a successful experiment, they argue that, instead of abandoning this discount program, the company should have run more experiments to better understand what works!
Chapter 8: Shrouded Fees at StubHub
“The value of an experiment is limited by the outcomes you are able to measure”
This chapter takes a closer look at another example that demonstrates the importance of tracking long-term metrics. StubHub (an eBay company), an online marketplace that matches buyers and sellers of second-hand tickets, makes money through service fees charged to ticket buyers. The company was contemplating a change in how it presented service fees: from the “upfront fee” model then in use (showing the final price of a ticket, including all fees, from the very first moment a consumer sees it) to a “backend fee” model (showing the fees, and thus the final price, only after the consumer has selected a ticket, usually at the checkout page). In a simple experiment, some users were randomly selected to keep seeing upfront fees (control group) while others were shown tickets with backend fees (treatment group). As expected, backend fees triggered more sales and higher prices paid, so all short-term metrics supported the backend fee model. However, the team later found that users who had been shown backend fees were less likely to visit StubHub in the future, which is not a good sign for the company’s long-term strategy. While the company decided that implementing the new (backend fee) model was worthwhile, the authors outline several outcomes that the experiment did not measure and, once again, call for (a series of) further experiments to better understand the impact: (1) if the model is implemented more widely and hurts customer satisfaction, users may turn to alternative companies, and (2) spillover effects of reputation damage (from the backend fees) on the control group could bias the results.
Chapter 9: Market-Level Experiments at Uber
Running randomized experiments is not always straightforward, especially when they involve two-sided markets and several substitutable products. This chapter explores such a case: Uber’s test of the (then) new concept of “Uber Express Pool” (riders wait a little longer and walk a little farther to get picked up and dropped off, but save more money). Implementing a simple randomized experiment was next to impossible because of: (1) Spillover effects – if some drivers / users can use this feature but others cannot, the whole market is affected, (2) Interference with the control group – if the new feature increases matches and rides, it is likely to reduce the availability of rides for the control group, again biasing the results, (3) Product substitution / cannibalization – the new product will likely affect both the supply of and the demand for other Uber products like UberX.
To overcome these problems, Uber fully implemented the new feature, but only in certain randomly chosen markets (i.e. cities, the treatment group) and not in others (the control group). Of course, using markets as units brings a new problem: a small sample, which makes it likely that control units will not be identical, on average, to treatment units. To address this, Uber employed a statistical technique that linearly combines several control units to create a synthetic control that most closely resembles the treatment units (the synthetic control method; see my “causal inference & program evaluation” tab under “online learning” – “resources” for more information). As most of us know, the new feature was deemed successful and was fully rolled out. Other features that Uber has experimented with include pricing tweaks, allowing tipping, and giving small coupons after a bad experience.
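For readers who have not met the synthetic control method, below is a minimal Python/NumPy sketch of its core step: choosing nonnegative donor weights that sum to one, so that a weighted average of control cities tracks the treated city before launch. It is a simplified textbook version (it matches only pre-launch outcomes and skips the covariate weighting of the full method by Abadie and coauthors), not Uber’s implementation; the city data and function names are hypothetical.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]                    # sort in descending order
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = idx[u - (css - 1) / idx > 0][-1]  # number of entries that stay positive
    theta = (css[rho - 1] - 1) / rho        # shift that makes the weights sum to 1
    return np.maximum(v - theta, 0.0)


def synthetic_control_weights(treated_pre, donors_pre, n_iter=5000):
    """Find donor weights (nonnegative, summing to one) whose weighted average best
    matches the treated unit's pre-launch outcomes, via projected gradient descent
    on the squared error."""
    X = np.asarray(donors_pre, dtype=float)       # rows: periods, columns: donor cities
    y = np.asarray(treated_pre, dtype=float)
    w = np.full(X.shape[1], 1.0 / X.shape[1])     # start from equal weights
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ w - y)
        w = project_to_simplex(w - step * grad)
    return w


# Hypothetical pre-launch weekly rides (thousands) in three donor cities.
donors_pre = np.array([
    [10, 20, 15],
    [12, 18, 10],
    [14, 16, 15],
    [16, 14, 10],
    [18, 12, 15],
    [20, 10, 10],
])
treated_pre = np.array([12.0, 11.2, 14.4, 13.6, 16.8, 16.0])

weights = synthetic_control_weights(treated_pre, donors_pre)
print(weights.round(3))  # roughly [0.6, 0.0, 0.4]

# After launch, the estimated effect is the gap between the treated city
# and its synthetic counterpart built from the same donor weights.
treated_post = np.array([21.0, 22.0])
donors_post = np.array([[22, 8, 15], [24, 6, 10]])
print(treated_post - donors_post @ weights)
```

Projected gradient descent is used here only to keep the sketch dependent on nothing beyond NumPy; a quadratic-programming solver or a dedicated synthetic control package would be the more typical choice in practice.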
Chapter 10: The Facebook Blues
“Secrecy is a bigger danger than publicity”
This fascinating chapter discusses a Facebook experiment that received massive backlash as unethical, and explores ways in which companies can be more transparent with consumers about their experimentation processes. Facebook constantly has to decide which posts to show in your news feed, and for how long, out of a myriad of potential choices. Facebook designed an experiment to examine the impact of happy versus sad posts on users’ subsequent mood, ostensibly out of curiosity. Some users were randomly selected to receive more negative posts (identified using content analysis), while others were shown more positive posts. The effects were small and uninteresting relative to the massive backlash from the community when Facebook decided to publish the study. Media and social groups condemned Facebook’s “unethical” and “manipulative” behavior towards their “trusted users”. The authors break down the criticism into three parts and respond to each:
- Criticism 1: “Influencing user emotions is impermissible under any circumstances” – In fact, advertisers and other groups constantly manipulate consumers’ emotions, oftentimes with false information.
- Criticism 2: “By experimenting with posts, Facebook manipulated users” – If that is true, then any decision Facebook makes about its news feed algorithm should be considered manipulation. Facebook has to show users something, and whatever it chooses shapes what they see. The fact that Facebook tried two different algorithms concurrently should make no difference.
- Criticism 3: “Facebook should get users’ consent before running any experiment on them” – In fact, by using Facebook you agree to its terms and conditions, which include the use of your data to improve its products and services.
Facebook seemed to take this lesson to heart and has since minimized publishing anything about its experimentation. The authors disagree with this approach and instead call for even more transparency, as a means of educating consumers about the value these experiments add to the products and services they consume: “These experiments are valuable not only to the companies but also to users, who presumably don’t want low-quality services as a result of gut decisions that could easily have been improved through data. This means that users should be open to experimentation, and companies should stop shrouding the process in secrecy and mishandling communication surrounding their experimentation”.
Part III: Experimenting for the Social Good
Part III of the book takes readers back to RCTs that have been used to nudge people toward better decisions. Through examples, the authors point out three reasons why experiments have become a central part of so many organizations: (1) Experiments help prove the value of behavioral interventions to skeptical stakeholders, (2) Experiments help identify which behavioral insights generalize to the settings of interest, (3) The academic literature often contains insufficient guidance for the problems at hand.
Chapter 11: Behavioral Experiments for the Social Good
This chapter explores extensively the career path and work of one of the authors’ students, Todd Rogers, who used experimentation in order to apply behavioral insights in the sphere of political economy. The authors discuss three interventions designed to increase voter turnout in elections:
(1) Encouraging prospective voters to vote: (a) via postcard, (b) via phone, (c) via in-person visit. In-person canvassing increased turnout by 9.8%, mail by 0.6%, and phone calls made no difference compared to the control group.
(2) Utilizing the concept of “implementation intentions” (i.e. by nudging people to make concrete plans, you can increase the chance of goal completion): Prospective voters were called and asked questions about their specific plans for voting day (e.g. “Around what time do you expect you will head to the polls on Tuesday?”).
(3) Applying comparative social norms by calling prospective voters to offer information like “Turnout is going to be high today”.
Chapter 12: Healthy, Wealthy, and Wise
“Overall, this work highlights an important use of experiments: they can fine-tune general frameworks.”
Experiments help us discover not only what works, but also what does not. Given that interventions can be costly, knowing what does not work can save companies and organizations a lot of money and wasted effort. In addition, past research can point to behavioral interventions that may have an effect. However, experiments are needed to verify whether an intervention will have an effect in the intended setting, as well as the magnitude of that effect, which is even more context-dependent. This chapter discusses a number of RCTs, broken down into three fields:
- [Wise] – Experiments in the field of education
  - Question: How to reduce “summer melt” (college-bound students dropping out in the summer before their freshman year)?
    - Treatment: A reminder text or email to prospective students to check in with their counselor or advisor.
    - Result: An average 3% increase in the likelihood of a student enrolling in college, with a much larger effect (8%-12%) for low-income students.
  - Question: How to reduce the rates of chronic student absenteeism?
    - Treatment: Group 1 – Mailings to parents stressing the importance of attendance and parents’ ability to influence it. Group 2 – The same mailings plus information about the student’s absences compared with those of other students.
    - Result: Total absences fell by 6% and chronic absenteeism by 10% in both treatment groups, i.e., comparative information had no additional effect beyond the reminder mailings.
- [Wealthy] – Experiments in the field of personal finance
  - Question: How to get people to sign up for retirement savings plans?
    - Treatment: Offering people payment to attend an information fair about such plans.
    - Result: Those offered payment were 67% more likely to enroll in a retirement plan, with strong information spillovers (i.e., people spread the information to their peers).
  - Question: (same as above) – How to get people to sign up for retirement savings plans?
    - Treatment: Changing the default from opt-in to opt-out.
    - Result: A 28% increase in employee participation, persisting even after 4 years.
- [Healthy] – Experiments in the field of health
  - Question: How to nudge Americans to take their prescribed meds?
    - Treatment: Enrolling patients in a special lottery with a small chance to win a large reward. However, a winning patient who had not taken their meds on time would not receive the reward (but would still learn that they had been selected).
    - Result: A significant increase in medication adherence.
Chapter 13: The Behavior Change for Good Project
This chapter discusses the Behavior Change for Good (BCFG) initiative. Led by Katherine Milkman and Angela Duckworth, BCFG unites social scientists and practitioners of behavioral science in a joint attempt to create positive and, importantly, durable behavior change in the realms of health, education and savings. The motivation is that “our best intentions can easily be derailed by short-run temptations”. While the initiative is still in its infancy, its ambitious goals include:
- Health: Boosting gym attendance, reducing smoking, improving medication adherence, getting people walking, and encouraging healthy supermarket purchases.
- Education: Increasing class attendance, increasing homework completion, and minimizing disciplinary incidents.
- Savings: Reducing spending, reducing cash withdrawals, and increasing savings.
Chapter 14: The Ethics of Experimentation
In this brief chapter, the authors address ethical pushback against experiments, specifically the claim that experiments treat groups of people unequally. They retort that experimentation is nothing more than a form of learning, and that, when considering whether an experiment is unethical, we should weigh the long-term benefits of learning from experimentation against the potential short-term opportunity costs of unequal treatment. Nonetheless, they add that not all experiments are good and that business experiments may often not be aligned with the social good. The following quote encapsulates their sentiments on the subject:
“When we try out new ideas, why not do it in a manner that allows us to determine whether the change is actually effective? Failing to experiment wastes organizational resources and keeps us from learning from the strategies we try. It also raises ethical issues about ignoring evidence that would be easy to obtain.”
Chapter 15: A Final Case for Experiments and Some Concluding Lessons
“Experiments are simply a way to systematically and objectively try new ideas. [..] Thus, we believe that informed citizens should become fans of experimentation”
The authors begin this chapter by arguing that, whether or not organizations want to run experiments, they run them all the time; the authors call these “incidental experiments”. Whether it is a university randomly allocating students to dorm rooms and roommates, the random order in which words are presented to contestants at the National Spelling Bee, the Pakistani government’s use of lotteries to allocate a limited number of visas for citizens seeking to perform the Hajj pilgrimage to Mecca, or Chicago Public Schools’ use of a lottery system to decide which students are admitted to different schools, experiments are being conducted constantly and everywhere, regardless of whether they are intended as experiments whose outcomes are studied for learning.
As for the leaders of organizations, the authors say that they “need to have the humility and confidence to know what they don’t know, and to use experiments as part of their toolkit for answering tough questions” and call on leaders to “set the tone”, while “those who are barriers to experimentation are also barriers to finding more effective ways for their organization to operate.”
Finally, aside from evaluating a product or policy, experiments can be used to test a theory or hypothesis (“it is often helpful to understand why or how the policy is having an effect”), to develop or refine frameworks, or for fact-finding where no theory yet exists.