Exploration is crucial for creating high-impact engineering teams. The more ideas we try, the better our chances for success. There’s just one problem: Engineering teams have limited time and resources.
Every feature we build takes time away from a feature we could have built. These limitations force us to make tough choices because we simply can’t do everything at once.
The challenge, then, is to minimize the investment required to try new ideas. This not only lets us try more ideas, but also helps us learn which ideas are worth expanding on.
This is the goal of thinking in experiments. It is a framework for maximizing bandwidth and encouraging exploration. Thinking in experiments means that we must:
Make our hypothesis clear
Begin with measurement in mind
Identify the best path to feedback and execute
Review and react
1. Make our hypothesis clear
“I think doing A will cause B”
A clear hypothesis is like a north star. It guides our priorities, helps us narrow down the metrics we care about, and aligns our team around a common goal.
When we want to "grow our audience," do we mean an increase in organic traffic or a higher conversion rate for our premium products? When we want to "create a world-class checkout process," are we aiming for world-class convenience or world-class security?
Clearly defining our expectations helps ensure we stay focused on the right things. Our goal in this stage is to remove as much ambiguity from our hypothesis as we can. In general, a great hypothesis will have:
Positive team/company impact: Before we go any further, does this hypothesis address a problem that we actually have? If not, it’s probably worth asking whether the experiment is worth doing right now.
Only one or two specific actions: If we have too many inputs in our experiment, we lose the ability to gauge the impact of each individual input.
Explicit outcome(s): This experiment could have many outcomes. Which of these possible outcomes do we care about most? Understanding this helps us determine how to measure success and how to prioritize.
As a rule of thumb: our hypothesis is probably clear enough if every team member can explain what we are doing and why we are doing it.
2. Begin with measurement in mind
“I think that, to accurately measure B, we will need to know X, Y, and Z”
Before we build anything, we need to understand how we will measure success. Identifying and understanding our key metrics will ensure that we are able to evaluate our progress from day one.
Good metric design
Choosing the best metrics for our needs is, unfortunately, more of an art than a science. There are many ways to measure our outcomes, but the metrics we choose should be:
As specific as possible: If we already have access to excellent data, we can get pretty specific. If not, we should at least identify the broad areas we care about and the direction we'd like to see them move. This may even uncover opportunities to improve our observability tools.
Simple to understand: It should be easy to understand the results of our experiment and to communicate them to stakeholders. We don’t want to dumb down the metrics to the point of inaccuracy, but reducing their complexity makes it easier for others to buy in.
Robust enough to be accurate: Metrics can be misleading. For example, in a large population, measuring an average will likely give unwanted weight to outliers and could skew the perception of reality. It’s important for us to collaborate with others to understand the blind spots of our own metrics.
Tailored to our hypothesis: Do our metrics actually prove our hypothesis? We should always compare our metrics against our hypothesis before the experiment begins. Measuring the wrong thing with 100% accuracy is no better than measuring the right thing with 0% accuracy.
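The outlier problem called out under "Robust enough to be accurate" is easy to see with a quick sketch. The numbers below are purely illustrative:

```python
import statistics

# Illustrative page-load times (ms); one user on a slow
# connection produces an extreme outlier.
load_times = [120, 130, 110, 125, 140, 115, 135, 128, 122, 4000]

mean = statistics.mean(load_times)      # dragged upward by the outlier
median = statistics.median(load_times)  # closer to the typical user

print(f"mean:   {mean} ms")    # 512.5 ms
print(f"median: {median} ms")  # 126.5 ms
```

Here the mean suggests a sluggish site even though nine of ten users had a fast experience; a median (or percentile) metric is often more robust for skewed data like this.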
Quality-Control Metrics
It’s important to remember that our experiment could have side effects. Every change we make has trade-offs.
To make sure that we understand these trade-offs, it can be helpful to create quality-control metrics. Quality-control metrics are designed to make sure we aren’t sabotaging something important in our pursuit of something new.
For example, we might be trying to increase the number of ad impressions on our website. This seems like a great goal, but what if the click-thru rate plummets as a result? We could actually lose money.
To mitigate this risk, we could couple our ad impression target with a quality-control target, such as keeping the click-thru rate above 4%.
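As a sketch of how the two targets pair up (the function name, the 5% impression-lift target, and the sample numbers are hypothetical; only the 4% click-thru floor comes from the example above), the success check might look like this:

```python
def experiment_passed(impression_lift_pct, click_thru_rate,
                      min_lift_pct=5.0, min_ctr=0.04):
    """An experiment succeeds only if the primary target AND the
    quality-control guardrail both hold."""
    hit_primary = impression_lift_pct >= min_lift_pct
    held_guardrail = click_thru_rate >= min_ctr
    return hit_primary and held_guardrail

# Impressions are up 12%, but click-thru collapsed to 2%:
print(experiment_passed(12.0, 0.02))  # False -- guardrail failed
print(experiment_passed(12.0, 0.05))  # True  -- both targets held
```

Treating the guardrail as a hard AND condition means a win on the primary metric can never quietly mask a loss somewhere important.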
3. Identify the best path to feedback and execute
“The smallest version of A that will impact X, Y, and Z is…”
Now that we have a hypothesis and a set of key metrics, we can use them to build a cohesive plan. Our goal is to find the shortest path that can validate our hypothesis.
In practice, this is easier said than done.
Cutting the scope of our experiment can be frustrating. After all, the feature is perfect in our imagination. Breaking it up just feels wrong.
But if we truly want our feature to be perfect, we need feedback. Therefore, we should strive to get feedback as quickly as possible. To do that, we need to cut scope.
That said, it can be hard to know what to prioritize. When deciding which features to keep and which to postpone, we should:
Use our hypothesis: Our hypothesis should determine which features we prioritize. Reducing the scope of a project doesn’t do any good if we can’t validate our hypothesis.
Deliver value as soon as possible: It’s okay to start small. If we’re building a self-service profile editor, for example, we don’t need to let users edit every field to prove that the idea is worth exploring. Instead, we might start by only letting users edit their username. This lets us gather feedback while we add more fields in the following weeks.
Isolate the action: Let’s say we’re building a project with three potential features. If we build and launch all three features at once, we won’t be able to tell which feature had the biggest impact on our results. On the other hand, if we launch each feature separately, we can get a better idea of the relative impact of each.
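One lightweight way to isolate each action is a set of feature flags, with only one new feature enabled per experiment. The flag names below are hypothetical:

```python
# Hypothetical feature flags: only one new feature is live at a time,
# so any metric movement can be attributed to that feature.
FLAGS = {
    "saved_carts": True,     # this week's experiment
    "one_click_buy": False,  # queued for the next experiment
    "gift_wrap": False,
}

def active_features(flags: dict) -> list:
    """Return the names of the features currently enabled."""
    return [name for name, enabled in flags.items() if enabled]

print(active_features(FLAGS))  # ['saved_carts']
```

When the first experiment concludes, we flip its flag off (or bake it in), enable the next one, and measure again, giving each feature its own clean read.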
4. Review and react
“Based on X, Y, and Z, we will…”
Once we’ve finished our experiment, it’s time to review the results and figure out where to go from here.
If our hypothesis was right, congratulations! Take some time to celebrate. We’re safe to double down on our hypothesis or to explore related ideas knowing we have the data to back it up.
If our hypothesis was wrong, it’s easy to feel discouraged. We might think that a failed hypothesis says something about our intelligence or our intuition. As a result, it can be difficult to let our experiment go.
But remember: Even if our hypothesis is wrong, it pushes us forward.
We’re further ahead than we were before the experiment because now we know it was wrong and we can move on. Our experiment is over and we get to try something else knowing that we gave our hypothesis a chance.
That’s the beauty of thinking in experiments. It gives us permission to try something new, but it also gives us permission to move on.