4 Reasons Charles Darwin Thinks A/B Testing Sucks

ForMotiv
6 min read · May 16, 2019

People say that what’s old eventually becomes new again. If you pay close attention to modern UX design practices and also remember your high school biology, you’ll find a great example.

But it’s not beards I’m talking about, even though Darwin would get much respect from some members of the design community.

It’s evolution and A/B testing.

Darwin is famous for “On the Origin of Species”, the book that popularized the idea of “survival of the fittest”.

It’s the basic idea behind today’s A/B testing methodology:

  • Create your UI or form design (or take an existing one)
  • Create variants of the design (usually two); focus on areas you are attempting to improve
  • Test the variants in the market
  • Analyze the data, pick the fittest and repeat for as long as you have patience or budget.
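To make the “analyze the data and pick the fittest” step concrete, here is a minimal sketch of the standard two-proportion z-test that sits underneath most A/B tools (TypeScript, with made-up traffic numbers):

```typescript
// Two-proportion z-test for an A/B conversion experiment.
// All numbers are illustrative, not real data.

// Standard normal CDF via the Abramowitz & Stegun 7.1.26 approximation.
function normalCdf(z: number): number {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989423 * Math.exp((-z * z) / 2);
  const p = d * t * (0.3193815 + t * (-0.3565638 +
    t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return z > 0 ? 1 - p : p;
}

function abTest(convA: number, nA: number, convB: number, nB: number): void {
  const pA = convA / nA;
  const pB = convB / nB;
  const pooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  const z = (pB - pA) / se;
  const pValue = 2 * (1 - normalCdf(Math.abs(z))); // two-sided
  console.log(`A ${(pA * 100).toFixed(1)}%, B ${(pB * 100).toFixed(1)}%, p = ${pValue.toFixed(3)}`);
}

// 1,000 visitors per variant: A converts 12.0%, B converts 15.0%.
abTest(120, 1000, 150, 1000); // p ≈ 0.05, barely "significant"
```

Simple enough for a single comparison; the pain comes from having to repeat it for every variable you care about.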

It’s considered the gold standard, even if it is laborious. How else can you escape subjective judgment and personal bias?

But have you ever considered the shortfalls of A/B testing and how they might be preventing you from creating an optimal user experience?

At ForMotiv, we did some brainstorming on this during the design of our behavioral analysis engine.

We came up with four key areas where A/B testing is failing the UX design community:

1. A/B Testing is laborious (aka EXPENSIVE)

So, you have been through hours of meetings with your team, and the original concept for your UI (or form) has been thoroughly chewed over.

The version you are about to test probably doesn’t even look like the original concept.

In order to validate the decisions from the many hours of initial discussions, how will you execute the test plan?

  • Which of the 100+ possible variables (on a simple UI) are you going to run through your A/B trial?
  • How long before the budget runs out?

Superficially, the process of A/B testing is appealing; an ever-branching tree of variants even resembles Darwin’s “Tree of Life” diagrams.

But if you are constrained by time and budget you will have to radically filter down what you’re testing.

As soon as you start to make subjective choices about what to test, you quickly lose the benefits of a controlled evolutionary process.

Remember, it’s “Survival of the Fittest”, not “Survival of your Favorites”.

2. A/B Tests are a poor tool to measure uncertainty

By definition, A/B tests force a binary choice: one design is better than the other. But A/B testing results are deceptively simple.

Their clarity is an illusion. Bold decisions based on a binary result can ignore important aspects of human decision-making, such as:

  • How close was the outcome? 51% vs 49%? Maybe not so binary?
  • Were “no result” / test abandonment outcomes factored into the decision — an unreported option “C”?
  • Was the length of time taken to make the choice measured and factored in?
  • How was the uncertainty of decision making taken into account? Revisited choices, re-entry / editing of data?
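To make the first bullet concrete: a 51% vs 49% “win” over a thousand users per variant is statistical noise. A quick confidence-interval sketch (illustrative numbers again) shows why:

```typescript
// 95% confidence interval for the difference between two conversion rates.
function diffCI(convA: number, nA: number, convB: number, nB: number): [number, number] {
  const pA = convA / nA;
  const pB = convB / nB;
  const se = Math.sqrt((pA * (1 - pA)) / nA + (pB * (1 - pB)) / nB);
  const diff = pA - pB;
  return [diff - 1.96 * se, diff + 1.96 * se]; // 1.96 = 95% z-score
}

// A "wins" 51% to 49% with 1,000 users on each variant.
const [lo, hi] = diffCI(510, 1000, 490, 1000);
console.log(`A minus B: ${(lo * 100).toFixed(1)}% to ${(hi * 100).toFixed(1)}%`);
// Prints roughly -2.4% to +6.4%. The interval straddles zero, so the
// "binary" result is really a shrug at this sample size.
```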

When you encounter ambiguous results like these, you could re-architect your A/B tests to tease them apart, but this adds time and cost.

Looking back at the previous point, do you really have the budget to do this?

And even if you do, will the tests capture the nuances that matter?

There’s a reason that evolution took millions of years.

True evolutionary tests involve a full life-cycle for each iteration and take time to execute. They are also broad, not subjectively chosen and narrow.

This is not a great fit in the corporate world.

3. A/B Testing is focused on narrow paths

Building on the last point, carefully designed A/B test cycles will test small changes at each step.

After all, if you make many changes at each step, how do you know what was successful? This is another reason why evolution is necessarily slow.

The resulting time pressure leads us to grab for each small improvement that we witness.

But what if there are path dependencies?

Following the chain of small improvements only helps if the available gains really are small and linear; it drastically reduces your chances of making bigger improvements, the kind that require several changes at once.
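Here’s a toy illustration of such a path dependency (hypothetical conversion rates, not real data): two changes that each lose to the baseline in isolation but win together. A greedy, one-change-at-a-time test never reaches the winning combination, while a broad test over all combinations does.

```typescript
// Hypothetical conversion rates for two UI changes (illustrative only).
// Each change alone loses to the baseline, but together they win.
const rate: Record<string, number> = {
  "old-headline|old-button": 0.050, // baseline
  "new-headline|old-button": 0.048, // headline alone: worse
  "old-headline|new-button": 0.047, // button alone: worse
  "new-headline|new-button": 0.064, // both together: clearly better
};

const best = (entries: [string, number][]) =>
  entries.reduce((a, b) => (b[1] > a[1] ? b : a));

// Narrow path: only single-change variants are ever tested against the
// baseline. Both lose, so the tester keeps the baseline and stops at 5.0%.
const singleChanges = Object.entries(rate).filter(
  ([k]) => k.split("|").filter((part) => part.startsWith("new")).length <= 1,
);
console.log("narrow path winner:", best(singleChanges));
// -> ["old-headline|old-button", 0.05]

// Broad testing: every combination competes, and the 6.4% variant wins.
console.log("broad winner:", best(Object.entries(rate)));
// -> ["new-headline|new-button", 0.064]
```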

Focusing on the overall outcome tells you that testing more things concurrently, in a broad branching tree, is the better strategy.

But broad testing is laborious and expensive, so most testing is executed as a narrow path of single changes.

In biological terms, the narrow-path scenario might have left Earth with really great bacteria.

Expanding a little might have left us with highly evolved fish.

But the broad natural evolution that life on earth experienced led to all kinds of complicated life-forms, including customers.

4. A/B Testing assumes an unchanging world

A flaw related to the last scenario is that A/B testing (and much UI design) assumes a static world.

How many times have you heard that a user interface needs to be completely redesigned?

In many cases, the original design was great, but the environment changed.

In evolutionary terms, these are extinction events. According to the world-famous Natural History Museum in London:

“The vast majority (over 95%) [of species] died out because they couldn’t compete successfully for food or other resources. Or they failed to adapt to changes in their local environment over tens or even hundreds of millions of years.”

http://www.nhm.ac.uk/nature-online/life/dinosaurs-other-extinct-creatures/mass-extinctions/index.html

If you assume an unchanging world and only perform A/B testing at the beginning of your product’s lifecycle, you are doomed to meet the same fate.

But as point 1 showed, continuous testing with this methodology is prohibitively expensive.

But what if you could continuously monitor the performance of your user interface? You could keep ahead of the environmental changes — and may even get the bonus of uncovering vital competitive intelligence.

Monitor your users’ experience in a different way with ForMotiv

We won’t claim that ForMotiv can solve all of your problems. But for the many companies that struggle to create effective forms, Behavioral Intelligence (a simple way of saying predictive behavioral analytics) is a much better solution than A/B testing. Using the four points we raised above, let’s take a look at how ForMotiv can help:

A/B Testing vs. ForMotiv

1. A/B Testing is laborious and expensive!

Unless you reduce the scope of testing (which introduces subjectivity), this is a problem. ForMotiv provides a rich set of data that quickly shows you how users are behaving as they interact with your whole form.

2. A/B Tests are a poor way to measure uncertainty

Simple yes or no tests imply there are clear decisions being made. Real-life is more complicated. ForMotiv gives you rapid and deep insights into the nuances of a user’s interaction with the form both for completed and partially completed forms. How long did they spend on a field? Did they edit the data or return to a field later? Quickly get actionable data on user uncertainty!
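For illustration only (this is a generic sketch, not ForMotiv’s actual API or implementation), field-level signals like these can be captured with ordinary DOM events:

```typescript
// Generic sketch of field-level behavioral capture in the browser.
// Illustrative only; not ForMotiv's actual implementation or API.
interface FieldStats {
  visits: number;      // times the user focused the field (revisits = hesitation)
  edits: number;       // input events: keystrokes, pastes, deletions
  totalTimeMs: number; // cumulative time spent focused on the field
}

const stats = new Map<string, FieldStats>();
let focusedAt = 0;

function trackForm(form: HTMLFormElement): void {
  // focusin/focusout bubble, so one listener on the form covers every field.
  form.addEventListener("focusin", (e) => {
    const name = (e.target as HTMLInputElement).name;
    if (!name) return;
    const s = stats.get(name) ?? { visits: 0, edits: 0, totalTimeMs: 0 };
    s.visits += 1;
    stats.set(name, s);
    focusedAt = performance.now();
  });

  form.addEventListener("focusout", (e) => {
    const name = (e.target as HTMLInputElement).name;
    const s = stats.get(name);
    if (s) s.totalTimeMs += performance.now() - focusedAt;
  });

  form.addEventListener("input", (e) => {
    const name = (e.target as HTMLInputElement).name;
    const s = stats.get(name);
    if (s) s.edits += 1; // heavy editing on one field signals uncertainty
  });
}
```

A field with many visits, many edits, and a long dwell time is exactly the kind of friction point that a binary A/B result would never surface.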

3. A/B Testing is focused on narrow paths

Focusing on narrow paths and small incremental changes is slow and can lead you down the wrong development trail. ForMotiv allows you to monitor multiple changes to a form easily, including forms that dynamically add fields as needed. The rich data set makes optimizing the form simple and verifiable.

4. A/B Testing assumes an unchanging world

A/B Testing is so laborious and expensive that it tends to be confined to a “once and done” effort for each major release; but the business environment is always changing. ForMotiv’s easy-to-use console allows analysts to regularly monitor the performance of forms in use. Issues can be quickly investigated before they grow into an extinction event.

But don’t just take our word for it. Check out this review we received a while back from Raymond Camden, a well-known Developer Evangelist; it gives a great description of what we do and how we do it. Note: Formatic is now ForMotiv, and we’ve come a long way since then… 🙂

https://www.raymondcamden.com/2015/04/07/form-analytics-with-formatic

Or better yet — ask for a demo. We’d be happy to help you evolve your thinking about how to conduct user interface testing, especially for forms. We’ll also introduce you to some of our more advanced features, but I’ll save that for another blog post!

Originally published at https://www.formotiv.com on May 16, 2019.
