(This is a business case study. It will be used to guide discussions during the session: “Decision Support Tools” at the Vendo Partner Conference in Barcelona on September 17th.)
I’m going to make an observation: there’s a big gap between what we think we should do and what we actually do when it comes to making decisions.
We think we should do A/B split testing because we know it’s the only way to eliminate the noise. For example, if we are trying to figure out which landing page converts best, we think we should send the traffic randomly, 50/50, to two different pages and compare their conversion rates. We know we should. But often we don’t. We turn something on for a week, turn it off and guess that one is better than the other. Why don’t we do what we think we should do?
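To make the 50/50 random split concrete, here is a minimal sketch of one common way to do it: hash a visitor identifier so that each visitor always lands on the same page while the population as a whole divides roughly in half. The function name `assign_variant` and the `visitor_id` format are hypothetical, chosen only for illustration.

```python
import hashlib


def assign_variant(visitor_id: str, variants=("A", "B")) -> str:
    """Deterministically assign a visitor to one variant.

    Assumption: visitor_id is any stable string (cookie value, account id).
    Hashing spreads visitors evenly, and the same visitor always gets
    the same page on every visit.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a large integer bucket number.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    counts = {"A": 0, "B": 0}
    for i in range(10_000):
        counts[assign_variant(f"visitor-{i}")] += 1
    print(counts)  # the two counts should be close to each other
```

The point of assigning by hash rather than by "this week versus last week" is that both pages see the same traffic mix at the same time.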
Because it’s hard. You need a clear idea of statistical significance: whether the results were caused by the change you made or by something else, factors you weren’t changing on purpose such as traffic mix or holidays. You need to be able to do an analysis of variance. Otherwise the results will not be clear and you won’t know whether they will happen again the same way.
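As a rough illustration of what "statistically significant" means for two landing pages, here is a minimal sketch of a two-proportion z-test. This is a simpler stand-in for a full analysis of variance, not a replacement for it, and the visitor and conversion numbers in the usage example are made up.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a and conv_b / n_b are conversions and visitors per page.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both pages convert equally well.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: 100 of 1000 visitors convert on A, 150 of 1000 on B.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says the gap is unlikely to be noise alone. It does not rule out outside factors such as a holiday falling inside the test window, which is why the random split matters.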
Let’s say that we are able to use an in-house or third-party optimization tool to A/B test for conversion and we have statistically significant results. OK. That’s good. But how about things that emerge over time, like lifetime value? Suddenly it gets much harder to get good results. There are many variables, and the results update daily, revising what we thought we knew. And let’s say we want to do this for different profiles, people who are different from each other.
What we’ve been working with at Vendo is the multi-armed bandit, a tool made famous for web applications by Steven Scott (not Scott Stephen) at Google. The name comes from a nickname for casino slot machines: they are called one-armed bandits because they have one arm and they steal your money. The question that led to the multi-armed bandit was this: if I have a big bag of money, which slot machines should I put it in to win the most? The tool takes the results into account as you put money into the machines and tells you what to do next: which slot machine to put more money into.
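The slot machine question can be sketched in a few lines. The example below uses Thompson sampling, one common bandit strategy; it is an illustration under assumed payout rates, not a description of the algorithm inside Vendo's tool. The names `thompson_pick` and `simulate` are hypothetical.

```python
import random


def thompson_pick(successes, failures, rng):
    """Pick a machine by sampling a plausible win rate for each one.

    Each machine's win rate is modeled as Beta(successes + 1, failures + 1),
    so machines with little data still get tried occasionally.
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def simulate(true_rates, pulls, seed=0):
    """Play `pulls` coins across machines; return how many went to each."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(pulls):
        arm = thompson_pick(successes, failures, rng)
        # The true rates are unknown to the player; they only drive the simulation.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Assumed payout rates for three machines, for illustration only.
    print(simulate([0.02, 0.05, 0.20], pulls=5000, seed=1))
```

Run it and most of the coins should end up in the best machine, with only a small share spent finding out that the others are worse.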
In our world of the web, it constantly adjusts how much traffic goes to each alternative based on previous results. It solves four problems for us by getting to better results much faster than A/B split testing. Before we begin, let me say that there are certain cases where A/B is good and we can talk about those later (things are unlikely to change, you aren’t testing too many things, etc.).
- With an A/B split test you have to feed traffic to losers for a long time to be sure that you have really found a winner. So you are making less money while you are testing. A/B split testing can be twice as costly as the Bandit.
- You wait a long time to learn something. That means waiting to start another test to learn something new again. The Bandit can cut this time down by half. More rapid cycles = more learning.
- You need lots of traffic for each test, so you cannot test at a micro level, where the real differences and value are. The tests have to contain a relatively large number of people. That’s why many of us are still stuck at the country level, when none of us would say that someone who lives on Park Avenue at the southern end of Central Park (where Trump Tower and the Plaza Hotel are located) and makes over $100,000 a year has much in common with someone on Park Avenue at the north end near Harlem who makes $25,000 a year.
- You need to act on the results of an A/B split test manually. The Bandit acts for you by constantly adjusting the amount of traffic it sends to alternatives each day. It’s called reinforcement learning and it is a form of artificial intelligence.
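The daily adjustment described in the last point can be sketched as follows: estimate, from the results so far, the probability that each alternative is the best one, and use those probabilities as tomorrow's traffic shares. This is a minimal sketch under the assumption that each visitor either converts or does not; the function name `traffic_weights` and the sample numbers are hypothetical.

```python
import random


def traffic_weights(conversions, visitors, draws=10_000, seed=0):
    """Estimate the share of traffic each alternative should get next.

    For every draw, sample a plausible conversion rate for each alternative
    from its Beta posterior and record which one came out on top. The
    fraction of draws an alternative wins becomes its traffic share.
    """
    rng = random.Random(seed)
    wins = [0] * len(visitors)
    for _ in range(draws):
        samples = [
            rng.betavariate(c + 1, n - c + 1)
            for c, n in zip(conversions, visitors)
        ]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


if __name__ == "__main__":
    # Hypothetical results so far: page A 20/1000, page B 45/1000.
    print(traffic_weights([20, 45], [1000, 1000]))
```

Rerunning this each day with updated counts shifts traffic toward the leader gradually, and a loser keeps a small share of traffic only for as long as the data leaves real doubt about it.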
Questions for discussion: How do I know when a deal is working (and when it’s not)? How do I know which landing pages are working best? How do I know what I know is real, that I can count on it? I don’t want to test forever…that means keeping a loser alive. How can I kill losers and shift to winners more quickly? What tools help?