I know most of you already know the answer to this. You guys can skip down to the bolded part. Okay, say you made a new frontpage or something. Maybe it will make your website better! How do you know? If you're like me, you hate not knowing. There's a thing for knowing, and it's called an A/B test. Sometimes it's called a split test or an experiment, but it's usually called an A/B test. The way it works is that it gives half your users the new behavior, gives the other half the old behavior, and tells you which behavior your customers like more.
An A/B test is a great thing, and you want to use it on pretty much every user-visible change. That means you're going to be using it a lot, so it has to be extremely easy to use. So we made our A/B framework extremely easy to use. Warning: this article is meant to be inspirational and informative about how easy A/B testing can be, but it doesn't have actual code you can use. That's because our code depends so heavily on our own setup that it wouldn't work for you.
We're running an A/B test on our own frontpage right now*. To show you how easy it is, here's the Python code that's running on our webservers, completely unedited:
    if h.branch(["default", "front-scoutbox"]) == "default":
        return render('/front.mako')
    else:
        return render('/front-scoutbox.mako')
Let's take a look at the important parts.
    if h.branch(["default", "front-scoutbox"]) == "default":
This call to h.branch is where the magic happens. You pass it the names of your competing features, and it tells you which feature to use this time. In this case, we're asking the A/B test whether the current user should get the default frontpage or the new "scoutbox" one. Shipping this code is all you need to do to start an experiment; there's no setup anywhere else.
    if h.branch(["default", "front-scoutbox"]) == "default":
        return render('/front.mako')
    else:
        return render('/front-scoutbox.mako')
Then we do whatever we need to do in each branch. In this case, we render whichever frontpage is the one this user should get, but we could put arbitrary code in each branch.
So does h.branch just pick a branch at random every time it's called? That wouldn't make a very good experiment, would it? You'd give each user different results every time they hit the frontpage. What you want to measure is how users act when they get one behavior consistently, so that's what you have to give them. When you call h.branch, it looks up who is logged in, then checks whether they've ever been in this A/B test before. If they have, it returns the same thing it returned last time. If they haven't, it picks a branch at random. If nobody is logged in, h.branch uses the session ID instead, and it transfers that session's A/B branches to the user if they log in or sign up from that session.
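If you want to picture the sticky part, here's a minimal sketch of that lookup, not our real code. Everything in it is made up for illustration: the BranchStore class, the in-memory dict, the branch function's extra arguments, and the idea of naming a test by joining its branch names. It also assumes that when a session is transferred, a branch the user already had wins over the session's branch, which is just one reasonable choice.

```python
import random


class BranchStore:
    """Hypothetical in-memory stand-in for wherever assignments are persisted."""

    def __init__(self):
        # Maps ((kind, id), test_name) -> branch name, where kind is "user" or "session".
        self._assignments = {}

    def get(self, identity, test_name):
        return self._assignments.get((identity, test_name))

    def set(self, identity, test_name, branch_name):
        self._assignments[(identity, test_name)] = branch_name

    def transfer(self, session_id, user_id):
        """Copy a session's branches to the user who logged in or signed up from it."""
        for (identity, test_name), branch_name in list(self._assignments.items()):
            if identity == ("session", session_id):
                # Assumption: a branch the user already has is kept as-is.
                self._assignments.setdefault((("user", user_id), test_name), branch_name)


def branch(store, branches, user_id=None, session_id=None, rng=random):
    """Return the same branch for the same visitor every time."""
    # Prefer the logged-in user; fall back to the session for anonymous visitors.
    if user_id is not None:
        identity = ("user", user_id)
    else:
        identity = ("session", session_id)

    # Assumption: the test is identified by its list of branch names.
    test_name = "|".join(branches)

    assigned = store.get(identity, test_name)
    if assigned is None:
        # First time this visitor hits the test: choose at random and remember it.
        assigned = rng.choice(branches)
        store.set(identity, test_name, assigned)
    return assigned
```

Calling branch(store, ["default", "front-scoutbox"], session_id="s1") repeatedly gives the same answer each time, and after store.transfer("s1", "u1") the logged-in user "u1" gets that answer too.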
We decided a while ago which user behaviors we care about, such as a user finishing their first Sauce Scout session. Whenever someone does that, or does anything else we care about, we use Mixpanel properties to record which branch of each A/B test that user is in. Later, we can look at our first-session metric and see how each branch of our experiment is doing. Here's the graph for the frontpage experiment, which hasn't been running very long and isn't conclusive yet.
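Here's a rough sketch of the idea of tagging events with branches and counting conversions per branch afterwards. It deliberately doesn't call Mixpanel at all; the function names, the "ab_" property prefix, and the (event name, properties) tuple shape are all assumptions made up for this example, and in practice the analytics service does the counting for you.

```python
def event_properties(assignments, extra=None):
    """Build the properties for a tracked event, one property per A/B test.

    `assignments` maps a test name to the branch this user is in.
    The "ab_" prefix is a hypothetical naming convention.
    """
    props = dict(extra or {})
    for test_name, branch_name in assignments.items():
        props["ab_" + test_name] = branch_name
    return props


def conversions_by_branch(events, event_name, test_name):
    """Count how many times `event_name` happened in each branch of one test.

    `events` is an iterable of (name, properties) pairs. Events from users
    who are not in the test carry no property for it and are skipped.
    """
    key = "ab_" + test_name
    counts = {}
    for name, props in events:
        if name == event_name and key in props:
            counts[props[key]] = counts.get(props[key], 0) + 1
    return counts
```

The point of the design is that the event itself carries the branch, so any metric you already track can be split by experiment later without new instrumentation.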
As you can see, I set this graph to only show the conversions from users who are in one branch of the experiment. Some users don't participate in the experiment. That might be because they never went to the frontpage (maybe they came in through an ad or their friends sent them a link). More likely, it's because we don't run these experiments on all our customers every time. There's an admin control you can use once an A/B test is in the wild to choose what percent of the users get to be guinea pigs. h.branch will then magically return "default" for everyone else, and it won't track their activity for the purposes of the A/B test. Later, when there's a clear winner, or when it's clear that there isn't going to be a winner, we'll go tear the h.branch call out of the code and give everyone the winning behavior forever. Or at least until the next experiment!
Thanks to rebelcan from reddit for proofreading!
*If you're reading this from the future, I guess it's not true for you that we're running this test "right now," because of how far ahead you are. That should be ok with you, because it's hard to get upset when you have a jetpack.