Just like any good software company, we track all kinds of data. Last time we shared some of it with you, you loved it, and that data wasn’t even the good stuff. We thought it was, but it dawned on us recently that we’ve been indirectly tracking something better. And when we realized where to look, we found something unexpected.
In case you haven’t heard, Sauce Labs is a tool for automating real browsers (try it out! it's awesome). We have metadata about millions of browser sessions our customers have used to test their actual websites. As everyone knows, sometimes your software doesn’t work. Maybe it crashes or maybe you had a bug. Selenium testing on Sauce is no different - almost 100% of the time, nothing goes wrong; our reliability in the last few months is at least 99.94%. In fact, as you'll see later, we're now more reliable than modern browsers. But sometimes there’s an error that we think may have been our fault. When there is, we refund the customer and work to fix it. *We also record that there was an error.* Does that seem significant enough to be italicized? If it does, you’re smarter than we were.
See, sometimes job errors were caused by connectivity, or bugs in our code, or maybe neutrinos from outer space. But some of the time, they were caused by the browser itself crashing. For each error, it would take real investigation to figure out what caused it, and we have thousands of them. But our code and our customers’ code are independent of the browser being tested (which is the whole point of both Sauce OnDemand and Selenium), so if we only look at relative error rates broken down by browser, we can see which browsers are least reliable. Nobody else has this data. This is the *only* statistically significant study of browser reliability on real webpages. Check for yourself - it’s not out there.
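For the curious, the per-browser breakdown is nothing fancy. Here is a minimal sketch in Python, assuming each job is recorded as a hypothetical `(browser, had_error)` pair; the record format and the function name are made up for illustration and are not our actual schema.

```python
from collections import Counter


def error_rates_by_browser(jobs):
    """Return {browser: raw error rate} for an iterable of (browser, had_error) pairs.

    The (browser, had_error) record shape is an assumption for this sketch.
    """
    totals = Counter()
    errors = Counter()
    for browser, had_error in jobs:
        totals[browser] += 1
        if had_error:
            errors[browser] += 1
    # Every browser in `totals` has at least one job, so no division by zero.
    return {browser: errors[browser] / totals[browser] for browser in totals}
```

Because every browser goes through the same infrastructure and the same kinds of customer tests, the absolute numbers matter less than how the browsers compare to one another.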
Error rates (percent) by browser and version*
The numbers in the graph above are misleading for some of the browsers. Here’s why:
We stopped supporting Firefox 3.5 a while ago. That means all FF3.5 jobs are from a long time ago, before we’d had much time to streamline and error-proof our system. All browsers would have a higher error rate if we only looked at jobs from a long time ago.
IE9, FF4, FF5, and Opera 11 are new. This is the opposite of the FF3.5 issue. All jobs run recently have lower error rates because our service continues to become more reliable as we fix bugs our users discover. These browsers' jobs were all run recently.
This is the same graph as above, with the unfairly advantaged or disadvantaged browsers removed:
Error rates (percent) by browser and version*
If you’re tech savvy, most of these results aren’t very surprising.
IE6 is one of the worst browsers. Each newer IE is slightly more stable but still not good.
Firefox is solid.
Google Chrome is the big winner overall. They force you to update to their newest version every session, so their error rate is an average across all their versions, but it’s still significantly better than even the newest versions of the other browsers.
Opera is fine.
The shocker is the Safari story. Safari 3 - the oldest version of Safari - is extremely reliable. Safari 4 is a good deal above average. And then there’s Safari 5.
Safari 5, the latest in browser technology from the most valuable company in the world, is by far the worst on the market. Go have another look at that number - it’s almost twice as bad as second-worst, the oft-maligned IE6. And that comparison is unfair to IE6. See, Safari 5 was released recently, like Opera 11, IE9, and FF5. Those are the browsers whose error rates were unfairly good. Like them, all Safari 5 jobs were run on our newer, ultra-stable OnDemand infrastructure. We should expect it to have an extremely *low* error rate, like they do, but instead its error rate is ten times worse.
At first we thought the high error rate could be the result of the fact that we always run Safari on Windows, even though it's made by Apple. That's easy to dismiss, because earlier versions of Safari were fine. Then we explored the possibility that the errors were caused by always running Safari 5 in proxy mode (for arcane Selenium reasons). So we looked at the average error rate for non-Safari jobs run in proxy mode, and it put things back into perspective.
Error rates (percent) by browser and version*
These are the error rates for browsers running in proxy mode (we don't have enough data for Opera). Notice the new scale on the Y axis.
As you can see, Firefox actually seems to perform better in proxy mode, so we can't say that proxy mode is always worse. Chrome is unaffected. Safari 5 is still the worst Safari, but it's no longer a huge outlier overall. IE7, the best IE in proxy mode, is on par with the worst Firefox, 3.6. IE8 is surprisingly much worse than IE7. And the king of being a bad browser, once again, is IE6. Hail to the king, baby.
*Percentages are 1 minus the lower bound of the 95% Wilson score confidence interval for the success rate.
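If you want to reproduce the footnote's calculation on your own numbers, here is a rough sketch in Python using the standard Wilson score formula. The function names and the `errors`/`total` inputs are our own illustrative choices, and the z value of about 1.96 is the usual two-sided 95% constant; treat this as one reasonable reading of the footnote rather than the exact code behind the graphs.

```python
import math


def wilson_lower_bound(successes, total, z=1.959964):
    """Lower bound of the Wilson score interval for a success proportion.

    z defaults to the two-sided 95% normal quantile (an assumption here).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


def reported_error_percent(errors, total):
    """Error rate as in the footnote: 1 minus the Wilson lower bound on success, in percent."""
    successes = total - errors
    return 100 * (1 - wilson_lower_bound(successes, total))
```

Using the lower bound on success is deliberately pessimistic: a browser with only a handful of jobs gets a higher reported error rate than one with the same raw rate over many jobs, so small samples can't look good by luck.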