This is the second of a three part series by Matthew Heusser, software delivery consultant and writer.
When I start to think about testing, I think about it in two broad strokes: new feature testing and release-testing. New feature testing tries to find problems with something new and specific, while release-testing happens after “code complete”, to make sure the whole system works together, that a change here didn’t break something there. Release-testing (which some call regression testing) slows down the pace of release and delays feedback from our customer. Release-testing also increases cycle time - the time from when we begin work on a feature until it hits production. Over time, as our software become more complex, the amount of testing we want to do during release testing goes up.
Meanwhile, teams want to ship more often, to tighten the feedback loop.
Today I am going to talk about making release testing go away - or at least drastically reducing it. It all starts during that tutorial in Spain I wrote about last time.
The frequency of release for the people in my tutorial was very diverse, but two groups really struck me — the telecom that had a four-month test-release cycle, and the Latvian software team with the capability to deploy to production every single day.
That means arriving at the office the morning, looking at automated test runs, and making a decision to deploy. There is a ‘formula’ to make this possible. It sounds simple and easy:
- Automate a large number of checks on every build
- Automate deploy to production
- Continuously monitor traffic and logs for errors
- Build the capability to rollback on failure
That transforms the role of test from doing the “testing we always do” to looking at the risk for a given release, lining it up against several different test strategies, and balancing risk, opportunity, reward, and time invested in release-testing?
The trick is to stop looking at the software as a big box, but instead to see it as a set of components. The classic set of components are large pieces of infrastructure (the configuration of the web server, the connections to the database, search, login, payment) and the things that sit on top of that - product reviews, comments, static html pages, and so on. Develop at least two de-ploy strategies — one for audited and mission-critical systems (essential infrastructure, etc) and another for components and add-ons.
We’ve been doing it for years in large IT organizations, where different systems have different release cycles; the trick is to split up existing systems, so you can recognize and make low-risk changes easier.
This isn’t something I dreamed up; both Zappos and Etsy have to pass PCI audits for financial services, while Zappos is part of Amazon and publicly traded. Both of these organizations have a sophisticated test-deploy process for parts of the application that touch money, and a simpler process for lower-risk changes.
So split off the system into different components that can be tested in isolation. Review the changes (perhaps down to the code level) to consider the impact of the change, and test the appropriate amount.
This can free up developers to make many tiny changes per day as long as those changes are low risk. Bigger changes along a theme can be batched together to save testing time — and might mean we can deploy with still considerably less testing than a ‘full’ site retest.
But How Do We Test It?
A few years ago, the ideal vision of getting away from manual, documented test cases was a single ‘test it’ button combined with a thumbs up or down at the end of an “automated test run.”
If the risk is different for each release, and we are uncomfortable with our automation, then we actually want to run different tests for each release — exactly what thinking testers (indeed, anyone on the team) can do with exploratory testing.
So let the computers provide some automated checks, all the time. Each morning, maybe every half an hour, we get a report, look at the changes, and decide what is the right thing for this re-lease. That might mean full-time exploratory testing of major features for a day or two, it might be emailing the team and asking everyone to spend a half hour testing in production.
This result is grown up software testing, varying the test approach to balance risk with cost.
The first step that I talked about today is separating components and developing a strategy that changes the test effort based on which parts were changed. If the risk is minimal, then deploy it every day. Hey, deploy it every hour.
This formula is not magic. Companies that try it find engineering challenges. The first build/deploy system they write tends to become hard to maintain over time. Done wrong continuous testing creates systematic and organizational risk.
It’s also a hard sell. So let’s talk about ways to change the system to shrink the release-test cycle, deploy more often, and reduce risk. The small improvements we make will stand on their own, not threaten anyway — and allow us to stop at any time and declare victory!
A Component Strategy
When a company like etsy.com says that new programmers commit and push code to production the first day, do they really mean modifications to payment processing, search, or display for all products? Of course not.
Instead, programmers follow a well-written set of directions to … wait for it … add the new user to the static HTML ‘about us’ page that lists all the employees, along with an image. If this change generates a bug, that will probably result in an X over an image the new hire forgot to upload, or maybe, at worst, break a div tag so the page mis-renders.
A bad commit on day one looks like this - not a bungled financial transaction in production.
How much testing should we have for that? Should we retest the whole site?
Let’s say we design the push to production so the ‘push’ only copies HTML and image files to the webserver. The server is never ‘down’, and serves complete pages. After the switch, the new page appears. Do we really need to give it the full monty, the week-long burn down of all that is good and right in testing? Couldn’t the developer try it on a local machine, push to stag-ing, try again, and “just push it?” Questions on how? More to come.
By Matthew Heusser – email@example.com for Sauce Labs
Stay tuned next week for the third part of this mini series! You can follow Matt on Twitter at @.