Testing in production is essential if you want to test software as rigorously as possible.
Why? While testing early in the pipeline (i.e., shift-left testing) is necessary and highly encouraged, it's simply not enough on its own. Companies practicing agile testing methodologies and building a disposable infrastructure are ready to perform testing in production, which is sometimes called shift-right testing.
By testing in production, you build another level of confidence in releases by performing various checks in a live production environment. Testing in production lets the company see how an application reacts to newly pushed code changes in the wild. It should become a significant component of your application quality strategy going forward.
Below, I explain why it’s important to test in production, then offer tips for developing a shift-right testing strategy.
Ask Yourself: What Are the Advantages of Testing in Production?
The big difference is that you extend the continuous testing feedback loop to include live production data and real user traffic. You will find defects that were not caught in other testing environments (dev, staging, or pre-prod). The defects collected from production help the development team isolate and fix each one, which improves application quality and provides a better customer experience.
It also encourages and empowers developers and software development engineers in test (SDETs) to test more, both early on and in production. The goal is to elevate quality by building various quality guards around the application.
Here are more advantages of testing in a production environment:
- Beta programs where customers provide early feedback on new features and user experience.
- Prevent disasters with better resilience and recovery testing. The application can recover from expected (chaos) or unexpected events without loss of functionality and data.
- Design and build a disaster recovery process by unleashing chaos in a pre-production environment before doing the same in a live production environment. (The result is a more robust, higher-quality application.)
- You are testing with production data. (It's hard to replicate production traffic and data, making it difficult to detect every possible scenario.)
- It reduces the risk of frequent (even daily) deployments to the production environment, because you monitor application performance in real time with tools like New Relic.
What Are the Risks?
The risk ultimately hinges on the infrastructure design. Was it well thought-out? Is it repeatable and disposable? If not, any of these problems could occur with or without testing in production:
- Trouble keeping the application live and running
- No backup plan for your application, which runs the risk of data loss
- No rollback plan when a release goes sideways
- Exposure of potential vulnerabilities
- Inability to recover from unexpected chaos
- Poorly timed testing that causes a bad user experience
Take a moment to understand the following DevOps terminology: Pets vs. Cattle. Pets are unique, hand-maintained servers that you nurse back to health when they fail. Cattle are interchangeable servers that you replace rather than repair. To minimize risk when testing in production, your application infrastructure needs to be Cattle, not Pets. The key point is to have a repeatable and disposable infrastructure (using Chef, Ansible, Puppet, or Docker) to handle any of the possible scenarios above.
Potential Testing in Production Tactics
The goal of testing is to prevent bugs from being deployed to production. Finding an issue after the application is already deployed is too late. In my opinion, nothing changes in an existing testing strategy where everyone on the team owns quality. We should continue to shift left, testing at every stage of the pipeline to get faster feedback from testing and integrating code. That allows teams to find problems sooner rather than later. Testing in production is just another quality guard around your application, and an important piece of a testing strategy for delivering quality apps to customers. I would break it down into three sections: deployment strategies, testing methods in production, and monitoring.
Deployment Strategies
- Blue-Green Deployment
- Canary Testing
- A/B Testing
- Automated Rollback Strategy
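To make the link between canary testing and automated rollback concrete, here is a minimal Python sketch of the decision a pipeline might make during a canary window. It is an illustration under stated assumptions, not any particular tool's API: the names `ReleaseStats` and `canary_decision`, the 1% tolerance, and the 500-request minimum are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReleaseStats:
    """Request counts observed for one version during the canary window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # A version that has received no traffic is treated as 0% errors.
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(stable: ReleaseStats, canary: ReleaseStats,
                    max_increase: float = 0.01,
                    min_requests: int = 500) -> str:
    """Return 'wait', 'rollback', or 'promote' for the canary release.

    max_increase and min_requests are illustrative defaults, not
    recommended values.
    """
    if canary.requests < min_requests:
        return "wait"      # not enough canary traffic to judge yet
    if canary.error_rate - stable.error_rate > max_increase:
        return "rollback"  # canary is measurably worse than stable
    return "promote"


# Example: the canary fails 3.5% of requests, stable fails 0.2%.
decision = canary_decision(ReleaseStats(10_000, 20), ReleaseStats(1_000, 35))
print(decision)
```

In a real pipeline the counts would come from your monitoring system, and a "rollback" result would trigger whatever mechanism your deployment tooling provides, such as switching traffic back to the blue environment.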
Testing in Production Methods
- New Relic Synthetic User Testing
- Lightweight User Acceptance Testing
- Infrastructure Integration Testing
- Visual Testing with Applitools
- Disaster Recovery Testing
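As a rough idea of what a synthetic user check does, the sketch below requests a page, then verifies the status code, a piece of expected content, and the response time. It uses only the Python standard library and is not the New Relic Synthetics API. The function names, the URL, the expected text, and the two-second budget are assumptions made for illustration.

```python
import time
import urllib.request
from typing import Callable, Dict, Tuple


def http_fetch(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Fetch a URL and return (status code, body text)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


def synthetic_check(url: str, expected_text: str, max_seconds: float = 2.0,
                    fetch: Callable[[str], Tuple[int, str]] = http_fetch) -> Dict:
    """Run one scripted check against a live page and report the outcome."""
    started = time.monotonic()
    try:
        status, body = fetch(url)
    except OSError as exc:
        # urllib raises URLError/HTTPError (both OSError subclasses) on
        # network failures and on 4xx/5xx responses.
        return {"ok": False, "reason": f"request failed: {exc}"}
    elapsed = time.monotonic() - started

    if status != 200:
        return {"ok": False, "reason": f"unexpected status {status}"}
    if expected_text not in body:
        return {"ok": False, "reason": "expected text not found"}
    if elapsed > max_seconds:
        return {"ok": False, "reason": f"too slow: {elapsed:.2f}s"}
    return {"ok": True, "reason": "passed"}


if __name__ == "__main__":
    # Hypothetical target; replace with a page your users actually visit.
    print(synthetic_check("https://example.com/", "Example Domain"))
```

The `fetch` parameter is injectable so the check logic can be exercised without touching the network. A scheduler would run the check every few minutes and feed failures into alerting.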
Monitoring
- Application Performance in Real Time with New Relic
- Alert Policies
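An alert policy is essentially a set of conditions evaluated against recent metric samples. The sketch below shows one simple way to model that in Python, assuming a condition fires only when several consecutive samples exceed a threshold. The class and function names, metric names, and thresholds are hypothetical and do not reflect how New Relic or any other monitoring product implements alerting.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class AlertCondition:
    metric: str        # name of the metric to watch
    threshold: float   # a sample above this value counts as a breach
    consecutive: int   # breaches in a row required before alerting

    def __post_init__(self) -> None:
        if self.consecutive < 1:
            raise ValueError("consecutive must be at least 1")


def breached(condition: AlertCondition, samples: Sequence[float]) -> bool:
    """True when the most recent `consecutive` samples all exceed the threshold."""
    recent = samples[-condition.consecutive:]
    return (len(recent) == condition.consecutive
            and all(value > condition.threshold for value in recent))


def evaluate_policy(conditions: Sequence[AlertCondition],
                    metrics: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of metrics whose conditions are currently firing."""
    return [c.metric for c in conditions
            if breached(c, metrics.get(c.metric, []))]


# Illustrative policy and samples (oldest first).
policy = [
    AlertCondition("error_rate", threshold=0.05, consecutive=3),
    AlertCondition("p95_latency_ms", threshold=800, consecutive=2),
]
samples = {
    "error_rate": [0.01, 0.06, 0.07, 0.08],
    "p95_latency_ms": [900, 700],
}
print(evaluate_policy(policy, samples))
```

Requiring consecutive breaches is one design choice for avoiding alerts on a single noisy sample. The right window depends on how quickly you need to react after a release.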
Testing in production should be part of a well-designed, scalable, and highly resilient testing routine. To deliver top-notch software the way Google, Facebook, Amazon, and other giants do, traditional strategies alone won’t cut it. We should continue testing early and often, and consider shift-right testing as part of our testing strategy.
Greg is a Fixate IO Contributor and a Senior Engineer at Gannett | USA Today Network, responsible for test automation solutions, test coverage (from unit to end-to-end), and continuous integration across all Gannett | USA Today Network products.
In the last two years, he has helped change the testing approach from manual to automated testing across several products at Gannett | USA Today Network. To determine improvements and testing gaps, he conducted a face-to-face interview survey process to understand all the product development and deployment processes, testing strategies, and tooling. He provides a formal training program for teams still performing manual testing that allows them to transition to automated testing.