Earlier this year, we released the second iteration of the Sauce Labs Continuous Testing Benchmark (CTB). Based on real customer data from the more than 3 billion tests run on our platform, the CTB enables organizations to see how their continuous testing efforts stack up against critical best practices and how their own programs compare to those of other enterprises. The CTB comprises four equally weighted performance metrics:
Test Quality, for which excellence is defined by passing at least 90% of all tests run
Test Run Time, for which excellence is defined by running tests in an average of two minutes or less
Test Platform Coverage, for which excellence is defined by testing against at least five platforms (desktop) or devices (mobile) on average
Test Efficiency, for which excellence is defined by leveraging at least 75% of available test capacity during peak testing periods to drive maximum efficiency and parallelization
These metrics are averaged together to produce an overall CTB score, which ranges from 0 to 100. For the second consecutive year, the large majority of organizations struggled with at least one, if not multiple, components of the CTB. In fact, just 7.79% of organizations achieved the benchmark for excellence in all four categories to receive a perfect score of 100. With their expertise, experience, and commitment to continuous learning and improvement, these organizations stand above their peers as the best of the best in continuous testing.
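As an illustration, the four metrics above could be normalized against their excellence thresholds and averaged like this. The normalization scheme (capping each metric at 100) is our own sketch, not Sauce Labs' published formula:

```python
def metric_score(value, threshold, higher_is_better=True):
    """Normalize one raw metric to 0-100 against its excellence threshold."""
    ratio = value / threshold if higher_is_better else threshold / value
    return min(100.0, ratio * 100.0)

def ctb_score(pass_rate, avg_run_time_min, platforms, peak_utilization):
    scores = [
        metric_score(pass_rate, 0.90),         # Test Quality: >= 90% passing
        metric_score(avg_run_time_min, 2.0,    # Test Run Time: <= 2 minutes
                     higher_is_better=False),
        metric_score(platforms, 5),            # Test Platform Coverage: >= 5
        metric_score(peak_utilization, 0.75),  # Test Efficiency: >= 75% capacity
    ]
    return sum(scores) / len(scores)           # four equally weighted metrics

print(ctb_score(0.95, 1.5, 6, 0.80))  # meets all four thresholds -> 100.0
```

An organization that meets or beats every threshold scores a perfect 100; falling short on any one metric pulls the average down.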
To learn how they do it, in this Q&A blog series, we go in-depth with experts from the top-performing organizations as they share tips, tricks, and best practices for achieving excellence across all four metrics.
First up, we spoke to the Duo Security team at Cisco. Here’s what Kwame Musonda, Engineering Manager - Quality, and Patrick Harmon, Software Developer in Test, had to say about achieving continuous testing excellence.
Q: Why is continuous testing important to the business you support? How do you measure its impact on the business?
Continuous testing at Duo ensures an uninterrupted, bi-weekly release stream of software to production by providing continual quality checkpoints throughout the development process. When you test continuously, you develop a culture of early bug detection and shift bug detection further left, all the way back to product design. As we are continuously developing and delivering features, continuous testing fuels our velocity to deliver with quality and confidence. In our organization, well over 99% of our tests are automated, without which continuous integration and delivery would be impossible.
We measure our success in various ways. Through our upstream process controls driven by automation, we measure our bug detection rate, developer productivity, and our ability to keep our main code branch green. As a lagging indicator, we track all bugs that are missed by automation but caught by our release acceptance testing and downstream we analyze bugs found by our customers. We perform Root Cause Analysis (RCA) on all critical bugs.
Q: What are the overall goals and objectives of your continuous testing program? What metrics do you use to measure success?
Our vision is to see software go from a developer to a customer on the same day. We are working towards that vision by implementing Continuous Integration, Continuous Delivery, and Continuous Deployment. This is all predicated on having a robust testing infrastructure supplemented by observability. We also follow best practices such as deploying features through canary releases, dogfooding, and having a rollback plan. Our Key Performance Indicator (KPI) is a measure of our velocity to release. We balance this measure by keeping a close eye on our code coverage and customer-found bugs.
Q: How important is test quality to ensuring that you can meet your testing goals? What are some of the steps you take to ensure that the overwhelming majority of your tests pass?
The quality of our tests is paramount! If tests are not reliable, they are practically worthless. If we spend an inordinate amount of time chasing test issues or dealing with infrastructure reliability, then our focus is on the wrong things. Good test automation coverage increases your ability to iterate and release features to customers knowing you haven't broken existing functionality. But having "flaky" tests which fail sporadically will have the opposite effect, and will slow down the validation process.
We allocate engineering resources towards tracking, fixing, and in some cases removing "flaky" tests. When tests fail due to a false positive, it not only slows down the continuous integration process, but it can also reduce developer confidence in the tests. Thus, it's an important and ongoing effort to ensure our tests fail for legitimate reasons. Taking a hardline approach may be needed at times: removing a test may not always seem like an option, but it can be necessary to drive us to find a more reliable solution.
On the flip side, having a test suite that always passes against a codebase that is always changing may indicate you are not testing new features or do not have sufficient coverage, or it may be an indicator of false positives. Our measure of success is when tests fail (and they will fail) the overwhelming majority of the failures are due to bugs or changes in the codebase.
Q: What steps or best practices do you implement to keep the average length of your tests (i.e., test run time) down? How important is keeping tests short to ensuring you meet your continuous integration requirements?
While the average length of our tests is under two minutes, some tests legitimately need to be longer depending on what the test is trying to accomplish. Longer tests tend to have more moving parts, which means there is a higher probability of failure. Overall, we want to modularize our tests to the specific scenarios they cover while keeping a homogeneous suite that provides end-to-end functional coverage.
Continuous Integration is only possible because we parallelize the execution of our tests. We run our unit and integration tests in parallel and scale up resources as needed to accelerate testing. Another best practice we follow is running our frontend tests in production. We scale up production instances to increase test parallelization. In addition, we track test durations and optimize on the critical path by reducing the longest-running tests, increasing parallelization, or shifting the tests further left where they can run faster. For example, we may opt for a unit test over an integration test to help speed up testing. Combining our best practices of modular testing, parallelization, and shifting more tests to the left allows us to reduce our overall individual and total test durations.
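The parallelization and duration-tracking ideas above can be sketched with a simple worker pool; the test names and placeholder bodies here are invented for illustration, not Duo's actual suite:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_test(name):
    """Stand-in for executing one test; returns its name and duration."""
    start = time.monotonic()
    # ... the real test body would run here ...
    return name, time.monotonic() - start

tests = ["test_login", "test_enroll", "test_push_auth", "test_logout"]

# Run independent tests concurrently instead of one after another.
with ThreadPoolExecutor(max_workers=4) as pool:
    durations = dict(pool.map(run_test, tests))

# The slowest tests bound total wall time, so surface them for tuning.
for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {secs:.3f}s")
```

With this kind of duration report, the longest-running tests at the top become the candidates for optimization, splitting, or shifting left.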
Q: Why is it important that you prioritize testing across a wide range of browsers, operating systems, and devices?
We need to be as diverse as our customers are in how they use our product. As a regular practice, we tie our customer-usage data to our test platform and environment configurations. If customer usage on a specific version or application falls below a certain threshold, we match our test priorities accordingly. We also have clearly defined User Personas that everyone in the company knows. When someone says “I added this feature for Gary,” the only question is what feature. Everyone knows that Gary represents the System Administrator. Having clear personas helps all groups focus on how to develop and test with our customers in mind.
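Tying customer-usage data to the test matrix could look something like this sketch; the usage shares and the 1% threshold are hypothetical numbers for illustration:

```python
# Derive the test platform matrix from customer usage data (invented shares).
usage_share = {  # fraction of customer traffic per (browser, OS) pair
    ("chrome", "windows"): 0.46,
    ("safari", "macos"): 0.22,
    ("edge", "windows"): 0.18,
    ("firefox", "linux"): 0.04,
    ("safari", "ios"): 0.009,
}

USAGE_THRESHOLD = 0.01  # configurations under 1% usage are deprioritized

matrix = [cfg for cfg, share in usage_share.items() if share >= USAGE_THRESHOLD]
print(matrix)
```

As usage shifts, the matrix shifts with it, so test effort keeps tracking where customers actually are.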
We also pay particular attention to the variance in behaviors across different platforms. If the architecture differs significantly between platforms, we may try to account for it in our coverage matrix as a separate vector that needs to be tested. It is extremely difficult to test everything, so what’s important is to make data-driven decisions on what to test and perform risk analysis in areas that will not have full coverage. Automation allows us to parameterize our tests. We also leverage partnerships with companies that allow us to further increase our test permutations to better mimic customer behavior.
Q: How important is parallel testing to your efforts to deliver quality releases without sacrificing release velocity? What are some of the best practices you implement to run tests in parallel?
Parallel testing is extremely important. However, it can be a challenge for organizations that have not built their testing solutions with concurrency or parallelism in mind. Shared resources and dependency on the order of tests can make parallel testing difficult. Here are some things to always keep in mind:
Dealing with shared resources. Not everyone can avoid using shared resources within their tests. Even if one has to use shared resources, there are ways to do it well. First, organize your tests around the resources that they share. There is a natural tendency to organize tests around the features or feature areas they test but you should look for other ways to attach that information to them. Second, make it easy to switch your tests off shared resources, so that if it becomes possible to provision one of these resources for each test at run time, you’ll be in a better position to move. And third, make it easy to understand who or what is executing tests and which shared resources they’re using, such as with automated chat messages or logging.
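One way to act on this advice, attaching shared-resource information to tests and using it to avoid conflicts, is sketched below; the test and resource names are invented for illustration:

```python
# Map each test to the shared resources it touches (hypothetical names).
SHARED_RESOURCES = {
    "test_enroll_user": {"admin_account"},
    "test_delete_user": {"admin_account"},
    "test_view_report": {"reporting_db"},
    "test_push_auth": set(),  # touches no shared resources: freely parallel
}

def schedule(tests):
    """Group tests into batches whose members share no resources,
    so every test within a batch can safely run in parallel."""
    batches = []  # list of (tests_in_batch, resources_in_use)
    for test in tests:
        needed = SHARED_RESOURCES[test]
        for batch, in_use in batches:
            if not (needed & in_use):  # no resource conflict with this batch
                batch.append(test)
                in_use |= needed
                break
        else:
            batches.append(([test], set(needed)))
    return [batch for batch, _ in batches]

print(schedule(list(SHARED_RESOURCES)))
```

Because the resource metadata lives alongside the tests rather than being implicit, it is also easy to see at a glance who is using what, and to drop a resource from the map once it can be provisioned per-test.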
Perform targeted testing whenever possible. If you organize your tests and have unique identifiers, you can choose how and when to execute your tests. You do not always need to run all your tests. You can target tests to specific changes and reduce your execution time while still being confident in the test coverage achieved.
Leverage the use of dynamic resources. Where possible, create and destroy resources efficiently, on demand, so they exist only while they are needed. Dynamic resources may also indirectly provide additional test coverage. With API-driven provisioning, the tradeoff in time compared to statically allocated resources may be negligible.
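Dynamically provisioned, per-test resources are often wrapped in a create-on-entry, destroy-on-exit helper; this sketch uses placeholder comments where a real provisioning API would be called, and the tenant naming is invented:

```python
from contextlib import contextmanager
import uuid

@contextmanager
def temp_tenant():
    """Provision an isolated tenant for one test, then tear it down."""
    tenant_id = f"test-tenant-{uuid.uuid4().hex[:8]}"
    # api.create_tenant(tenant_id)  # placeholder: real provisioning call
    try:
        yield tenant_id
    finally:
        # api.delete_tenant(tenant_id)  # cleanup runs even if the test fails
        pass

with temp_tenant() as tenant:
    print(tenant)  # each test gets its own freshly provisioned tenant
```

Because every test gets its own resource, ordering dependencies and shared-state collisions disappear, which is exactly what makes aggressive parallelization safe.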
Q: How are you able to achieve enterprise scale while still maintaining best practices with respect to things like test quality, test run time, and test efficiency?
As an organization scales, it is important that the quality culture is adopted and spread by new members. All new hires at Duo attend a quality workshop in their first few months where we train them on what Quality looks like at Duo, and share best practices, discoveries, and approaches that have worked well for us. Afterward, we continue to share these valuable insights and more with them through a monthly Quality Newsletter. The entire organization owns Quality and is empowered to make things better. One of our core values is “Being kinder than necessary in all our interactions” and we also strive to provide psychological safety so everyone is free to speak up and do something about problems that impact them. “Learning together” is another value that helps us openly share our mistakes so we can all learn together.
There will always be unforeseen technical challenges that we discover and figure out how to navigate. We attribute our success in overcoming these to the collective talents of our team.
Q: What advice and/or direction would you give to other organizations looking to achieve continuous testing excellence in the manner that you have? What are the best ways to start improving without sacrificing scale?
An organization's approach towards testing excellence depends a lot on the culture and the level of maturity of the products they develop. There are some things that we feel are foundational to being successful.
Create a collaborative testing culture. The answer to “Who owns Quality?” must unequivocally be that everyone owns quality. If you operate in a ‘throw it over the wall’ culture, you are running on shaky ground and anything you build is likely to crumble. Having a supportive culture where everyone has quality in mind and is empathetic towards your customers (both internal and external) is a prerequisite for implementing test excellence. Excellence in testing is a crowdsourced phenomenon.
Make data-driven decisions. As Peter Drucker said, “If you can’t measure it, you can’t improve it.” Measure the right things and be willing to adapt your approach depending on what the data is telling you. Make these metrics visible so the state of the organization’s quality is known from top to bottom.
Don’t let perfection stop you from being great. Strive for continuous improvement over perfection. Sometimes we hold back from doing something because we are overwhelmed when we think about doing everything. Whether you’re implementing a cultural change, a greenfield automation framework, or improving legacy infrastructure, the ultimate goal is achieved by many small wins.
Architect for the future. The return on investment in automation is almost always positive. However, as decisions are made and details implemented, think about the long-term impact of those decisions. For example, in the build vs. buy decision, your ROI may still be positive if you built it, but can it scale to serve your organization's long-term needs? Has someone already solved this problem? Can you leverage or learn from them? Don't be scared to discard something that isn't working if doing so will be beneficial in the long run; it's already a sunk cost anyway!
Learning Together. One of Duo’s core values is learning together. We constantly learn and share information. Expand this to the global testing community and grow your world. There are several like-minded specialists who have encountered similar challenges that you can learn from. Expand your knowledge by growing your professional network.
Focus on the right things. As you drive organizational excellence and make data-driven decisions, it is important that you work as one unit towards objectives and key results that the entire organization has bought into. Planning, communicating, and prioritizing the right things to do is critical. Having small wins to build momentum will drive success.