For the past 20 years, software testing has been treated as an engineering issue. Companies would typically hire more engineers to write, automate, and maintain more scripts. But that's not sustainable today. AI coding tools didn't just make everything faster; they broke the entire process.
It’s easy to understand why, but much harder to correct. AI coding assistants decouple code creation from code validation. An engineer who previously shipped one feature a week can now prototype five or six. The code keeps coming. Traditional tests with a human in the loop cannot keep up.
The issue isn't primarily a lack of resources, but rather a flaw in the underlying architecture.
The scariest part of this conundrum is that, for most teams, there is no single catastrophic failure, no aha moment. Instead, there is usually a slow burn: increasingly frequent issues in production. The end state is burnt-out engineers and leaders who cannot confidently answer a simple question: is this code trustworthy when it ships?
Script-based automation requires engineers to translate human intent (for example, "a user should be able to check out successfully") into executable code. That translation step has always been flaky and expensive. It produces test scripts so tightly coupled to implementation details that the smallest UI change, refactor, or dependency update can (and often does) break dozens of logically sound, correct tests. That is more than a burden; it is the maintenance tax we talk about today, which can consume roughly 40% of QA effort: not time spent on coverage for new features, but just keeping old scripts alive.
Moreover, it requires coding expertise, which excludes the people who often understand business requirements most deeply, such as product managers, business analysts, or customer success and support teams.
End-to-end coverage rarely exceeds 35% for complex user journeys. Not because teams don't care, but because the paradigm makes those journeys impossible to capture and scale.
We've been paying a translation tax on top of the quality tax, and we've mislabeled it as engineering rigor.
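The coupling problem is easy to demonstrate in miniature. In this hypothetical Python sketch (the dict-as-DOM model, element ids, and check functions are all invented for illustration), a refactor renames a checkout button's id: the check written against the implementation detail breaks, while the check written against the intent survives.

```python
# Hypothetical sketch: the same app before and after a refactor.
# Only the checkout button's element id changes; behavior does not.
page_v1 = {"btn-checkout-2019": {"role": "checkout", "enabled": True}}
page_v2 = {"purchase-cta": {"role": "checkout", "enabled": True}}

def script_based_check(page):
    # Coupled to an implementation detail: a specific element id.
    el = page.get("btn-checkout-2019")
    return el is not None and el["enabled"]

def intent_based_check(page):
    # Coupled to intent: "a user should be able to check out."
    return any(el["enabled"] for el in page.values() if el["role"] == "checkout")

print(script_based_check(page_v1), script_based_check(page_v2))  # True False
print(intent_based_check(page_v1), intent_based_check(page_v2))  # True True
```

The script-based check is logically correct on both versions of the app, yet it fails on the second one for a reason that has nothing to do with quality. That gap between "the test failed" and "the software is broken" is the translation tax in code form.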
Intent-driven testing starts from a different premise. Instead of trying to make old processes faster, it asks developers, product managers, and business analysts to describe what the application is supposed to do. The executable implementation follows from that description rather than being authored manually and kept in sync with it.
This is not simply a UX improvement on existing automation tools. It's a different theory of what testing is.
In the script-based model, tests are a byproduct of engineering effort: something produced after the real work is done and maintained indefinitely as a tax on future work. In the intent-driven model, tests are a direct expression of current business requirements. The artifact that a product manager writes to describe a user journey becomes, without a separate translation step, the thing that verifies whether the software does what it's supposed to do.
The implications of this shift are significant:
It closes the expertise gap. The people with the deepest knowledge of what software should do (not how it's built, but what it's for) can now contribute directly to testing coverage. A QA lead who typically spends half their week on maintenance can instead use that time to evaluate judgment calls that require expertise, such as edge cases, ambiguous requirements, and failure modes that aren't obvious from a happy path description.
It changes the maintenance model. Tests that are coupled to intent rather than implementation are more durable. When the UI changes, the intent doesn't. When a component is refactored, the business requirement remains the same. Coverage can be self-correcting in a way that manually authored scripts structurally cannot be.
It scales with AI-generated code. If code generation is increasingly automated, the only sustainable verification layer is one that can also operate at scale without linear headcount growth. Intent-driven testing is the only model where the input (a description of desired behavior) doesn't itself require engineering expertise to produce.
And finally, it becomes agnostic to factors such as operating systems, devices, and browsers, which is how it should always have been.
Intent-driven testing is a compelling idea that has been partially attempted before, and those attempts have mostly failed in predictable ways. General-purpose AI tools can generate syntactically correct tests from natural language descriptions, but those tests tend to be semantically fragile, brittle to application changes, and prone to false positives. These tools are unreliable as a foundation for release decisions.
For the paradigm shift to be real rather than rhetorical, a few things have to be true.
The intelligence layer has to understand application context, not just syntax. Tests generated from intent need to reflect how the application actually behaves — which requires deep integration with the application under test, not just pattern-matching on the description.
The system has to improve over time. A static translation of intent into tests has the same maintenance problem as manually authored scripts, just with a different author. An enduring system learns from production behavior, from test results, from application changes — and continuously updates coverage accordingly.
Human oversight has to remain meaningful, but what humans are responsible for is evolving considerably. Tests can be generated and executed autonomously; that's nothing special today. We still need humans to make critical judgment calls about acceptable risk, and to step in when something requires escalation. Any architecture that tries to automate those judgments away will fail to earn enterprise trust.
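One concrete mechanism behind a system that updates coverage as the application changes is a self-healing locator. This is a hypothetical minimal sketch in Python (the dict-as-DOM model, the find_element helper, and the cache are invented for illustration): when a cached element id goes stale, the resolver falls back to the element's semantic role, the stable expression of intent, rewrites the cache so the next run needs no repair, and returns nothing only when the element is genuinely gone, which is the case that should escalate to a human.

```python
# Hypothetical self-healing locator sketch. Pages are dicts mapping
# element ids to attributes; ids are volatile, roles are stable.
def find_element(page, cache, key, role):
    """Resolve an element by cached id, healing the cache if it is stale."""
    cached_id = cache.get(key)
    if cached_id in page:
        return page[cached_id]
    # Cached id no longer exists: fall back to the semantic role
    # (the intent) and record the new id for future runs.
    for el_id, el in page.items():
        if el["role"] == role:
            cache[key] = el_id
            return el
    return None  # genuinely missing: escalate to a human

cache = {"checkout": "btn-checkout-2019"}  # id learned from an earlier version
page_v2 = {"purchase-cta": {"role": "checkout", "enabled": True}}

el = find_element(page_v2, cache, "checkout", "checkout")
print(el["enabled"], cache["checkout"])  # True purchase-cta
```

The point of the sketch is the division of labor: routine drift (a renamed id) is absorbed automatically, while a truly missing capability surfaces as a failure that a person must judge.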
Twenty years ago, our founders built Selenium because there was a gap between how software was being written and how it was being verified. That gap narrowed for a while. AI has blown it back open.
The teams that will navigate this well aren't the ones that hire faster or dig old trenches deeper. They're the ones that are willing to ask a harder question: are we solving the right problem?
The quality tax isn't primarily a resourcing problem. It's a paradigm problem. And paradigm problems don't yield to incremental solutions.
The era of describing intent and letting intelligent systems handle the rest is worth taking seriously. Not because it's the next tool in the stack, but because it represents a fundamentally different theory of what the relationship between building software and verifying it should look like.
That shift is overdue.