I’ve been deep-diving into quality for the past few months, specifically into how we can accelerate the SDLC through QA. It’s not news that quality, velocity, and cost balance each other; however, with the LLM revolution, cost has become less about the financial burden and more about where we spend our focus. And that is something most of us find difficult to grasp.
In this new landscape, I started making a key distinction when talking about tasks, and quality is no exception. For this article, I’ll call that distinction Accidental vs. Essential QA (yes, I love Frederick Brooks’ ‘No Silver Bullet’).
Accidental QA should be delegated
Accidental QA is the realm of crash-finding, negative testing, and safety rails. The need for humans here has been fading for a while; it is where your agent swarms can have fun, and where you can potentially go deeper with more deterministic approaches.
Automated exploration & fuzzing.
Meta’s Sapienz runs tens of thousands of UI-level tests daily, surfacing actionable failures fast—proof that random exploration at scale can find real issues before users do. Antithesis is a good example of a continuous reliability platform that autonomously searches for problems in your software within a simulated environment, ideal for server-side applications. [5]
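You don’t need Meta-scale infrastructure to taste the idea; property-based testing gives you a miniature version of it. Here is a minimal sketch using the hypothesis library, where parse_price is a hypothetical function you want to harden; the properties only assert safety rails (fail loudly, return sane types), not business intent.

```python
# Minimal property-based "fuzzing" sketch with the hypothesis library.
# parse_price is a hypothetical function under test; the checks below are
# pure Accidental QA: no crashes, sane types, nothing about business rules.
from hypothesis import given, strategies as st


def parse_price(raw: str) -> float:
    """Hypothetical function under test: '1,299.99' -> 1299.99."""
    return float(raw.replace(",", ""))


@given(st.text())
def test_parse_price_never_crashes_on_garbage(raw):
    # Garbage input must fail loudly and predictably (ValueError),
    # never with an unexpected exception type.
    try:
        result = parse_price(raw)
    except ValueError:
        return
    assert isinstance(result, float)


@given(st.decimals(min_value=0, max_value=10**9, places=2))
def test_parse_price_roundtrips_valid_inputs(amount):
    # For well-formed inputs, parsing the string form gives the value back.
    assert parse_price(f"{amount}") == float(amount)
```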
Static analysis & policy gates.
These exist to catch dependency, security, and style regressions automatically—prerequisites that DORA finds correlate with stronger delivery and organizational performance when paired with solid CI/CD. [1]
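As a trivial illustration of a policy gate (not a prescription), something like the script below can run in CI and block the merge whenever any check fails; the tool choices here (ruff, pip-audit, pytest) are assumptions for a Python codebase, so swap in whatever your stack uses.

```python
#!/usr/bin/env python3
# Minimal CI policy gate sketch: run each check, fail the build if any fails.
# Tool choices are illustrative assumptions for a Python project.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],           # style / lint regressions
    ["pip-audit"],                    # known-vulnerable dependencies
    ["pytest", "-q", "--maxfail=1"],  # fast-failing test suite
]


def main() -> int:
    for cmd in CHECKS:
        print(f"==> {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print(f"Policy gate failed on: {' '.join(cmd)}", file=sys.stderr)
            return 1
    print("All policy gates passed.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```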
LLM-assisted test coding (not generation).
LLMs are fairly good at writing simple unit tests that check a well-defined set of inputs, provided those inputs were defined by you, or they are instructed to try a large enough set of input/output pairs. However, this is a sore point: experiments with full autonomy show that naïve generators bless buggy behavior, because their oracles are designed to pass; those early design choices prevent them from finding real bugs. [6]
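The distinction is where the oracle comes from. A hedged sketch of the “inputs defined by you” case, using a hypothetical apply_discount function: the expected values encode the business rule, so an LLM can write all the boilerplate around them without being able to bless a wrong implementation.

```python
# The input/output pairs below are the human-owned oracle; an LLM can write
# the surrounding boilerplate, but it cannot change what "correct" means.
# apply_discount is a hypothetical function used purely for illustration.
import pytest


def apply_discount(price: float, loyalty_years: int) -> float:
    """Hypothetical rule: 5% off per loyalty year, capped at 25%."""
    return round(price * (1 - min(loyalty_years * 0.05, 0.25)), 2)


@pytest.mark.parametrize(
    ("price", "loyalty_years", "expected"),
    [
        (100.0, 0, 100.0),   # no loyalty, no discount
        (100.0, 3, 85.0),    # 15% off
        (100.0, 10, 75.0),   # capped at 25%, not 50%
    ],
)
def test_apply_discount_matches_business_rules(price, loyalty_years, expected):
    assert apply_discount(price, loyalty_years) == expected
```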
So yeah, the light "random walk + guardrails" check, whether performed by your smoke-test layer or by a lazy junior QA intern, is fairly replaceable. We’ve known for a while that CPUs are good at "try a lot, fail fast, report (somewhat) clearly." If you haven’t already, automate this aggressively to reduce your change failure rate without burning human cycles…
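To make "random walk + guardrails" concrete, here is a sketch of a seeded random-walk smoke check; FakeTodoApp is a hypothetical stand-in for whatever system you would point this at, and the only assertions are guardrails, not product intent.

```python
# "Random walk + guardrails" smoke check: hammer a hypothetical app with a
# random sequence of actions and only assert that nothing blows up and basic
# invariants hold. FakeTodoApp is an illustrative stand-in for your system.
import random


class FakeTodoApp:
    """Hypothetical system under test."""
    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append(text)

    def complete(self, index):
        if 0 <= index < len(self.items):
            self.items.pop(index)

    def count(self):
        return len(self.items)


def test_random_walk_smoke(steps: int = 1_000, seed: int = 42):
    rng = random.Random(seed)  # seeded so any failure is reproducible
    app = FakeTodoApp()
    for _ in range(steps):
        action = rng.choice(["add", "complete", "count"])
        if action == "add":
            app.add(f"task-{rng.randint(0, 99)}")
        elif action == "complete":
            app.complete(rng.randint(-1, 5))
        else:
            app.count()
        # Guardrail only: the walk says nothing about whether the product
        # does the right thing, just that it never ends up in a broken state.
        assert app.count() >= 0


if __name__ == "__main__":
    test_random_walk_smoke()
    print("random walk survived with guardrails intact")
```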
Waiting for QA should never happen again, but this layer is not the actual strategic bottleneck.
Essential QA is your moat
During my early days in the industry, I dreaded the moment my CEO, VP, director, Staff PM, etc., would try my functionality. Everything worked, everything looked OK, I would argue I had even done exactly what they told me to do… but it was obviously at fault, because the photo tile was too small, or they couldn’t share that document with two different sets of friends without making copies, or some other seemingly arbitrary distinction. I learned fast that it was that eye that made them who they were in the organization: the visionaries.
This is no different with AI code generation.
Birgitta Böckeler’s recent experiment on autonomous AI code generation consistently achieved "well-tested" code (i.e., great Accidental QA: high coverage, unit/integration/E2E suites); yet the actual functionality diverged from the intent as complexity grew (and the examples were fairly small). She also noted that E2E suites "can’t cover all test cases" and that gaps in requirements led the AI to fill them in with its own assumptions.
If this reminds you of the old days, it’s because this is the old lesson from Frederick Brooks: the hard part of software is the conceptual design and specification, not the syntax. He framed it as essential vs. accidental complexity (surprise!): our tools can reduce the accidental, but the essence (deciding what the system should do) remains stubbornly human. LLMs are good at guessing what you aimed to say; however, the context needs to be transferred so that the guess is better informed, and this is extremely relevant in complex scenarios where multiple answers could be "right". [7]
Alas, the human challenge!
If LLMs have taken away the need to hand-code these paths, then what you really need is to train your QA Lead, whether that is a QA professional, a Product Manager, an Engineer, or an Ops person, so that they can deeply understand what your customers want out of this new feature. That way, they can define, at a higher level, the paths your acceptance tests should follow, and those paths go beyond a static state of the product.
So if you ask me, it’s time for another extreme shift-left in QA. It’s time to ensure your organization puts the work into the acceptance criteria; that is the heavy lifting. Because in the same way that ML became about data cleansing and optimizing loss functions, if coding fades away, turning customer needs into perfectly balanced acceptance criteria will become the talent moat.
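What can an "executable acceptance criterion" look like in practice? A sketch in plain pytest, following Fowler’s Given-When-Then structure: the Document object and the sharing rule (borrowed from the anecdote above) are hypothetical, but notice that the assertion encodes customer intent rather than implementation detail.

```python
# Executable acceptance criterion, structured as Given-When-Then.
# Document is a hypothetical domain object; the oracle here is the customer's
# intent ("share with two groups without making copies"), which no amount of
# generated unit tests will invent for you.

class Document:
    """Hypothetical domain object used for illustration."""
    def __init__(self, title):
        self.title = title
        self.shared_with = set()

    def share(self, *people):
        self.shared_with.update(people)


def test_share_one_document_with_two_friend_groups_without_copies():
    # Given a single trip-photos document and two separate groups of friends
    doc = Document("Trip photos")
    family = {"ana", "luis"}
    climbing_club = {"sam", "ines"}

    # When the owner shares it with both groups
    doc.share(*family)
    doc.share(*climbing_club)

    # Then everyone sees the same document and no copy was created
    assert doc.shared_with == family | climbing_club
    assert doc.title == "Trip photos"  # still one document, not "Trip photos (copy)"
```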
The-one-thing: Divide and conquer
Automate the "Accidental QA" layer ruthlessly, and put your best PMs and QA leads next to customers to make acceptance criteria heavy, executable, and non-negotiable. That is how, hopefully, quality becomes part of your revenue engine instead of a cost center.
References
"Accelerate State of DevOps Report." — Google Cloud / DORA Research.
"Quantifying GitHub Copilot’s Impact on Productivity." — GitHub Research Blog.
"How far can we push AI autonomy in code generation?" — Birgitta Böckeler
"Sapienz: Intelligent automated software testing at scale." — Engineering at Meta.
"No Silver Bullet: Essence and Accidents of Software Engineering." — Frederick P. Brooks, Jr. (1987).
"Given-When-Then." — Martin Fowler (BDD / Specification by Example primer).