What Your Broken Test Suite Is Really Telling You
A Guide for Engineering Leaders on How to Use Automation Pain as a Signal to Rethink Their Quality Strategy
Most engineering leaders care deeply about delivering high-quality software. We invest in tooling, encourage automation, and set expectations around testing. But even with the best intentions, our efforts can quietly backfire.
A failing test suite, endless maintenance work, flaky tests no one trusts: these are more than automation problems. They’re signals. Signals that your quality strategy isn’t working as well as you think.
In past posts, I’ve explored how engineering culture, overloaded language (like what “done” or “tested” means), and unclear automation ownership impact quality. Today, I want to focus on a different angle: what the symptoms are telling us.
These are the patterns I see across teams: brittle frameworks, bloated suites, misaligned tools. More than bad practices, these are quality antipatterns, and they come with real costs: slower delivery, decreased trust, and team frustration.
This post is a field guide for recognizing those signs, not just so you can fix your automation, but so you can rethink your quality strategy from the top down.
Let’s dig in.
Maintenance Overhead
Antipattern:
Teams often fall into the trap of “automate everything” without a clear strategy. The result? A bloated, slow, and expensive-to-maintain test suite that takes hours to run and frustrates everyone involved.
Cost:
This overhead drains productivity, demoralizes engineers, and slows down delivery. It’s a classic example of how “more” isn’t always better—especially when “more” isn’t thought through.
Better approach:
Align ahead of time on what to automate and why. Focus on value, not volume.
Brittle Frameworks
Antipattern:
Your tests shouldn’t fall apart every time someone changes a CSS class. But brittle frameworks do exactly that. They’re built on fragile assumptions and tight coupling to UI details, making them unreliable and frustrating. When automation isn’t designed for flexibility, even minor code changes cause widespread test failures. This brittleness undermines confidence in the test suite and leads to “test fatigue,” where teams start ignoring failures.
This is the second most common quality complaint I hear—and it's usually tied to the same root issue: lack of shared ownership and long-term thinking.
Cost:
Brittle frameworks erode trust in automation, increase pressure on testing, and slow down the feedback loop, making it harder to iterate quickly and confidently.
Better approach:
Invest in resilient test design. If a test breaks more often than it catches bugs, it's time to rethink it.
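One concrete form resilient test design takes is the page-object pattern: tests express intent, and a single layer owns the mapping to the UI, so a CSS change touches one file instead of every test. The sketch below is illustrative, not from the post; `FakeDriver`, `LoginPage`, and the test IDs are hypothetical names, with an in-memory stand-in for a real browser driver so it runs on its own.

```python
# Hypothetical sketch: a page object that keeps tests decoupled from raw
# CSS selectors. A FakeDriver stands in for a real browser driver so the
# example is self-contained.

class FakeDriver:
    """Minimal stand-in for a browser driver, keyed by stable test IDs."""
    def __init__(self, elements):
        self._elements = elements  # e.g. {"login-submit": "Sign in"}

    def find_by_test_id(self, test_id):
        return self._elements[test_id]


class LoginPage:
    """Page object: tests ask for intent (the submit label), not CSS classes.

    If the markup or styling changes, only the test IDs referenced here need
    updating; the tests themselves stay untouched.
    """
    SUBMIT = "login-submit"

    def __init__(self, driver):
        self.driver = driver

    def submit_label(self):
        return self.driver.find_by_test_id(self.SUBMIT)


# A test written against the page object survives styling refactors:
driver = FakeDriver({"login-submit": "Sign in"})
page = LoginPage(driver)
assert page.submit_label() == "Sign in"
```

The same idea applies beyond the UI: any time a test reaches through to an implementation detail, wrapping that detail behind one named abstraction turns a hundred brittle tests into one maintainable seam.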
Misaligned Tooling
Antipattern:
When testers and developers use different tools, languages, or repositories, a disconnect forms. Testers bear the brunt of maintaining automation, while developers remain detached from quality efforts. Testers end up alone on an island, owning tools no one else understands. Engineers can’t (or won’t) contribute. And the rest of the team remains in the dark about test coverage and gaps.
This is inefficient, and it erodes collaboration, creating invisible walls in your delivery pipeline.
Cost:
This misalignment leads to poor test coverage visibility, overworked testers, and a fragmented team culture. Quality becomes “someone else’s problem,” rather than a shared responsibility.
Better approach:
Choose tools your whole team can contribute to. Break down the walls between test and code. Read more about how to do that here.
Too Many Tests, Too Little Value
Antipattern:
When teams don’t align on goals, tests pile up, whether they’re useful or not. Teams often respond to production bugs by adding more tests, believing that quantity equals quality. Bloated test suites become slow, fragile, and overwhelming. And when bugs still make it to production (because they will), the instinct is to write even more tests instead of asking whether the right tests are in place. See the infinite loop we’re getting into here?
And the irony is that most of the time, fewer, smarter tests provide better coverage and give testers room to do higher-value work, like exploratory testing, usability checks, and advocating for the customer.
Cost:
Valuable time is wasted on low-impact tests, while high-value testing (like exploratory, usability, or customer advocacy) gets neglected. The cycle of maintenance and brittleness continues.
Better approach:
Focus on coverage that matters: discuss test coverage with the whole team (including PMs), increase the visibility of what is covered, and gather diverse opinions. Don’t mistake test quantity for actual quality.
Lack of Ownership
Antipattern:
As with everything else, when no one clearly owns software quality, everyone assumes someone else is responsible for it. Test failures go unnoticed, flaky tests are tolerated, and bugs sit unresolved (not because people don’t care, but because it’s unclear who’s responsible). This is especially common in teams where quality is seen as the sole job of QA or testers rather than a shared team outcome.
When teams lack a sense of ownership, small quality issues are ignored until they snowball into larger problems, like missed regressions, fragile releases, or broken pipelines that stay red for days.
Cost:
Test failures accumulate. Trust in the automation suite erodes. Engineers hesitate to make changes, fearing they will break something, and quality becomes reactive rather than proactive.
Better approach:
Make quality a team responsibility. Assign clear owners for addressing failing tests and flaky pipelines. Celebrate teams who fix root causes, not just symptoms. Build a culture where raising a quality issue is seen as a leadership move, not a complaint.
Ignoring Technical Debt
Antipattern:
Teams that chase delivery speed at all costs often leave a trail of messy code, outdated tests, and manual processes. “We’ll fix it later” becomes a mantra, but later rarely comes. As debt accumulates, velocity slows, bugs increase, and the system becomes harder to test, change, and deploy.
In test automation, this often shows up as abandoned test files, skipped test cases, or outdated frameworks that no one wants to touch.
Cost:
The codebase becomes fragile. New features take longer to deliver. Testing becomes more difficult and unreliable, and the team spends more time debugging infrastructure than building product.
Better approach:
Treat test infrastructure as product infrastructure. Include automation in hack weeks. Track test-related tech debt in your backlog and prioritize it with the same rigor as product work.
No Feedback Loops
Antipattern:
Teams ship bugs to production, fix them quickly, and move on, without asking why the bug wasn’t caught earlier. When teams skip retrospectives, don’t conduct incident reviews, or fail to act on root causes, the same problems resurface. Quality becomes a game of whack-a-mole instead of a system that learns and improves.
Without feedback loops, testing and automation strategies stagnate. There’s no data to inform better test coverage, no learning from failures, and no alignment on how to prevent future issues.
Cost:
Teams waste time fixing repeat issues instead of improving the system. The same types of bugs slip through again and again. Test coverage doesn't evolve with the product, and quality plateaus.
Better approach:
Build feedback loops into your process. Do postmortems, even for smaller bugs. Use production incidents as a source of truth to refine test coverage. Review bugs in retros to ask: “Should we have caught this? Where? How can we improve?”
Watch for the Drift
Most of these antipatterns don’t happen overnight. They creep in slowly, one skipped retro, one ignored flaky test, one hacky fix at a time. But over weeks and months, they compound. Spotting the signs early (and being willing to course correct) is the real mark of a mature team.
Have you seen these antipatterns on your team? Or others I didn’t mention? Drop your thoughts in the comments or share this with a teammate who cares about quality. Let’s normalize talking about what doesn’t work, so we can get better at what does!
There is an important aspect of this that I rarely, if ever, see talked about: the thing we are testing places a ceiling on how effective our testing efforts can be. Have you ever heard the complaint, “I can’t run my tests locally because I need a working version of X in order to test the business logic,” where X is some external dependency such as a database or message queue? Software that suffers from this problem lacks testability, and that imposes limits no testing technique, tool, or strategy can overcome. The code has to be written in a way that clearly separates business logic from implementation details like these.
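That separation of business logic from dependency X usually means the logic depends on a narrow interface it accepts from the outside, so tests can substitute an in-memory fake. A minimal sketch, assuming a hypothetical credit-limit rule (`OrderRepository`, `can_place_order`, and the numbers are all invented for illustration):

```python
# Hypothetical sketch of testable design: the business rule depends on a
# narrow interface, not on a live database, so it runs locally with a fake.
from typing import Protocol


class OrderRepository(Protocol):
    def outstanding_balance(self, customer_id: str) -> float: ...


def can_place_order(repo: OrderRepository, customer_id: str,
                    credit_limit: float, order_total: float) -> bool:
    """Pure business rule: the new order must fit within remaining credit."""
    return repo.outstanding_balance(customer_id) + order_total <= credit_limit


class InMemoryOrders:
    """In-memory fake standing in for dependency 'X' (e.g. the database)."""
    def __init__(self, balances):
        self._balances = balances

    def outstanding_balance(self, customer_id):
        return self._balances.get(customer_id, 0.0)


repo = InMemoryOrders({"alice": 80.0})
assert can_place_order(repo, "alice", credit_limit=100.0, order_total=20.0)
assert not can_place_order(repo, "alice", credit_limit=100.0, order_total=21.0)
```

Because `can_place_order` never imports a database client, the tests need no running infrastructure at all; the real repository implementation is wired in only at the application’s edges.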
I would also add a nuance to brittle tests. They happen because the setup, verify, and teardown steps of tests are tightly coupled to how the code is implemented; often the UI is the only way to create test data. But what if developers treated testability as a first-class requirement, met by providing a dedicated testing API that hides the details of setup, verify, and teardown from the tests themselves? If being testable is a core part of the software, then the testing API would be maintained as the implementation details change. With this kind of software, tests are resilient: a failure means a requirement is no longer working as expected, rather than the more likely alternative today, that the test is just flaky.
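What such a dedicated testing API might look like, as a hedged sketch (the `TestingApi` class and its in-memory backing are hypothetical; a real one might sit behind an internal service endpoint):

```python
# Hypothetical sketch of a dedicated testing API: tests call create/verify/
# cleanup helpers instead of driving the UI, so implementation changes are
# absorbed inside TestingApi rather than breaking every test.
import uuid


class TestingApi:
    """Encapsulates setup, verify, and teardown for the system under test.

    Backed here by an in-memory dict; the backing can change freely
    (database, HTTP endpoint, service layer) as long as these methods
    keep their contract.
    """
    def __init__(self):
        self._users = {}

    def create_user(self, name):        # setup
        user_id = str(uuid.uuid4())
        self._users[user_id] = {"name": name, "active": True}
        return user_id

    def user_is_active(self, user_id):  # verify
        return self._users.get(user_id, {}).get("active", False)

    def delete_user(self, user_id):     # teardown
        self._users.pop(user_id, None)


# A test written against the API reads as intent, not implementation:
api = TestingApi()
uid = api.create_user("test-user")
assert api.user_is_active(uid)
api.delete_user(uid)
assert not api.user_is_active(uid)
```

The key property is that the test never knows how a user is created; when the product team changes that mechanism, they update `TestingApi` in the same commit, and the tests keep passing for the right reasons.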
Wouldn't it be great to test software that has high testability?