Meet Sentinel QA: Agentic Test Automation That Actually Maintains Itself

The hardest part of test automation has never been writing the first test. It's keeping the hundredth one alive a year later.
If you ship UI changes weekly, you already know the shape of this problem. A button moves three pixels, a developer renames a data-testid, a marketing experiment swaps the copy on the hero CTA — and the next CI run lights up red across a suite that was passing yesterday. Nothing is actually broken in production. The tests just don't know where things went. Someone spends Friday afternoon updating selectors, opens a "test maintenance" PR, and the cycle starts over.
Sentinel QA is built for that exact afternoon. It's an agentic test runner: an AI that reads your app, writes tests against it, re-runs them when things change, and tells you which failures are real and which are just the UI moving.
Why traditional test automation breaks
Most end-to-end frameworks (Playwright, Cypress, Selenium) are excellent at running tests. Before they can run anything, though, they ask you to freeze the UI's structural identity into your test code: a CSS path, an XPath, a data-testid your team agreed to maintain forever. The framework then bets the rest of its life on that contract.
The contract breaks for predictable reasons. Designers refactor components. A11y improvements rewire the DOM. An A/B test ships and the experiment branch doesn't have the same selectors as control. None of these are bugs. They're normal product work. But because the test layer encoded the structure of yesterday's UI instead of the intent of the test, every one of them generates a false-positive failure.
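To make that failure mode concrete, here is a minimal TypeScript sketch. It is not Sentinel QA code: it models a page as a tiny tree and compares a structural lookup (a fixed path of child indices, standing in for a CSS path or XPath) with a lookup by role and visible name. The `UiNode` type and the two helper functions are hypothetical names introduced only for illustration.

```typescript
// Hypothetical, simplified stand-in for a rendered page: a tree of nodes.
interface UiNode {
  role: string;       // e.g. "button", "banner", "region"
  name: string;       // visible label or accessible name
  children: UiNode[];
}

// Structural lookup: follow a fixed path of child indices, like a CSS path or XPath.
function findByPath(root: UiNode, path: number[]): UiNode | undefined {
  let node: UiNode | undefined = root;
  for (const index of path) {
    node = node?.children[index];
  }
  return node;
}

// Intent-based lookup: depth-first search for a role and name, wherever it lives.
function findByRole(root: UiNode, role: string, name: string): UiNode | undefined {
  if (root.role === role && root.name === name) return root;
  for (const child of root.children) {
    const hit = findByRole(child, role, name);
    if (hit) return hit;
  }
  return undefined;
}

const signup: UiNode = { role: "button", name: "Sign up", children: [] };

// Yesterday: the button sits inside the hero section.
const before: UiNode = {
  role: "document", name: "Home", children: [
    { role: "banner", name: "Header", children: [] },
    { role: "region", name: "Hero", children: [signup] },
  ],
};

// Today: a redesign moved it into the header. Nothing is broken for users.
const after: UiNode = {
  role: "document", name: "Home", children: [
    { role: "banner", name: "Header", children: [signup] },
    { role: "region", name: "Hero", children: [] },
  ],
};

const heroPath = [1, 0]; // "second child, then its first child"
const pathBefore = findByPath(before, heroPath);          // finds the button
const pathAfter = findByPath(after, heroPath);            // undefined: the path went stale
const roleAfter = findByRole(after, "button", "Sign up"); // still finds the button
```

The structural lookup fails after the redesign even though the button still exists and still works, which is exactly the kind of red run described above.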
The industry's stopgap has been more discipline: enforced test-id conventions, page-object models, dedicated QA engineers gatekeeping the design system. That works at scale, sometimes. It also costs roughly the salary of one engineer per twenty engineers shipping UI — and it still doesn't survive a real redesign.
What "agentic" actually means here
When we say Sentinel QA is agentic, we mean three concrete things, not a vibe.
First, the agent reads the live application. It opens the app the way a person would, parses the rendered page, and builds an internal model of what's on screen — the buttons, forms, headings, navigational regions, and their semantic relationships. The model is grounded in what the user can actually interact with, not in implementation-level identifiers that change without warning.
Second, the agent writes the tests. You hand it an intent — "verify a new user can sign up, confirm their email, and reach the onboarding screen" — and it produces an executable test plan. The plan references targets by what they do ("the primary signup button under the hero"), not by where they happen to live in the DOM today.
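As a rough illustration of a plan that names targets by what they do, the TypeScript sketch below shows one possible shape. The `TestPlan`, `Step`, and `Target` types, the action names, and the example email address are all assumptions made for this example; Sentinel QA's real plan format may look quite different.

```typescript
// Hypothetical plan format; the project's actual schema may differ.
type Action = "navigate" | "click" | "fill" | "expect-visible";

interface Target {
  role: string;        // what kind of control it is
  description: string; // what it does, in plain language
}

interface Step {
  action: Action;
  target?: Target;     // omitted for "navigate"
  value?: string;      // URL for "navigate", text for "fill"
}

interface TestPlan {
  intent: string;      // the sentence the human wrote
  steps: Step[];
}

const signupPlan: TestPlan = {
  intent: "Verify a new user can sign up, confirm their email, and reach the onboarding screen",
  steps: [
    { action: "navigate", value: "/" },
    { action: "click", target: { role: "button", description: "the primary signup button under the hero" } },
    { action: "fill", target: { role: "textbox", description: "the email field on the signup form" }, value: "new.user@example.com" },
    { action: "click", target: { role: "button", description: "the button that submits the signup form" } },
    { action: "click", target: { role: "link", description: "the confirmation link in the verification email" } },
    { action: "expect-visible", target: { role: "heading", description: "the onboarding screen title" } },
  ],
};

// Sanity check: every step other than "navigate" must name a target.
function missingTargets(plan: TestPlan): Step[] {
  return plan.steps.filter((s) => s.action !== "navigate" && s.target === undefined);
}
```

Note that nothing in the plan mentions a CSS path, an XPath, or a test id. Those would be resolved against the live page at run time.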
Third, and most importantly, the agent maintains the tests. When a run fails, it goes back and looks at the page. If the signup button moved from the hero to a sticky header, the agent notices, updates its target, re-runs the assertion, and reports a successful test plus a note about the change. If the signup button is genuinely gone, it reports a real failure with a screenshot of what it actually saw — not a stack trace pointing at a stale selector.
The result is a suite that decays much more slowly than a traditional one, because the brittle layer — the bridge between "what the test means" and "what the DOM looks like right now" — is the part the agent rebuilds on every run.
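The repair step can be sketched in a few lines of TypeScript. This is a simplified illustration under stated assumptions, not the project's implementation: the page is modeled as a flat list of elements, matching is by exact role and name (a real agent would presumably match more loosely), and `PageElement`, `PlannedTarget`, `Outcome`, and `resolveTarget` are hypothetical names.

```typescript
// Hypothetical types; a sketch of the triage logic only.
interface PageElement {
  selector: string; // where the element lives in today's DOM
  role: string;
  name: string;     // accessible name or visible label
}

interface PlannedTarget {
  role: string;
  name: string;
  cachedSelector: string; // selector that worked on the last run
}

type Outcome =
  | { status: "passed" }
  | { status: "repaired"; oldSelector: string; newSelector: string }
  | { status: "failed"; reason: string };

function resolveTarget(page: PageElement[], target: PlannedTarget): Outcome {
  // 1. Fast path: the cached selector still points at the intended element.
  const cached = page.filter((el) => el.selector === target.cachedSelector)[0];
  if (cached && cached.role === target.role && cached.name === target.name) {
    return { status: "passed" };
  }
  // 2. Look at the page again and search by what the element does.
  const relocated = page.filter((el) => el.role === target.role && el.name === target.name)[0];
  if (relocated) {
    const oldSelector = target.cachedSelector;
    target.cachedSelector = relocated.selector; // rebuild the disposable part
    return { status: "repaired", oldSelector, newSelector: relocated.selector };
  }
  // 3. Nothing on the page matches the intent: this is a real failure.
  return { status: "failed", reason: `no ${target.role} named "${target.name}" on the page` };
}

// The signup button moved from the hero to a sticky header.
const signupTarget: PlannedTarget = { role: "button", name: "Sign up", cachedSelector: "#hero .cta" };
const redesignedPage: PageElement[] = [
  { selector: "header.sticky .cta", role: "button", name: "Sign up" },
];
const moved = resolveTarget(redesignedPage, signupTarget); // status: "repaired"

// The signup button is genuinely gone.
const goneTarget: PlannedTarget = { role: "button", name: "Sign up", cachedSelector: "#hero .cta" };
const removed = resolveTarget([], goneTarget);             // status: "failed"
```

The three outcomes map onto the three things a human wants to know after a run: nothing changed, something moved and was handled, or something is actually missing.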
Who Sentinel QA is for
Sentinel QA is built for engineers who are already doing this work and want their evenings back.
That includes QA engineers and SDETs running large existing suites where maintenance has become the bottleneck — the people who can articulate, in detail, why their last redesign cost six weeks of test churn. It includes platform and DevTools teams looking for a less brittle layer to wire into CI. It includes founders and small-team developers doing their own QA, who don't want to learn an entire selector taxonomy before they can verify the signup flow on a Tuesday.
If you've written Playwright or Cypress and know exactly which corners are sharp, you're the audience. Sentinel QA isn't trying to replace those tools at the runner level — it's trying to replace the part of your job that updates them.
What does Sentinel QA do that Playwright can't?
Playwright executes tests precisely and quickly. What it can't do is decide whether a failing test reflects a real regression or just a DOM change, and repair the test itself in the second case. Sentinel QA closes that gap: it treats the test plan as the source of truth and the selectors as disposable artifacts it can regenerate from the live page. In practice, a redesign that would generate dozens of red runs in Playwright produces, in Sentinel QA, a handful of test updates the agent has already made and a much smaller set of genuine failures for a human to review.
How it fits into an existing pipeline
You don't have to rip out what you have. Sentinel QA is designed to run alongside an existing test suite — on the same CI, against the same staging environments — and to take on the surfaces of your product where churn is highest first. A common path looks like this:
The agent is pointed at one feature area where maintenance is most painful — usually onboarding, checkout, or the dashboard. It generates a covering set of test plans from a few sentences of intent. Those plans run on every PR. As the suite proves itself, more features get migrated, and the brittle legacy tests for those features can be retired. The framework underneath (Playwright, browser drivers, the CI runner) stays exactly where it is. Sentinel QA is a layer on top, not a replacement.
What it deliberately doesn't do
It doesn't pretend to find bugs you didn't ask it to look for. Test generation is grounded in stated intent — the agent will not silently invent assertions about, say, color contrast or copy tone unless those are part of the plan you gave it. We've found this matters: an agent that surprises you with novel "failures" every run is functionally indistinguishable from a flaky suite.
It also doesn't reach into production by default. Sentinel QA runs against the environments you point it at (local, staging, preview branches) and does not run tests against production unless you say so. If you want production smoke tests, you turn that on explicitly.
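An explicit opt-in of this kind could be enforced with a guard like the one below. This is only a sketch of the idea: the `RunConfig` shape and the `allowProductionSmokeTests` flag are invented for illustration and are not taken from the project's actual configuration.

```typescript
// Hypothetical configuration shape; real option names may differ.
type Environment = "local" | "staging" | "preview" | "production";

interface RunConfig {
  environment: Environment;
  allowProductionSmokeTests?: boolean; // assumed opt-in flag, off when omitted
}

// Refuse to start a run against production unless it was enabled explicitly.
function assertRunAllowed(config: RunConfig): void {
  if (config.environment === "production" && config.allowProductionSmokeTests !== true) {
    throw new Error("Production runs are disabled unless explicitly enabled");
  }
}
```

Making the flag optional and checking for a strict `true` means a missing or mistyped setting fails closed rather than open.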
And it isn't a chatbot. There's no "ask Sentinel QA what's wrong with my app" interface. The interface is test plans in, signed-off runs out, and a diff of what the agent had to repair to get there.
Where to start
Sentinel QA is open — code, issues, and the project roadmap all live on GitHub. The fastest way in is to clone the repo, point the agent at a small surface area of your app, and see what test plans it generates from a sentence of intent. The README walks through wiring it into an existing Playwright project; if you don't have one, the agent can start from a fresh browser session against a URL.
If the hundredth test in your suite has been quietly dying for the last six months, this is the project you want to look at.
Sentinel QA is an open agentic test runner — code, issues, and roadmap on GitHub: https://github.com/ahn283/sentinel-qa
Frequently Asked Questions
What makes Sentinel QA different from traditional test automation tools like Playwright or Cypress?
Sentinel QA is an agentic test runner that automatically maintains tests by reading the live app, writing tests based on user intent, and updating selectors when the UI changes. Unlike traditional tools that rely on fixed selectors, Sentinel QA repairs tests after UI changes, reducing false positives and maintenance overhead.
How does Sentinel QA handle UI changes that typically break automated tests?
When a UI element moves or changes, Sentinel QA's agent detects the new location or state by parsing the live app, updates the test targets accordingly, re-runs the tests, and distinguishes between real failures and harmless UI shifts. This approach minimizes test suite decay and reduces manual maintenance.
Can Sentinel QA be integrated into an existing test automation pipeline?
Yes, Sentinel QA is designed to run alongside existing test suites and CI pipelines without replacing underlying frameworks like Playwright. Teams can start by applying it to high-churn areas of their app, gradually migrating features to Sentinel QA while retiring brittle legacy tests.
Who is the ideal user for Sentinel QA?
Sentinel QA is ideal for QA engineers, SDETs, platform teams, and developers already familiar with test automation frameworks who want to reduce test maintenance time. It suits those managing large, flaky test suites or small teams seeking easier test creation without complex selector management.
What limitations does Sentinel QA have regarding test scope and environment access?
Sentinel QA only generates tests based on explicit user intent and does not invent new assertions or find bugs outside the defined scope. It runs against specified environments like local or staging by default and does not access production unless explicitly configured. It also does not provide chatbot-style diagnostics.