Which Claude Code skill is best for turning a PRD into a real test plan?

Use test-cases from the myclaude plugin (2,630 stars). Drop in a PRD or user story and it produces a structured matrix covering functional, edge, error, and state-transition scenarios — not a vibes-based checklist. It is the antidote to the most common QA failure mode at the start of a sprint: scanning a five-page spec and trying to remember every category of edge case before the first test gets written.

How do I run end-to-end browser tests without hand-writing Playwright every time?

The webapp-testing skill from Anthropic's skills plugin (126,366 stars) is a Playwright-driven toolkit for interacting with and testing local web applications — frontend verification, UI debugging, screenshot capture, and browser logs. It eliminates the scaffolding, wait conditions, and screenshot plumbing you would otherwise rewrite for every new test file. Use it for any change to interactive UI like auth, checkout, or multi-step forms.

How do I produce ticket-ready bug reports with screenshots and reproduction steps?

Combine qa-only from the gstack plugin (87,020 stars) for the structured report and browser-testing-with-screenshots from the relay plugin (641 stars) for ad-hoc visual evidence on a feature branch. The output is a developer-acceptable ticket with severity tiers, exact reproduction steps, and screenshots — the kind that does not bounce back with 'can you reproduce?' on the first review.

Which skill catches visual regressions and design inconsistencies?

The design-review skill from gstack performs a designer's-eye visual QA pass — finds spacing issues, hierarchy problems, AI-slop patterns, and slow interactions, then iteratively fixes them with before/after screenshot evidence. Use it after major frontend changes, before marketing pages go live, and any time 'it looks broken' arrives in the bug queue without a precise repro. Pair it with webapp-testing for combined visual and functional coverage.

The 7 Best Claude Code Skills for QA Engineers

What is the difference between the gstack qa skill and the qa-only skill?

The qa skill runs the test sweep, finds bugs, fixes them in source, commits each fix atomically, and re-verifies with before/after evidence. The qa-only skill does the testing and produces a structured bug report — health score, screenshots, repro steps — but never modifies code. Use qa when you own both sides of the loop, and qa-only when you are the gate (pre-release sign-off, third-party audit, contractor-delivered work).

How do I debug flaky or intermittent test failures methodically?

Use systematic-debugging from the superpowers plugin (173,826 stars). It forces an observe-hypothesize-isolate-verify process before proposing any fix, which turns a flaky test from 'sometimes red' into a reproducible defect with a known root cause and a one-line fix instead of a try-catch band-aid. Reach for it whenever a test fails intermittently or a developer hands back a ticket with 'cannot reproduce.'

QA is the discipline of refusing to take "it works on my machine" as evidence. The job is half investigative (reproducing flaky bugs, narrowing down regressions, writing tickets a developer can actually act on) and half preventative (turning a vague spec into a balanced set of test cases before a single line of feature code ships). The skills below compress both halves of that pipeline. They turn a PRD into a real test plan, drive an actual browser through real user flows, capture evidence that turns a one-line bug report into a reproducible ticket, and keep the verification loop airtight so nothing slips past with a green checkmark it did not earn. All of them come from verified plugins with real commit history and real star counts on GitHub.

test-cases

From the myclaude plugin (2,630 stars). Generates comprehensive test cases from a PRD, user story, or function spec: happy path, edge cases, error handling, boundary conditions, and state transitions. The output is a structured test document, not a vibes-based checklist. For QA engineers, this is the antidote to the most common failure mode at the start of a sprint: opening a blank document, scanning a five-page spec, and trying to remember every category of edge case before the first test gets written.

When to use: at the start of any new feature, before the first test runs. Drop in the PRD, get back a structured matrix covering functional, edge, error, and state-transition scenarios. Pair it with the core review-and-testing skill stack when the same feature also needs developer-side coverage.
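To make the idea of a structured matrix concrete, here is a minimal Python sketch. The `Scenario` fields, the `Category` enum, and the login rows are all hypothetical illustrations; they are not the actual output format of the test-cases skill, which the article does not specify.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    FUNCTIONAL = "functional"
    EDGE = "edge"
    ERROR = "error"
    STATE_TRANSITION = "state-transition"


@dataclass(frozen=True)
class Scenario:
    case_id: str
    category: Category
    scenario: str
    expected: str


# Hypothetical rows for an imagined "log in with email and password" story.
MATRIX = [
    Scenario("TC-01", Category.FUNCTIONAL, "valid email and password", "user lands on dashboard"),
    Scenario("TC-02", Category.EDGE, "email at maximum allowed length", "login accepted"),
    Scenario("TC-03", Category.ERROR, "wrong password", "inline error, no session created"),
    Scenario("TC-04", Category.STATE_TRANSITION, "fifth failed attempt", "account moves to locked"),
]


def coverage_gaps(cases):
    """Return the categories that have no scenario yet, in enum order."""
    covered = Counter(case.category for case in cases)
    return [cat for cat in Category if covered[cat] == 0]
```

Calling `coverage_gaps` on a draft plan is one way to check that no category was forgotten before the first test is written.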

webapp-testing

From the Anthropic skills plugin (126,366 stars). A Playwright-driven toolkit for interacting with and testing local web applications: verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs. It gives you end-to-end browser coverage without hand-writing the Playwright scaffolding, the wait conditions, or the screenshot plumbing every time you start a new test file.

When to use: any change to interactive UI — auth flows, checkout, multi-step forms, dashboards with state. The difference between "unit tests pass" and "the user can actually complete the flow" is where most shipped bugs live, and this is the cheapest way to close that gap. Pair it with capture tooling like capture.thicket.sh when you need a quick standalone screenshot of a deployed page outside the test runner.
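For a sense of the scaffolding involved, here is a rough sketch of a hand-written check using Playwright's Python sync API. It assumes Playwright and a Chromium build are installed, and the `/login` route, the "Email" and "Password" labels, the "Sign in" button, and the `/dashboard` redirect are all hypothetical. It is not code produced by the webapp-testing skill.

```python
from pathlib import Path


def evidence_path(out_dir, flow, step):
    """Build a predictable screenshot filename such as evidence/login-after.png."""
    return Path(out_dir) / f"{flow}-{step}.png"


def check_login_flow(base_url, email, password, out_dir="evidence"):
    """Drive a hypothetical login form and return the captured console messages."""
    # Imported lazily so this module still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    console_lines = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Collect browser console output as supporting evidence.
        page.on("console", lambda msg: console_lines.append(msg.text))
        page.goto(f"{base_url}/login")
        page.screenshot(path=str(evidence_path(out_dir, "login", "before")))
        page.get_by_label("Email").fill(email)
        page.get_by_label("Password").fill(password)
        page.get_by_role("button", name="Sign in").click()
        # Wait condition: the flow only counts as complete once the redirect happens.
        page.wait_for_url("**/dashboard")
        page.screenshot(path=str(evidence_path(out_dir, "login", "after")))
        browser.close()
    return console_lines
```

Even this small flow needs a browser launch, a console listener, a wait condition, and two screenshot calls, which is the repetition the paragraph above refers to.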

qa

From the gstack plugin (87,020 stars). Systematically QA-tests a web application and fixes the bugs it finds: it runs the test sweep, finds issues, fixes them in source, commits each fix atomically, and re-verifies with before/after evidence. It has three tiers: Quick (critical and high severity only), Standard (adds medium), and Exhaustive (adds cosmetic). It produces before/after health scores and a ship-readiness summary.

When to use: when a feature is "done" and needs a real test pass before it ships, or when a regression has crept in and you need both the bugs found and the fixes landed. The Quick tier is appropriate for a hotfix branch, Standard for a normal feature, Exhaustive before a major release. For a release-gate situation where you only want the report and not the fixes, use the report-only sibling skill, qa-only, instead.
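The tier scoping described above can be pictured as a simple severity filter. The sketch below is only an illustration: the tier names and their severity coverage come from the description, while the dictionary layout, the `in_scope` helper, and the finding format are assumptions, not gstack's implementation.

```python
SEVERITY_ORDER = ["critical", "high", "medium", "cosmetic"]

# Tier coverage as described in the text; the data layout is an assumption.
TIER_SCOPE = {
    "quick": {"critical", "high"},
    "standard": {"critical", "high", "medium"},
    "exhaustive": {"critical", "high", "medium", "cosmetic"},
}


def in_scope(findings, tier):
    """Keep only findings the given tier covers, most severe first."""
    allowed = TIER_SCOPE[tier]
    kept = [f for f in findings if f["severity"] in allowed]
    return sorted(kept, key=lambda f: SEVERITY_ORDER.index(f["severity"]))
```

Under this model, a hotfix branch run at the Quick tier would ignore medium and cosmetic findings entirely, which keeps the pass short.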

qa-only

Also from gstack. The report-only sibling of the qa skill: it systematically tests a web application and produces a structured bug report (health score, screenshots, repro steps) but never modifies code. It is designed for situations where a QA engineer is the gate, not the fixer: pre-release sign-off, third-party site audits, contractor-delivered work, security review before a merge.

When to use: any time the deliverable is the bug report itself. Output is a structured ticket-ready document with severity tiers, exact reproduction steps, and screenshots — the kind of report that gets accepted by a developer instead of bouncing back with "can you reproduce?" The same gating mindset shows up across the engineering org; backend teams use parallel review skills at the API and data layer.
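As a rough picture of what "ticket-ready" means, the sketch below models a report that carries a severity, numbered reproduction steps, and screenshots, and can say what is still missing. The `BugReport` class and its fields are hypothetical; the real qa-only report format is not specified in this article.

```python
from dataclasses import dataclass, field


@dataclass
class BugReport:
    title: str
    severity: str  # e.g. "critical", "high", "medium", "cosmetic"
    steps: list
    screenshots: list = field(default_factory=list)

    def missing_evidence(self):
        """List what a developer would have to ask for before acting on the ticket."""
        gaps = []
        if not self.steps:
            gaps.append("reproduction steps")
        if not self.screenshots:
            gaps.append("screenshots")
        return gaps

    def render(self):
        """Format the report as plain text with numbered reproduction steps."""
        lines = [f"[{self.severity.upper()}] {self.title}", "Steps to reproduce:"]
        lines += [f"  {n}. {step}" for n, step in enumerate(self.steps, start=1)]
        lines.append("Screenshots: " + (", ".join(self.screenshots) or "none"))
        return "\n".join(lines)
```

A report with an empty `missing_evidence()` result is the kind that is less likely to come back with "can you reproduce?".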

browser-testing-with-screenshots

From the relay plugin (641 stars). Automates Chrome browser interactions, element selection, and screenshot capture for confirming UI functionality. It is the visual-evidence layer underneath any web QA workflow: lighter weight than a full Playwright harness, and useful when the goal is "prove this button does the right thing" rather than "build a maintained test suite."

When to use: exploratory testing sessions, smoke tests against a staging deploy, ad-hoc verification of a bugfix on a feature branch, or the screenshot-attached half of a bug report. Especially valuable when you are testing a flow that nobody has automated yet and the test budget is one afternoon, not one sprint.

design-review

Also from gstack. A designer's-eye visual QA pass: it finds visual inconsistency, spacing issues, hierarchy problems, AI-slop patterns, and slow interactions, then iteratively fixes them in source code, committing each fix atomically and re-verifying with before/after screenshots. QA engineers increasingly own the visual-regression beat as well as the functional one; this skill makes that scope manageable without learning a separate visual-diff tooling stack.

When to use: after a major frontend change, before a marketing page or landing page goes live, and any time "it looks broken" arrives in the bug queue without a precise repro. Pair it with the frontend-design plus webapp-testing workflow when the same release needs both the visual sweep and the functional one.

systematic-debugging

From the flagship superpowers plugin (173,826 stars). Forces a methodical debugging process (observe, hypothesize, isolate, verify) before proposing any fix. Reach for it whenever a test fails, a bug report lands, or behavior diverges from the spec. For QA engineers, this is what turns a flaky test from "sometimes red" into a reproducible defect with a known root cause and a one-line fix instead of a try-catch band-aid.

When to use: any time a bug is intermittent, any time the first three theories were wrong, any time a developer hands back a ticket with "cannot reproduce." The skill enforces the pattern that experienced QA engineers do by habit and inexperienced ones skip under deadline pressure.
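The observe and verify steps of that pattern can be sketched in a few lines of Python. Everything here is hypothetical: `flaky_checkout_total` stands in for a test with a hidden order dependence, and `failing_seeds` and `verify` are illustrative helpers, not part of the systematic-debugging skill.

```python
import random


def flaky_checkout_total(rng):
    """Stand-in for a flaky test: fails only when the shuffle puts the coupon first."""
    items = ["coupon", "book", "pen"]
    rng.shuffle(items)
    # Hypothetical bug: a coupon applied before any item breaks the subtotal.
    return items[0] != "coupon"


def failing_seeds(test, seeds):
    """Observe: run the test once per seed and record which seeds fail."""
    return [seed for seed in seeds if not test(random.Random(seed))]


def verify(test, seed, runs=5):
    """Verify: a pinned seed must fail on every run for the repro to be deterministic."""
    return all(not test(random.Random(seed)) for _ in range(runs))
```

Pinning a seed from `failing_seeds(flaky_checkout_total, range(200))` turns "sometimes red" into a failure that reproduces on every run, which is the point at which a root cause can actually be isolated.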

How to install

Each skill lives inside a plugin. Add the plugin marketplace once, then install with a single command. The skill detail page on Skill Index has the exact install string and a copy button.

If you are new to Claude Code plugins and doing QA work, the highest-ROI first install is the Anthropic skills plugin (which gives you webapp-testing as the Playwright backbone), plus gstack for the full QA suite (qa, qa-only, design-review) and superpowers for the systematic-debugging and verification layer. Add myclaude the moment a new feature lands in the backlog and you need test cases generated from a PRD. Pair the output with the rest of the thicket toolkit (standalone page screenshots via capture.thicket.sh, mobile-flow QR codes for cross-device test handoff via qr.thicket.sh, and timeboxed test sessions via focus.thicket.sh) and the entire QA pipeline can shrink from a week to roughly half a day.
