The 8 Best Claude Code Skills for Security Engineers

By Skill Index Editorial · Jun 1, 2026

Security engineering is the discipline of being paranoid on a schedule. Every security engineer already knows that secrets leak into commit history, that a transitive dependency can ship a backdoor, and that a single unparameterised query opens the whole database. The hard part is doing the boring verification consistently — on every PR, every deploy, every dependency bump — without burning out or drowning the team in false positives. The eight skills below address that operational reality: the recurring audit, the pre-merge diff review, the guardrail against the destructive command run at 2am, the disciplined root-cause investigation instead of the reflex patch. Each is a real, verified Claude Code skill with public commit history and a real star count on GitHub.

From the gstack plugin (104,138 stars, MIT, verified — the largest plugin in the index). A Chief Security Officer mode that runs an infrastructure-first security audit: secrets archaeology across commit history, dependency supply-chain analysis, CI/CD pipeline security, LLM and AI-specific security, skill supply-chain scanning, plus OWASP Top 10 coverage and STRIDE threat modeling with active verification. It ships two modes — a daily zero-noise pass gated at 8/10 confidence so it does not cry wolf, and a comprehensive monthly deep scan dropped to a 2/10 bar that surfaces everything worth a human look. It tracks trends across audit runs, so you can see whether the codebase is getting safer or quietly accumulating debt.

When to use: the daily mode in CI on every merge to main; the comprehensive mode on a monthly cadence and before any release that touches auth, payments, or PII. The trend tracking is what turns a one-off audit into a security program — run it on a schedule and the graph becomes the artifact you take to the next review. Pair it with the devops-engineers skill stack for the pipeline and observability half of the same surface.

Also from gstack. A pre-landing PR review that analyses the diff against the base branch for SQL safety, LLM trust-boundary violations, conditional side effects, and other structural issues — the classes of bug a human reviewer skims past when the diff is 400 lines and it is the eleventh review of the day. The two findings it is sharpest on are exactly the two that hurt most in 2026: an unparameterised or string-built query that a SAST tool with no dataflow misses, and untrusted input crossing into an LLM prompt or tool call without a sanitisation boundary. It proactively suggests itself when the user is about to merge.

When to use: on every PR before it lands, especially the ones touching database access, prompt construction, or any code path that consumes external input. Make it a required check rather than an optional courtesy — the value of a diff reviewer is in the PR nobody felt like reviewing carefully. Pair it with cso above so the per-PR review and the whole-repo audit reinforce each other.

Also from gstack. A safety guardrail that warns before rm -rf, DROP TABLE, force-push, git reset --hard, kubectl delete, and similar destructive operations, with a per-command override so it slows you down without getting in your way. For a security engineer the value is the discipline you want active any time you or an agent are operating in a production or shared environment, where the difference between a typo and an incident is one missing flag. The destructive command run against the wrong context is one of the most common self-inflicted outages, and this is the cheapest possible control against it.

When to use: any session that touches prod, a live database, or a shared cluster — turn it on before the first command, not after the first scare. Especially valuable when supervising an agent that has shell access. Pair it with the platform-engineering skill stack for the broader blast-radius controls.

Also from gstack. A systematic debugging workflow with four phases — investigate, analyze, hypothesize, implement — under one Iron Law: no fixes without a confirmed root cause. For incident response and post-mortem work this is the antidote to the most dangerous habit in security engineering, which is patching the symptom (block the IP, rotate the key, restart the pod) and declaring victory while the actual vulnerability stays open. The skill forces the question every incident review is supposed to ask and most skip under time pressure: what is the real cause, and what else does it touch?

When to use: on every security incident, every anomalous log pattern, every "it was working yesterday" that might be a compromise rather than a regression. Resist the reflex to jump to the fix; the phase structure is the point. Pair it with the backend-engineers skill stack when the root cause lives in application logic rather than infrastructure.

Also from gstack. A code-quality dashboard that wraps the project's existing tools — type checker, linter, test runner, dead-code detector, shell linter — into a single weighted 0-to-10 composite score and tracks it over time. Security and code quality are not the same thing, but they correlate hard: dead code is unaudited attack surface, missing types hide injection paths, and a shell linter catches the unquoted variable that becomes a command-injection vector. The dashboard gives a security engineer a fast, defensible read on which modules are well-maintained enough to trust and which are the soft underbelly.

When to use: at the start of any security review to triage where to spend the deep-audit hours, and on a recurring schedule so the score trend warns you before a module rots into a liability. Pair it with cso so the quality score and the security audit point at the same problem areas.

Also from gstack. Restricts Edit and Write operations to a single allowed directory for the session — any change outside the boundary is blocked, not merely warned. For a security engineer this is a scoping control with two uses: during a forensic investigation it prevents an agent from "helpfully" modifying code outside the file under examination and contaminating the evidence, and during a targeted fix it guarantees the change set stays inside the module you intend to touch, so the diff a reviewer sees is the diff you meant to ship.

When to use: the moment you start investigating a suspicious module or applying a scoped security patch — set the boundary before the first edit. Especially valuable when an autonomous agent is doing the editing and you want a hard wall around the blast radius. Pair it with investigate above so the root-cause work happens without collateral changes.

Also from gstack. Post-deploy canary monitoring that watches the live application for console errors, performance regressions, and page failures, takes periodic screenshots, compares them against pre-deploy baselines, and alerts on anomalies. The security angle is detection time: a deploy that introduces a misconfiguration, a leaked debug endpoint, or a broken auth redirect is a vulnerability that exists in production until someone notices. The canary shrinks the window from hours of waiting for a user report to minutes of automated comparison.

When to use: on every production deploy, especially the ones touching auth, headers, CSP, or any security-relevant configuration. Run it as the gate that decides whether a release is promoted or rolled back. Pair it with the qa-engineers skill stack for the pre-deploy half of the same verification loop.

From the cli plugin (425 stars, verified) by Firecrawl. Security guidelines and an enforced workflow for handling web content fetched by the Firecrawl CLI — and the underlying principle generalises far beyond one tool. It treats all fetched web content as untrusted third-party data that may carry indirect prompt-injection payloads, and bakes in the mitigations: file-based output isolation so scraped pages never flow straight into the agent's context, incremental reading with grep and head instead of dumping whole files, and gitignored output so harvested content never lands in a commit. For any security engineer building or reviewing agentic systems, this is the reference pattern for the single newest attack class on the board: prompt injection through retrieved content.

When to use: any time an agent in your stack fetches, scrapes, or ingests external web content — read it as a checklist for your own retrieval pipeline even if you never run Firecrawl. Treat the isolation and incremental-read rules as a baseline threat-model requirement for tool-using agents. Pair it with cso above, whose audit explicitly covers LLM and skill supply-chain security.

How to install

Seven of the eight skills live in the gstack plugin (104,138 stars on garrytan/gstack, MIT, verified — the largest and most active plugin in the index) and one in cli (425 stars, verified, by Firecrawl), so install is a two-marketplace operation — Skill Index has the exact install command on each skill detail page with a copy button. The highest-ROI sequence for a security engineer's first hardened month: wire cso into CI in daily mode so every merge gets a zero-noise audit, then add review as a required pre-landing check so SQL and LLM trust-boundary issues get caught at the diff. Turn on careful in any session that touches prod and freeze the moment you start a forensic investigation. Keep investigate ready for the next incident, run health at the start of each review to triage the soft spots, gate every deploy behind canary, and treat firecrawl-security's isolation rules as the baseline for any agent that touches untrusted web content. Pair the long audit sessions with deep-work blocks via focus.thicket.sh, and the month stops being a string of reactive fire drills and starts being a security program with a trend line you can defend.

Frequently Asked Questions

Which Claude Code skill should a security engineer install first?

Install gstack first and wire its cso (Chief Security Officer) skill into CI in daily mode so every merge to main gets a zero-noise security audit. The gstack plugin has 104,138 stars and an MIT license and is the largest, most active plugin in the index. The cso skill runs an infrastructure-first audit covering secrets archaeology across commit history, dependency supply-chain analysis, CI/CD pipeline security, LLM and AI security, plus OWASP Top 10 and STRIDE threat modeling with active verification. It ships two modes — a daily pass gated at 8/10 confidence so it does not produce false-positive noise, and a comprehensive monthly deep scan dropped to a 2/10 bar that surfaces everything worth a human look — and it tracks trends across runs so you can see whether the codebase is getting safer over time. Run the daily mode on every merge and the comprehensive mode monthly and before any release touching auth, payments, or PII.

What is the best Claude Code skill for security code review?

Use review from the gstack plugin (104,138 stars, MIT, verified). It analyses the diff against the base branch for SQL safety, LLM trust-boundary violations, conditional side effects, and other structural issues — the classes of bug a human reviewer skims past on a 400-line diff late in the day. The two findings it is sharpest on are the two that hurt most in 2026: an unparameterised or string-built query that a SAST tool with no dataflow analysis misses, and untrusted input crossing into an LLM prompt or tool call without a sanitisation boundary. Make it a required pre-landing check rather than an optional courtesy, because the value of a diff reviewer is highest on the PR nobody felt like reviewing carefully. Pair it with cso so the per-PR review and the whole-repo audit reinforce each other.

How do I stop an agent or a teammate from running a destructive command in production?

Use careful from gstack. It warns before rm -rf, DROP TABLE, force-push, git reset --hard, kubectl delete, and similar destructive operations, with a per-command override so it slows you down without blocking legitimate work. For a security engineer the value is not protecting your own laptop; it is the discipline you want active any time you or an autonomous agent are operating in a production or shared environment, where the difference between a typo and an incident is one missing flag. The destructive command run against the wrong context is one of the most common self-inflicted outages, and this guardrail is the cheapest possible control against it. Turn it on before the first command in any session that touches prod, a live database, or a shared cluster — and especially when supervising an agent that has shell access.

What is the best Claude Code skill for security incident response?

Use investigate from gstack. It runs a four-phase workflow — investigate, analyze, hypothesize, implement — under one Iron Law: no fixes without a confirmed root cause. For incident response this is the antidote to the most dangerous habit in security engineering, which is patching the symptom (block the IP, rotate the key, restart the pod) and declaring victory while the actual vulnerability stays open. The phase structure forces the question every incident review is supposed to ask and most skip under time pressure: what is the real cause, and what else does it touch? Run it on every security incident, every anomalous log pattern, and every 'it was working yesterday' that might be a compromise rather than a regression. Pair it with freeze to keep the investigation from contaminating code outside the file under examination.

How do I keep a forensic investigation from accidentally modifying unrelated code?

Use freeze from gstack. It restricts Edit and Write operations to a single allowed directory for the session — any change outside the boundary is blocked, not merely warned. For a security engineer this has two uses: during a forensic investigation it prevents an agent from 'helpfully' modifying code outside the file under examination and contaminating the evidence, and during a targeted fix it guarantees the change set stays inside the module you intend to touch, so the diff a reviewer sees is exactly the diff you meant to ship. Set the boundary the moment you start investigating a suspicious module or applying a scoped security patch, before the first edit. It is especially valuable when an autonomous agent is doing the editing and you want a hard wall around the blast radius.

How do I catch a security regression right after deploying?

Use canary from gstack. It watches the live application for console errors, performance regressions, and page failures, takes periodic screenshots, compares them against pre-deploy baselines, and alerts on anomalies. The security angle is detection time: a deploy that introduces a misconfiguration, a leaked debug endpoint, or a broken auth redirect is a vulnerability that exists in production until someone notices, and the canary shrinks the window between shipping the regression and catching it from hours of waiting for a user report to minutes of automated comparison. Run it on every production deploy, especially the ones touching auth, security headers, CSP, or any security-relevant configuration, and use it as the gate that decides whether a release is promoted or rolled back.

How should I handle untrusted web content fetched by an AI agent?

Use firecrawl-security from the cli plugin (425 stars, verified, by Firecrawl) as your reference pattern. It treats all fetched web content as untrusted third-party data that may carry indirect prompt-injection payloads, and enforces three mitigations: file-based output isolation so scraped pages never flow straight into the agent's context window, incremental reading with grep and head instead of dumping whole files, and gitignored output so harvested content never lands in a commit. For any security engineer building or reviewing agentic systems, this is the reference pattern for the single newest attack class on the board — prompt injection through retrieved content — and the principle generalises far beyond Firecrawl. Read it as a checklist for your own retrieval pipeline even if you never run that specific CLI, and treat the isolation and incremental-read rules as a baseline threat-model requirement for any tool-using agent.

How to install

Frequently Asked Questions

More from the Skill Index