For years “shift left” was a slogan you implemented with a CI job. The security scan ran after you pushed, on a pull request, after the code already existed and the context that produced it was gone. Anthropic’s new security-guidance plugin for Claude Code moves the scan all the way left, into the session, so it runs while the code is still being written. Install it from the marketplace with one line:

/plugin install security-guidance@claude-plugins-official

It’s free, on every plan. And it’s built entirely on hooks: the lifecycle events that fire at fixed points in Claude’s loop. No new command to remember, nothing to invoke. Once it’s on, it just runs.

Three Layers, Priced by Depth

The plugin reviews at three points, each costing more than the last:

  • On every file edit: a PostToolUse hook runs a deterministic pattern match. eval(, os.system, pickle, dangerouslySetInnerHTML, edits under .github/workflows/. No model call, so it adds no cost and works in any directory.
  • At the end of each turn: a Stop hook diffs everything the turn changed and hands it to a background review for the things a string match can’t see: authorization bypass, IDOR, SSRF, weak crypto. Capped at 30 files and three reviews in a row.
  • On each commit Claude makes: a deeper agentic review reads the surrounding callers and sanitizers to decide if a finding is real before reporting it. Capped at 20 per rolling hour.

The cost story is the same one I wrote about in the Opus 4.8 release: the cheap layer is deterministic and free, the deeper layers spend model tokens, and the caps exist precisely because that judgment isn’t free. Cheap checks for the obvious, metered model judgment for the doubt that earns it.

The Reviewer Didn’t Write the Code

Here’s the part worth slowing down for. The plugin does not ask the Claude that wrote the code to grade its own work.

The model-backed reviews run as a separate Claude call with fresh context and a security-only prompt. It starts from the raw diff, has no investment in the original approach, and is told to do one thing: find problems. That structural separation is the entire reason it works. A model reviewing its own output rationalizes; a model handed an anonymous diff and told to break it does not. It’s the same instinct behind a human code review, encoded as a hook.

The author grading their own homework was always the weak link. The plugin’s real trick isn’t detection, it’s handing the diff to someone who didn’t write it and has no reason to defend it.

It Doesn’t Block. Anthropic Says So.

The easy way to describe this plugin is “it stops security holes as they happen.” That’s the promise. It isn’t quite the truth, and the docs are refreshingly blunt about the gap: none of the layers block writes or commits.

What actually happens is softer. A finding reaches the writing Claude as an instruction, Claude addresses it in the conversation, and you watch the fix land. But the review model can miss things, the commit layer only fires on commits Claude makes through its own Bash tool (your shell commits and the ! escape are invisible to it), and nothing is ever halted. It is a fast, early, opinionated first pass, not a gate.

That honesty matters, because the failure mode of a tool like this is the false sense of security it can create. If you read “security plugin installed” as “security handled,” you’ve made yourself less safe, not more.

Who Fixes the Bug, Revisited

I asked a while back, when AI finds the bug, who’s going to fix it. This plugin is one concrete answer. A different agent finds it, the agent that wrote it fixes it, and the whole exchange happens before a human or a PR is ever involved. Anthropic reports 30-40% fewer security-related comments on PRs opened with the plugin on.

But notice what that number is and isn’t. It’s a reduction in volume reaching human reviewers, not an elimination of the need for them. The plugin is explicitly one layer in a stack that still includes on-demand /security-review, PR-time Code Review, and your existing CI scanners. Each later stage catches what the earlier ones missed. Moving the first pass into the loop is genuinely valuable. It just changes where the work happens, not whether it’s needed.

What It Doesn’t Fix

  • It’s not a guarantee. Defense in depth means depth: this is the shallowest, fastest layer, and it’s allowed to be wrong.
  • It only watches Claude. Code you write yourself, or commits you make from your own shell, sail past it untouched.
  • The pattern layer is dumb by design. Deterministic string matching flags .innerHTML = whether or not it’s exploitable. Cheap and noisy is the trade.
  • The smart layers cost tokens. The model reviews bill like any other request, which lands with more weight now that automation is going metered. The caps aren’t just rate limits, they’re budget limits.
Give it your threat model

The model-backed reviews read a .claude/claude-security-guidance.md file if you check one in. Use it to encode the rules a generic scanner can’t know: which routes must call your auth guard, what must never be logged, which comparison must be constant-time. The reviewer loads it as context, so a few lines of your actual threat model beats a thousand generic checks.

Closing

Security review didn’t get smarter this week so much as it changed address. It used to live at the pull request, run by a tool that saw the code cold and a human who saw it late. Now it lives in the loop, run by a second model that reads the diff the moment it exists and a hook that never had to be remembered.

The honest framing is the one Anthropic chose: it reduces what reaches the reviewer, it doesn’t replace the reviewer. Catch issues early, fix them in the session, and keep every later layer exactly where it is. The hook is a good first reader. It was never meant to be the last one.