Claude Code Auto Mode routes risky tool calls through a Sonnet 4.6 classifier instead of endless taps
Anthropic’s March 2026 engineering deep dive frames Auto Mode as permission automation: a two-stage transcript filter plus a prompt-injection probe, built after internal telemetry showed users accepting 93% of manual prompts anyway.

Anthropic’s public engineering note dated 2026-03-25 introduces Auto Mode for Claude Code as a deliberate middle path between noisy per-tool approvals and the nuclear --dangerously-skip-permissions flag. Internal telemetry cited in the same article says customers already tap “approve” on roughly 93% of permission prompts, a pattern the team reads as fatigue that erodes attention rather than as proof that every action was scrutinized.
Auto Mode therefore targets autonomy with guardrails—letting long sessions continue while model-based classifiers substitute for a human finger on sensitive shell, network, and cross-directory moves.
The architecture is explicitly two-layered. On the way in, a server-side prompt-injection probe inspects tool outputs—file reads, web fetches, shell transcripts, third-party plugin payloads—before they re-enter the agent, adding warnings when content looks like an attempted hijack. On the way out, a transcript classifier running on Sonnet 4.6 evaluates higher-risk actions after cheaper allow tiers clear routine in-repo edits and read-only navigation.
That classifier is deliberately two-stage: a fast single-token “block or allow” pass tuned to err toward blocking, followed by chain-of-thought reasoning only when the first stage raises a flag, so most benign traffic never pays the deeper latency tax.
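The escalation pattern described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Anthropic's implementation: the names two_stage_gate, toy_fast_filter, and toy_slow_reasoner are hypothetical, and the keyword checks only stand in for what the article describes as model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    allowed: bool
    stage: str  # which stage produced the final decision


def two_stage_gate(
    action: str,
    fast_filter: Callable[[str], bool],    # hypothetical: True means "flag for review"
    slow_reasoner: Callable[[str], bool],  # hypothetical: True means "block"
) -> Verdict:
    """Run the cheap filter first; pay for deeper reasoning only on flagged actions."""
    if not fast_filter(action):
        # Unflagged traffic never reaches the slower second stage.
        return Verdict(allowed=True, stage="fast")
    blocked = slow_reasoner(action)
    return Verdict(allowed=not blocked, stage="reasoning")


# Toy stand-ins (assumptions): a real system would call a model here.
def toy_fast_filter(action: str) -> bool:
    # Deliberately over-flags: any mention of these tokens triggers stage two.
    return any(token in action for token in ("rm ", "curl ", "push"))


def toy_slow_reasoner(action: str) -> bool:
    # Blocks only the clearly destructive cases among the flagged ones.
    return "rm -rf /" in action or "--force" in action


if __name__ == "__main__":
    for cmd in ("ls src", "rm build/tmp.o", "git push --force origin main"):
        print(cmd, "->", two_stage_gate(cmd, toy_fast_filter, toy_slow_reasoner))
```

In this toy run, the listing command clears stage one, the file removal is flagged but released by stage two, and the forced push is blocked. The point of the shape is that the second callable is never invoked for unflagged actions.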
How allow tiers and the classifier split the work
Tiered defaults described in the engineering write-up mean many coding motions never reach the classifier at all: built-in safe-tool lists, user-configured narrow allow rules, and in-project file writes can proceed without extra model calls. Broad rules that allowed arbitrary execution through interpreters under manual mode, by contrast, are stripped when Auto Mode starts, so the classifier actually sees dangerous commands.
Anything with real blast radius—shell invocations, web posts, subagent spawns, filesystem touches outside the repo—falls into the transcript classifier bucket, where Anthropic also documents recursive checks at subagent delegation and return to catch privilege expansion and mid-run injection drift.
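As a rough picture of that split, the following Python sketch routes a tool call through the cheap allow tiers before falling back to the classifier. Everything here is an assumption made for illustration: the tool names, the SAFE_TOOLS set, the "tool:target" rule format, and the route function are hypothetical and do not reflect Claude Code's actual configuration schema.

```python
import posixpath
from dataclasses import dataclass, field

# Hypothetical built-in list of read-only tools (assumption, not the real list).
SAFE_TOOLS = {"read_file", "list_dir", "search"}


@dataclass
class ToolCall:
    tool: str
    target: str = ""


@dataclass
class Policy:
    project_root: str
    # Narrow user rules in a made-up "tool:target" format, e.g. "shell:npm test".
    allow_rules: set = field(default_factory=set)


def _inside(path: str, root: str) -> bool:
    """True if path resolves under root after collapsing '..' segments.

    Lexical check only: symlinks are not resolved in this sketch.
    """
    root = posixpath.normpath(root)
    resolved = posixpath.normpath(posixpath.join(root, path))
    return resolved == root or resolved.startswith(root + "/")


def route(call: ToolCall, policy: Policy) -> str:
    """Return which tier handles the call; 'classifier' means a model check is needed."""
    if call.tool in SAFE_TOOLS:
        return "allow:safe-tool"
    if f"{call.tool}:{call.target}" in policy.allow_rules:
        return "allow:user-rule"
    if call.tool == "write_file" and _inside(call.target, policy.project_root):
        return "allow:in-project-write"
    # Shell, network, subagent spawns, and out-of-repo writes all land here.
    return "classifier"


if __name__ == "__main__":
    policy = Policy(project_root="/repo", allow_rules={"shell:npm test"})
    print(route(ToolCall("write_file", "src/app.py"), policy))
    print(route(ToolCall("write_file", "/repo/../etc/passwd"), policy))
```

The path check normalizes ".." segments before comparing, so a write that only looks like it is inside the project still falls through to the classifier tier.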
Honest limits from Anthropic’s own benchmark table
The same March post publishes sobering evaluation numbers on curated datasets. On 10,000 real internal tool calls, stage one alone posted a false-positive rate of about 8.5%, which the second stage trimmed to roughly 0.4%. On 52 real “overeager” incidents, the end-to-end pipeline still let through about 17% of the attempts the team had labeled dangerous. That miss rate is an explicit warning that Auto Mode is aimed at teams already flirting with fully skipped permissions, not at operators who need human eyes on every production change.
Synthetic exfiltration drills on 1,000 crafted cases showed roughly 5.7% false negatives after both stages, reinforcing that the defense is statistical, not magical.
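Those percentages are easier to weigh as counts. The short Python sketch below converts the rates quoted above into approximate numbers of cases. It assumes each rate applies to the full sample named in the article (for the false-positive figures, that essentially all 10,000 calls were benign, which the article does not state), and expected_count is a hypothetical helper, so the results are back-of-the-envelope estimates rather than figures published by Anthropic.

```python
def expected_count(sample_size: int, rate_percent: float) -> int:
    """Approximate number of cases implied by a percentage of a sample."""
    return round(sample_size * rate_percent / 100)


# Rates and sample sizes as quoted in the article.
stage_one_false_positives = expected_count(10_000, 8.5)
two_stage_false_positives = expected_count(10_000, 0.4)
missed_overeager_incidents = expected_count(52, 17)
missed_exfiltration_cases = expected_count(1_000, 5.7)

if __name__ == "__main__":
    print("stage-one false positives:", stage_one_false_positives)
    print("false positives after stage two:", two_stage_false_positives)
    print("overeager incidents let through:", missed_overeager_incidents)
    print("synthetic exfiltration misses:", missed_exfiltration_cases)
```

Under those assumptions, the second stage cuts wrongly blocked calls from roughly 850 to roughly 40, while about 9 of the 52 real incidents and about 57 of the 1,000 synthetic cases still slip through.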
How practitioners should adopt it without drama
Anthropic pairs the technical detail with operational guidance: treat Auto Mode as a policy object. Defaults ship with more than twenty block families covering destructive git moves, credential fishing, silent security downgrades, and pushes across trust boundaries. Teams are encouraged to iterate on environment slots that declare which GitHub orgs, buckets, and internal APIs count as “inside” and which are hostile externals.
Until your org has mirrored those trust boundaries in configuration, the safest mental model is “automation for medium-stakes dev sandboxes first,” with coverage widened as incidents teach the classifier what your codebase considers normal.
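One way to think about declaring trust boundaries is as a small allowlist that a destination check consults. The Python sketch below is purely illustrative: TrustEnvironment, is_inside, and the example org, bucket, and host names are hypothetical, and the real Auto Mode environment configuration may look nothing like this.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class TrustEnvironment:
    """Hypothetical declaration of what counts as 'inside' for one team."""
    github_orgs: frozenset = frozenset()
    buckets: frozenset = frozenset()
    api_hosts: frozenset = frozenset()


def is_inside(env: TrustEnvironment, url: str) -> bool:
    """Return True only when the destination matches a declared boundary."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme == "s3":
        # For s3:// URLs the bucket name sits in the host position.
        return host in env.buckets
    if host == "github.com":
        parts = [p for p in parsed.path.split("/") if p]
        return bool(parts) and parts[0].lower() in env.github_orgs
    # Anything else must be an explicitly declared internal API host.
    return host in env.api_hosts


if __name__ == "__main__":
    env = TrustEnvironment(
        github_orgs=frozenset({"acme"}),
        buckets=frozenset({"acme-artifacts"}),
        api_hosts=frozenset({"api.internal.acme.example"}),
    )
    for target in (
        "https://github.com/acme/service.git",
        "https://github.com/stranger/fork.git",
        "s3://acme-artifacts/build.tar",
        "https://pastebin.com/raw/abc",
    ):
        print(target, "->", "inside" if is_inside(env, target) else "external")
```

The default here is deny: a destination that matches nothing declared is treated as external, which mirrors the sandbox-first posture suggested above.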
Sources and external links
Sources and filings our editors consulted to verify this story.
- Permission modes — eliminate prompts with Auto Mode (Claude Code docs), Anthropic