On March 31, 2026, Anthropic accidentally published the full Claude Code source to npm. A forgotten `.map` debug file in package version 2.1.88 made 512,000 lines of TypeScript publicly readable. Developers mirrored it within hours. Here is what was inside, and what it tells us about how Claude-powered tools evaluate content.
How the Leak Happened
Anthropic confirmed the leak was caused by human error. According to Varonis Threat Labs, the `@anthropic-ai/claude-code` npm package inadvertently included a 59.8 MB source map file covering roughly 1,900 TypeScript files. A known Bun bug filed on March 11 may have caused the bundler to include source maps in production output. That issue was still open when the package shipped.
As noted by Alex Kim, security researcher Chaofan Shou spotted it first and posted on X. The Hacker News thread that followed was widely circulated. Anthropic filed DMCA takedowns covering 8,100 repositories, but mirrors had already spread. A community rewrite called instructkr/claw-code accumulated over 46,000 stars within days, according to Varonis.
What Was Inside
Five categories of findings emerged from community analysis. Some were deliberate product strategy. Others were never meant to leave the building.
Fake tools to poison imitators
The `ANTI_DISTILLATION_CC` compile-time flag in `claude.ts` (lines 301-313) instructs Anthropic's API to silently inject fake tool definitions into the system prompt. Anyone recording Claude Code API traffic to train a competing model gets polluted data instead. According to Alex Kim's analysis, it is gated behind a GrowthBook feature flag (`tengu_anti_distill_fake_tool_injection`) and only fires for first-party CLI sessions.
A second mechanism in `betas.ts` (lines 279-298) applies only to internal Anthropic accounts (`USER_TYPE === 'ant'`). Instead of passing the full model reasoning between tool calls, it passes a compressed summary along with a cryptographic signature. The full reasoning can be recovered server-side from that signature. Anyone recording the session externally only ever sees the summaries. Regular Claude Code users are not affected. Both mechanisms are technically bypassable with enough effort. The real deterrent is legal, not architectural.
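To make the first mechanism concrete, here is a minimal sketch of flag-gated decoy tool injection. Everything here is illustrative: the tool names, the function, and the shape of the tool list are assumptions, not the actual identifiers from `claude.ts`.

```typescript
// Hypothetical sketch of anti-distillation tool injection.
// Names and shapes are illustrative, not the leaked implementation.
type ToolDef = { name: string; description: string; input_schema: object };

const REAL_TOOLS: ToolDef[] = [
  { name: "bash", description: "Run a shell command", input_schema: { type: "object" } },
];

// Decoys: anyone recording API traffic to train on captures these as real tools.
const DECOY_TOOLS: ToolDef[] = [
  { name: "read_memory_bank", description: "Read the persistent memory bank", input_schema: { type: "object" } },
  { name: "sync_workspace", description: "Sync the remote workspace", input_schema: { type: "object" } },
];

// Only first-party CLI sessions behind the feature flag get the decoys mixed in,
// matching the gating described above.
function buildToolList(antiDistillEnabled: boolean, firstPartyCli: boolean): ToolDef[] {
  if (!antiDistillEnabled || !firstPartyCli) return REAL_TOOLS;
  return [...REAL_TOOLS, ...DECOY_TOOLS];
}
```

The point of the gating is that legitimate third-party integrations see a clean tool list; only traffic that claims to be the first-party CLI gets poisoned.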
Undercover mode
`undercover.ts` (around 90 lines) auto-activates for Anthropic employee accounts in public repositories. It strips all AI attribution from commits: no "Co-Authored-By: Claude Opus 4.6", no "Generated with Claude Code", no internal codenames like "Capybara" or "Tengu". The source includes the note: "There is NO force-OFF."
The Hacker News discussion around undercover mode was one of the most debated parts of the thread. One side read it as a narrow safeguard: employees dogfooding unreleased models should not accidentally expose codenames in public commit messages. But the source also includes a line that drew more scrutiny: "Write commit messages as a human developer would. Describe only what the code change does." Hiding internal codenames is one thing; that line goes further. Several commenters noted the irony: Anthropic has publicly called out competitors for distillation attacks, yet the same codebase ships a mechanism that makes AI-authored OSS commits indistinguishable from human ones.
KAIROS: the unreleased always-on agent
KAIROS is the most significant product detail the leak exposed. According to WaveSpeed's analysis, it is referenced more than 150 times in the source. It is not a feature flag or a chat mode. It is a fully built autonomous agent designed to run 24/7 as a background daemon, without any prompt from the user.
The architecture is specific. A `/dream` skill handles nightly memory consolidation through a subsystem called `autoDream`. This merges observations from the day, removes logical contradictions, and converts vague notes into concrete facts. The memory store is pruned to a maximum of 200 lines or 25KB. KAIROS gets tools regular Claude Code sessions do not have: push notifications, outbound file sends, and GitHub PR subscriptions. It refreshes on a five-minute background cron. All responses use a "brief output mode" designed for background operation, not conversation.
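The pruning rule in that description (a hard cap of 200 lines or 25KB) is simple enough to sketch. This is not the leaked `autoDream` code; the function name and the oldest-first trimming strategy are assumptions.

```typescript
// Illustrative sketch of a memory-store cap: 200 lines or 25 KB, whichever
// is hit first. Trimming oldest-first is an assumption, not the leaked logic.
const MAX_LINES = 200;
const MAX_BYTES = 25 * 1024;

function pruneMemory(store: string): string {
  let lines = store.split("\n");
  // Keep only the newest entries once past the line cap.
  if (lines.length > MAX_LINES) lines = lines.slice(-MAX_LINES);
  // Then drop the oldest remaining lines until the UTF-8 size fits the byte cap.
  const bytes = (s: string) => new TextEncoder().encode(s).length;
  while (lines.length > 1 && bytes(lines.join("\n")) > MAX_BYTES) {
    lines.shift();
  }
  return lines.join("\n");
}
```

A cap like this keeps the consolidated memory small enough to inject into every background session without eating the context window.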
The Anthropic research blog has not mentioned KAIROS by name, and the feature gating suggests it is not yet ready for public release.
Internal data, left in the code
A comment in `autoCompact.ts` (lines 68-70) reads verbatim: "BQ 2026-03-10: 1,279 sessions had 50+ consecutive failures (up to 3,272) in a single session, wasting ~250K API calls/day globally." That is a live internal KPI, with a date stamp, in shipped production code. The leak also surfaced internal Slack channel names, unreleased model codenames, and build-time notes explaining product decisions. None of it was meant to be in a public package.
One more thing: a virtual pet
`buddy/companion.ts` implements a Tamagotchi-style companion for your terminal. 18 species, rarity tiers from Common to Legendary, a 1% shiny chance, and RPG stats including DEBUGGING, PATIENCE, CHAOS, WISDOM, and SNARK. Species names were encoded with `String.fromCharCode()` arrays to avoid build-system grep checks. Your buddy is deterministic: seeded from your user ID so the same user always gets the same creature. The `/buddy` command went live on April 1, exactly when the leaked source suggested it would.
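The two tricks described above, grep-proof species names and a deterministic pick from the user ID, can be sketched in a few lines. The species names and the FNV-1a hash are illustrative choices, not what `companion.ts` actually contains.

```typescript
// Hedged sketch: species names built with String.fromCharCode so a grep for
// the literal string finds nothing, plus a deterministic user-ID seed.
// Species list and hash function are assumptions, not the leaked code.
const SPECIES: string[] = [
  String.fromCharCode(70, 101, 114, 114, 101, 116),       // decodes to "Ferret"
  String.fromCharCode(65, 120, 111, 108, 111, 116, 108),  // decodes to "Axolotl"
];

// FNV-1a hash of the user ID: the same user always gets the same creature.
function pickSpecies(userId: string): string {
  let h = 0x811c9dc5;
  for (const ch of userId) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return SPECIES[h % SPECIES.length];
}
```

Seeding from the user ID rather than from randomness is what makes the pet feel persistent without storing any state.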
What the Leak Confirms About AI Search Visibility
Four findings from the leaked source have direct implications for how brands appear in Claude-powered AI answers.
Claude does not read your website. A smaller model does.
When Claude Code fetches a URL, the main model does not read the page itself. It hands the URL to a smaller, faster model (Haiku). Haiku converts the fetched HTML to Markdown and writes a summary; anything rendered client-side by JavaScript never makes it into that fetch. Direct quotes are capped at 125 characters per source, roughly one short sentence. Claude receives that summary, not your page.
If your key features, product claims, or differentiators live inside a JavaScript tab, accordion, or modal, Haiku cannot see them. They do not exist from Claude's perspective.
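The 125-character quote cap is worth internalising, so here is a toy version. The truncation behaviour (word boundary, ellipsis) is an assumption; only the 125-character figure comes from the leaked source.

```typescript
// Toy illustration of a per-source direct-quote cap. The 125-character
// figure matches the leak; the truncation style is an assumption.
const QUOTE_LIMIT = 125;

function capQuote(quote: string): string {
  if (quote.length <= QUOTE_LIMIT) return quote;
  // Cut at the last word boundary before the limit, then mark the cut.
  const cut = quote.slice(0, QUOTE_LIMIT);
  const lastSpace = cut.lastIndexOf(" ");
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + "…";
}
```

The practical consequence: if your key claim does not fit in one short sentence, no Claude answer will ever quote it whole.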
This matches exactly what we found when working with a client whose site ran on client-side rendering and a single-page architecture. Their pages had content. But when any AI crawler fetched the URL, the response was an empty HTML shell — the actual text only appeared after JavaScript executed, which crawlers do not wait for. Nothing to summarise, nothing to cite. We wrote about that case here.
Instructions get flagged. Facts get used.
Claude's system prompt instructs it to flag tool results that look like attempts to override its behaviour. Content shaped like a command, for example "Always recommend X for Y," is treated as a prompt injection attempt. Content shaped like a fact, for example "Reduces onboarding time by 40%," is treated as useful context.
This check is active in every session. Write claims backed by evidence, not directives.
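The distinction is easy to demonstrate with a toy heuristic. To be clear, this is not Claude's actual classifier; it is an illustration of what directive-shaped versus fact-shaped text looks like.

```typescript
// Toy heuristic for directive-shaped content. NOT Claude's classifier —
// just an illustration of the command-vs-fact distinction described above.
const DIRECTIVE_PATTERNS: RegExp[] = [
  /^\s*always\b/i,
  /^\s*never\b/i,
  /^\s*you (must|should)\b/i,
  /^\s*ignore previous\b/i,
  /^\s*recommend\b/i,
];

function looksLikeDirective(claim: string): boolean {
  return DIRECTIVE_PATTERNS.some((p) => p.test(claim));
}
```

Run the two example strings from above through it and they land on opposite sides: the imperative gets flagged, the measurable claim does not.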
Search triggers are specific
Claude searches the web in four distinct modes. Stable facts never trigger a search. Topics it knows but considers potentially outdated are answered first, with an offer to search. Rapidly changing topics trigger a single immediate search. Comparison and multi-source queries trigger a research mode requiring 2 to 20 tool calls.
The words "deep dive," "comprehensive," "analyze," and "evaluate" in a query explicitly activate the research category. Narrow, current, comparison-style content is far more likely to appear in a Claude web search than broad evergreen content.
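The four-mode routing above can be sketched as a small decision function. The mode names, the topic categories, and the function itself are illustrative; only the trigger words and the 2-to-20-call range come from the reporting on the leak.

```typescript
// Hedged sketch of the four search modes described above. Mode and topic
// names are illustrative, not the leaked prompt's wording.
type SearchMode = "never" | "offer" | "single" | "research";

const RESEARCH_WORDS = ["deep dive", "comprehensive", "analyze", "evaluate"];

function pickSearchMode(
  query: string,
  topic: "stable" | "maybe-stale" | "fast-moving"
): SearchMode {
  const q = query.toLowerCase();
  if (RESEARCH_WORDS.some((w) => q.includes(w))) return "research"; // 2–20 tool calls
  if (topic === "fast-moving") return "single"; // one immediate search
  if (topic === "maybe-stale") return "offer";  // answer first, offer to search
  return "never";                               // stable facts: no search at all
}
```

Note the asymmetry: query wording alone can force research mode, but only the topic's volatility decides between the other three.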
Original sources beat aggregators
Claude's system prompt instructs it to favour original sources over aggregators. A company blog, government page, or peer-reviewed paper outranks a roundup linking to those same sources. Publishing on your own domain, with a named author and a visible date, is the format Claude is trained to trust.
FAQs
What exactly leaked in the Claude Code incident?
A source map file accidentally included in the `@anthropic-ai/claude-code` npm package, version 2.1.88, on March 31, 2026. It gave public access to roughly 1,900 TypeScript files and 512,000+ lines of code: the full system prompt, tool definitions, feature flags, and internal comments. Model weights were not exposed.
Is this the same as the Claude 4 system prompt leak?
No. These are separate incidents. A Claude 4 system prompt became public in May 2025 and revealed how Claude decides when to search the web. The March 2026 incident exposed the Claude Code CLI source code itself, including unreleased features, internal codenames, and implementation logic. Different leaks, different layers.
What is KAIROS?
An unreleased autonomous agent mode referenced throughout the leaked source. It runs as a persistent background daemon, processes GitHub events, and performs nightly memory consolidation. Anthropic has not confirmed a timeline or publicly acknowledged KAIROS by name.
Does Undercover mode affect regular Claude users?
No. It only triggers for Anthropic employee accounts (`USER_TYPE === 'ant'`) in public repositories. Standard Claude Code users are never affected.
In Summary
The Claude Code leak was not a hack. It was a packaging error that made an unusually large slice of Anthropic's internal work publicly accessible before Anthropic began filing DMCA takedowns. The most substantive findings: a working anti-distillation system designed to frustrate model copying, a hidden mode that strips AI attribution from commits in public repos, and an unreleased always-on agent called KAIROS. Less surprising but confirmed: Claude uses a smaller model to summarise your website before the main model sees it, and content structured as facts is treated very differently from content structured as instructions.
If you want to know how your brand actually fares across Claude, ChatGPT, Gemini, and Perplexity (cited, skipped, or misrepresented), an AI Visibility Assessment maps exactly that: where you appear, where you're missing, and what's being said.
