How Claude Mythos Finds Zero-Days: AI Vulnerability Discovery Explained | webencher

Inside Claude Mythos: How Anthropic's AI Found Thousands of Zero-Days Across Every Major OS

Claude Mythos didn't just score well on benchmarks. It autonomously discovered thousands of high- and critical-severity vulnerabilities across every major operating system and browser. This deep dive explains how it works, from hypothesizing bugs in code to chaining four vulnerabilities into a full sandbox escape.

A new kind of capability

Most AI models get better at benchmarks gradually. Claude Mythos Preview didn't — it made a discontinuous jump in one specific domain: software security.

On CyberGym, a benchmark that measures how well AI agents reproduce real-world vulnerabilities, Mythos scored 83.1% against 66.6% for Claude Opus 4.6. The 16.5-point gap understates the change, because the real difference is qualitative: Mythos converts 72.4% of the vulnerabilities it identifies into working exploits, whereas previous Claude models almost never produced a working exploit.

More critically: Mythos discovered these vulnerabilities autonomously, with no human guidance after the initial setup.

How the system works

Anthropic deployed Mythos inside a custom agentic scaffold — a containerized testing environment with:

File-level parallelization: separate agents assigned to different source files simultaneously
Pre-ranked vulnerability likelihood: each file scored 1–5 for likely bug density before analysis
Secondary validation agent: filters trivial or duplicate findings before human review
Tool access: debuggers, sanitizers, fuzzing harnesses, and exploit development utilities
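The scaffold's structure can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's implementation. The Finding type, the fake_agent stand-in for a model-driven analysis agent, the file names, and the deterministic dedupe rule used in place of the validation agent are all hypothetical. The sketch shows three ideas from the list: files ranked 1 to 5, analysis fanned out in parallel with the highest-ranked files submitted first, and a filtering pass before anything reaches a human.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Finding:
    file: str
    bug_class: str
    location: str          # hypothetical "function:line" label
    trivial: bool = False  # e.g. a style nit rather than a security bug


def rank_files(scores: Dict[str, int]) -> List[str]:
    """Order files by their 1-5 bug-likelihood score, highest first (ties by path)."""
    for path, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {path} must be 1-5, got {score}")
    return sorted(scores, key=lambda p: (-scores[p], p))


def validate(findings: List[Finding]) -> List[Finding]:
    """Stand-in for the validation agent: drop trivial findings and exact duplicates."""
    seen = set()
    kept = []
    for f in findings:
        key = (f.file, f.bug_class, f.location)
        if f.trivial or key in seen:
            continue
        seen.add(key)
        kept.append(f)
    return kept


def run_scaffold(scores: Dict[str, int],
                 analyze: Callable[[str], List[Finding]],
                 max_agents: int = 4) -> List[Finding]:
    """Analyze files in parallel, submitting the highest-ranked files first."""
    ranked = rank_files(scores)
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        per_file = list(pool.map(analyze, ranked))  # results come back in ranked order
    return validate([f for batch in per_file for f in batch])


# Hypothetical agent output for a toy three-file codebase.
def fake_agent(path: str) -> List[Finding]:
    if path == "src/net_input.c":
        bug = Finding(path, "integer-overflow", "handle_ack:120")
        return [bug, bug]  # the same bug reported twice
    if path == "src/strings.c":
        return [Finding(path, "style", "trim:10", trivial=True)]
    return []


scores = {"src/net_input.c": 5, "src/strings.c": 2, "src/docs_gen.c": 1}
print(rank_files(scores))               # ['src/net_input.c', 'src/strings.c', 'src/docs_gen.c']
print(run_scaffold(scores, fake_agent))  # only the single integer-overflow finding survives
```

Passing the agent in as a function keeps the orchestration testable on its own. In a real system that function would wrap a model session with tool access.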

The model's workflow for each target roughly follows this pattern:

Code inspection — read source files, build a mental model of the codebase
Hypothesis generation — identify data flows that could lead to memory corruption, logic errors, or authentication bypasses
Experiment design — write test cases or crafted inputs to confirm the hypothesis
Debugging — use sanitizers (ASan, UBSan) and debuggers to observe program state
Exploit development — turn a confirmed bug into a working proof-of-concept
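The experiment and debugging steps form a tight loop: run the target on a crafted input, then read what the sanitizers report. Below is a minimal sketch of the triage half of that loop. It assumes the target was compiled with clang or gcc using -fsanitize=address,undefined. The helper names and the target command are hypothetical, and the regular expressions only match the common first line of each report format.

```python
import re
import subprocess
from typing import List, Optional, Tuple

# Typical first lines of sanitizer reports (abbreviated):
#   ==1234==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000014 ...
#   parser.c:42:17: runtime error: signed integer overflow: 2147483647 + 1 cannot be ...
ASAN_RE = re.compile(r"ERROR: AddressSanitizer: ([\w-]+)")
UBSAN_RE = re.compile(r"runtime error: ([^:\n]+)")


def classify_sanitizer_output(stderr: str) -> Optional[Tuple[str, str]]:
    """Return (sanitizer, bug kind) for the first report found, or None if the run was clean."""
    m = ASAN_RE.search(stderr)
    if m:
        return ("ASan", m.group(1))
    m = UBSAN_RE.search(stderr)
    if m:
        return ("UBSan", m.group(1).strip())
    return None


def run_experiment(cmd: List[str], timeout: float = 10.0) -> Optional[Tuple[str, str]]:
    """Run an instrumented target on one crafted input and classify what the sanitizers saw.

    cmd is hypothetical, e.g. ["./target_asan", "crafted_input.bin"].
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return classify_sanitizer_output(proc.stderr)
```

A result of None means the hypothesis was not confirmed by that input, so the loop goes back to experiment design. A classified report becomes the starting point for exploit development.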

This is exactly what a skilled human security researcher does. Mythos just does it faster, cheaper, and in parallel.

The discoveries: specific examples

1. OpenBSD TCP SACK — 27-year-old crash bug

The vulnerability was introduced in OpenBSD's TCP stack in 1998, when the original developer implemented TCP Selective Acknowledgement (SACK) support.

Root cause: A signed integer overflow in the sequence number comparison routine, combined with improper bounds checking in the SACK window management code. TCP sequence numbers are 32-bit unsigned integers that wrap around. When Mythos examined the comparison logic, it found that one critical path treated them as signed, so the arithmetic overflowed once sequence numbers wrapped past 2³¹.
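Sequence-number comparison is easy to get wrong, which is part of what makes this bug class so durable. The Python sketch below is not OpenBSD's code. It contrasts a naive comparison with the standard wrap-aware idiom (the one behind BSD's SEQ_LT macro) and shows the 2³¹ boundary where even that idiom becomes ambiguous.

```python
MOD = 1 << 32   # TCP sequence numbers live in a 32-bit space
HALF = 1 << 31


def naive_lt(a: int, b: int) -> bool:
    """Plain comparison: gets the order wrong once the counter wraps."""
    return a < b


def seq_lt(a: int, b: int) -> bool:
    """Wrap-aware 'a comes before b', the serial-number idiom behind BSD's
    SEQ_LT(a, b) ((int)((a) - (b)) < 0): take the 32-bit difference and
    read it as a signed value."""
    diff = (a - b) % MOD
    if diff >= HALF:
        diff -= MOD
    return diff < 0


a, b = 0xFFFFFFF0, 0x00000010   # a was sent 32 bytes before b, just before the wrap
print(naive_lt(a, b))           # False: plain comparison puts them in the wrong order
print(seq_lt(a, b))             # True

# Exactly 2**31 apart, the signed difference is the minimum int, so each number
# "comes before" the other: the boundary where wraparound logic breaks down.
print(seq_lt(0, HALF), seq_lt(HALF, 0))  # True True
```

Code that mixes the two styles, or that feeds a wrapped difference into further signed arithmetic, can behave correctly for years and misbehave only near these boundaries.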

Impact: Sending a specially crafted sequence of TCP packets causes a remote machine crash — a denial-of-service attack requiring no authentication.

Why it survived 27 years: The interaction between SACK window management and 32-bit integer wraparound only manifests under specific conditions. Automated fuzzers generate random inputs; they rarely produce the precise sequence needed to trigger this exact integer boundary. Mythos reasoned about the code semantics and constructed the trigger deliberately.

2. FFmpeg — 16-year-old bug missed by 5 million tests

FFmpeg is the open-source multimedia library powering YouTube, VLC, and thousands of other applications. It has been continuously fuzz-tested since at least 2016; OSS-Fuzz has run over 5 million test cases against it.

Mythos found a bug that all of them missed.

Root cause: A logic error in a codec demuxer's timestamp normalization function, introduced approximately 16 years ago. The bug doesn't cause a crash — it causes incorrect behaviour under specific codec configurations that no automated test happened to exercise.

Why fuzzers missed it: The fuzzing engines OSS-Fuzz runs are coverage-guided, favoring inputs that exercise new code paths. This bug sits in frequently executed code, so triggering it would not have produced new coverage, and it fires only when two specific conditions hold simultaneously: a particular codec type combined with a specific timestamp arithmetic edge case. Mythos identified both conditions by reading the code.
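A toy model makes it easier to see how a frequently executed path can still hide a bug. The sketch below is hypothetical and is not FFmpeg's code. A rescaling function multiplies before it divides, so it only goes wrong when a large multiplier (standing in for the particular codec configuration) meets a large timestamp. The covered set is a simplified stand-in for a fuzzer's coverage map.

```python
from typing import Set

INT64_MAX = (1 << 63) - 1


def wrap_int64(x: int) -> int:
    """Emulate 64-bit two's-complement wraparound (in C, signed overflow is
    undefined behaviour; wrapping is what typically happens in practice)."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x > INT64_MAX else x


def normalize_ts(ts: int, num: int, den: int, covered: Set[str]) -> int:
    """Hypothetical timestamp rescaling (ts * num / den) with a multiply-before-divide bug.

    Every call executes the same three steps, so coverage is identical for
    good and bad inputs; only the values differ.
    """
    covered.add("multiply")
    scaled = wrap_int64(ts * num)  # silently wraps when ts * num > INT64_MAX
    covered.add("divide")
    q = abs(scaled) // den         # truncate toward zero, as C integer division does
    covered.add("return")
    return q if scaled >= 0 else -q


cov_a, cov_b, cov_c = set(), set(), set()
common = normalize_ts(1_000_000, 90_000, 1_000, cov_a)  # large multiplier only
big_ts = normalize_ts(2**50, 1, 1_000, cov_b)           # large timestamp only
both = normalize_ts(2**50, 90_000, 1_000, cov_c)        # both conditions at once
print(common, big_ts == 2**50 // 1_000)  # 90000000 True
print(both == 2**50 * 90)                # False: the product wrapped before dividing
print(cov_a == cov_b == cov_c)           # True: the buggy call reached no new code
```

Each condition on its own is harmless. A fuzzer that is rewarded only for new coverage gets no signal that the combination is worth exploring.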

3. FreeBSD NFS — Remote Code Execution as Root (CVE-2026-4747)

This is the most severe finding: a 17-year-old stack buffer overflow in FreeBSD's implementation of RFC 2203 RPCSEC_GSS authentication, enabling unauthenticated remote code execution as root.
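A stack overflow becomes code execution because the saved return address sits just above local buffers in the stack frame. The minimal Python model below uses a hypothetical, simplified layout, not FreeBSD's actual frame. It reuses the 304-byte input and 128-byte buffer reported for this bug and omits a stack canary, since the affected code was compiled without one.

```python
BUF_SIZE = 128  # size of the stack buffer reported for this bug
SLOT = 8        # one 64-bit machine word


def make_frame() -> bytearray:
    """Hypothetical, simplified frame: [buffer | saved frame pointer | return address]."""
    return bytearray(BUF_SIZE + 2 * SLOT)


def return_slot(frame: bytearray) -> bytes:
    return bytes(frame[BUF_SIZE + SLOT:BUF_SIZE + 2 * SLOT])


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Vulnerable pattern: copies all of data with no bounds check.

    The model stops at the end of the frame; a real overflow keeps writing
    into whatever sits above it on the stack.
    """
    n = min(len(data), len(frame))
    frame[:n] = data[:n]


def checked_copy(frame: bytearray, data: bytes) -> None:
    """Fixed pattern: reject any input longer than the destination buffer."""
    if len(data) > BUF_SIZE:
        raise ValueError(f"{len(data)}-byte input exceeds {BUF_SIZE}-byte buffer")
    frame[:len(data)] = data


frame = make_frame()
unchecked_copy(frame, b"A" * 304)  # 304 - 128 = 176 bytes land past the buffer
print(return_slot(frame))          # b'AAAAAAAA': the saved return address is now input data

try:
    checked_copy(make_frame(), b"A" * 304)
except ValueError as err:
    print(err)                     # 304-byte input exceeds 128-byte buffer
```

When the function returns, the CPU jumps to whatever address the attacker placed in that slot. A ROP chain exploits exactly this: it points the return address at existing code fragments instead of injected code.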

Technical details:

A 304-byte attacker-controlled string overflows a 128-byte stack buffer
The overflow target lacks stack canaries in the affected compilation unit
Mythos developed a 20-gadget ROP (Return-Oriented Programming) chain, splitting it across six sequential RPC packets to stay under per-packet size limits
Kernel base address is leaked via an unauthenticated NFSv4 EXCHANGE_ID call, defeating ASLR
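A single leaked pointer defeats address randomization because randomization slides the whole image by one random offset and leaves internal distances unchanged. The sketch below shows that arithmetic. All addresses, offsets, and names are hypothetical, and the page-alignment check is an assumed sanity test, not part of the reported exploit.

```python
from typing import Dict

PAGE = 0x1000  # assumed alignment of the image base


def rebase(leaked_ptr: int, leaked_offset: int, offsets: Dict[str, int]) -> Dict[str, int]:
    """Recover a randomized base from one leaked pointer, then relocate other addresses.

    Address randomization slides the whole image by a single random base, so
    offsets inside the image are unchanged and one known pointer reveals the rest.
    """
    base = leaked_ptr - leaked_offset
    if base % PAGE:
        raise ValueError(f"derived base {base:#x} is not page-aligned; wrong offset?")
    return {name: base + off for name, off in offsets.items()}


# Hypothetical values for illustration only.
leak = 0xFFFF_FFFF_8123_4560   # pointer disclosed by the info leak
leak_offset = 0x23_4560        # where that object sits inside the image
targets = {"gadget_1": 0x1_A2B0, "gadget_2": 0x4_0F10}
for name, addr in rebase(leak, leak_offset, targets).items():
    print(name, hex(addr))     # prints gadget_1 0xffffffff8101a2b0, then gadget_2 0xffffffff81040f10
```

This is why an information leak and a memory-corruption bug so often appear together in one exploit. The leak supplies the base, and the corruption then uses addresses computed from it.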

The complete exploit — from initial reconnaissance to root shell — required no human input and cost under $50 in compute at Mythos's pricing.

"Mythos Preview fully autonomously identified and then exploited a 17-year-old remote code execution vulnerability in FreeBSD that allows anyone to gain root on a machine running NFS." — Anthropic red team report

4. Firefox browser — Four-vulnerability sandbox escape chain

The most technically sophisticated finding chained four separate vulnerabilities, starting with bugs in Firefox's JavaScript JIT compiler, to escape both the renderer process sandbox and the OS-level sandbox.

The chain:

JIT read primitive: exploit a type confusion in the JS engine to read arbitrary memory
Heap spray: fill the heap with attacker-controlled objects, then use the read primitive to locate the target objects
JIT write primitive: a second JIT vulnerability allowing controlled writes
Credential manipulation: overwrite kernel process credentials to escalate privileges

Each individual bug is rated medium severity in isolation. Chained together, they produce a full compromise of the browser host from a malicious web page. Mythos discovered all four and constructed the chain autonomously.
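Chaining is a dependency problem: each step needs capabilities that only earlier steps provide. The sketch below models the four steps as stages with hypothetical, deliberately abstract capability labels (run_js, read_memory, and so on). It captures the dependency structure the list describes, not the actual bugs. It shows that no single stage reaches the goal and that removing any one stage breaks the chain.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    requires: FrozenSet[str]
    provides: FrozenSet[str]


def reaches_goal(stages: Sequence[Stage], start: Iterable[str], goal: str) -> bool:
    """Apply stages in order; a stage only works if every capability it needs is already held."""
    have = set(start)
    for stage in stages:
        if not stage.requires <= have:
            return False  # the chain breaks at this stage
        have |= stage.provides
    return goal in have


def make_stage(name: str, requires: set, provides: set) -> Stage:
    return Stage(name, frozenset(requires), frozenset(provides))


# Hypothetical capability labels mirroring the four steps listed above.
CHAIN = [
    make_stage("jit_read", {"run_js"}, {"read_memory"}),
    make_stage("heap_spray", {"run_js", "read_memory"}, {"target_located"}),
    make_stage("jit_write", {"run_js"}, {"write_memory"}),
    make_stage("credentials", {"write_memory", "target_located"}, {"host_privileges"}),
]

START, GOAL = {"run_js"}, "host_privileges"  # a malicious page can run JavaScript
print(reaches_goal(CHAIN, START, GOAL))      # True
for i, s in enumerate(CHAIN):
    alone = reaches_goal([s], START, GOAL)
    without = reaches_goal(CHAIN[:i] + CHAIN[i + 1:], START, GOAL)
    print(f"{s.name}: alone={alone}, chain without it={without}")  # False, False each time
```

The model also shows why severity ratings assigned to bugs in isolation can understate real risk. Each stage is only "medium" because the capabilities it provides are not useful on their own.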

Scale of findings

Across Anthropic's research period:

Thousands of high- and critical-severity vulnerabilities found across all major OSes and browsers
595 crashes at severity tiers 1–2 on the OSS-Fuzz corpus vs. Opus 4.6's 150–175
181 working exploits developed for Firefox's JS engine (Opus 4.6: 2)
29 register-control instances — a stage required for reliable RCE exploit construction
89% severity accuracy: of 198 manually reviewed reports, 89% matched Mythos's own severity assessment exactly; 98% were within one severity level
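The two accuracy figures measure different things. One is exact agreement. The other is agreement within one severity level, which also counts the exact matches. Below is a minimal sketch of both metrics, assuming severities are integer levels on a single ordinal scale; the function name and toy data are hypothetical. Applied to the reported (rounded) rates, 198 reviewed reports imply roughly 176 exact matches and 194 within one level.

```python
from typing import Sequence, Tuple


def severity_agreement(pairs: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (exact-match rate, within-one-level rate) for (model, reviewer) severity pairs.

    Severities are ordinal levels on one scale; "within one" includes exact
    matches, so the second rate can never be lower than the first.
    """
    if not pairs:
        raise ValueError("need at least one reviewed report")
    exact = sum(1 for model, reviewer in pairs if model == reviewer)
    within_one = sum(1 for model, reviewer in pairs if abs(model - reviewer) <= 1)
    return exact / len(pairs), within_one / len(pairs)


# Toy example with four hypothetical reports: two exact, one off by one, one off by two.
print(severity_agreement([(1, 1), (2, 1), (3, 1), (2, 2)]))  # (0.5, 0.75)

# Approximate counts implied by the published (rounded) rates over 198 reports.
n = 198
print(round(0.89 * n), round(0.98 * n))  # 176 194
```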

What Mythos can't do (yet)

Anthropic was careful to note capabilities that remain limited:

Logic bugs in web applications: authentication bypasses and broken authorization remain harder for the model to find than memory-corruption bugs
Cryptographic vulnerabilities: subtle flaws in cryptographic implementations require deep mathematical reasoning that current models handle inconsistently
Novel attack techniques: the model primarily applies known classes of vulnerabilities; inventing genuinely new exploit primitives is rare

The implications

The emergent nature of these capabilities is the most unsettling aspect of the announcement:

"We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy."

This means vulnerability discovery isn't a feature Anthropic built — it's a consequence of building a smarter, more capable general-purpose model. As models continue to improve, these capabilities will deepen whether or not any lab intends them to.

Anthropic's response is to give defenders a head start. Whether a six-week window before similar capabilities reach open-source models is enough time to patch critical software is a question the security community is now urgently debating.