The Claude Mythos Problem: AI Vulnerability Scanning Has Trust Issues

by Nico Flores, Kelly Kuebelbeck
11 minute read

When frontier AI scanning capability lands inside organizations with competitive incentives and Anthropic's oversight telemetry remains unspecified, the threat model gets a lot wider than jailbreakers and dark web exploit sellers.

Anthropic’s announcement of Claude Mythos Preview landed the way most frontier AI milestones do: as a security story dressed in a safety narrative. The defender-first framing of Project Glasswing, the staged disclosure protocol, and the coalition of infrastructure providers all positioned Mythos as a tool for the good guys. Get there first. Patch before attackers do. Collective defense.

That framing is fair. It is just incomplete.

The conversation about Mythos has focused almost entirely on two threat vectors: external bad actors eventually getting access to Mythos-class capability, and the jailbreak economy that already exists around Claude. Both of those risks are real and documented. But there is a third threat sitting directly inside the program Anthropic designed as the solution, and it has not received serious attention yet.

Mythos-class vulnerability scanning doesn't care whose code it's looking at. Point it at your own software, and it finds bugs. Point it at a competitor's codebase, and it finds those too. And the organizations inside Project Glasswing operate in competitive markets. Some of them compete with each other. All of them have interests. The question of what Anthropic can actually see, detect, and enforce when a Glasswing partner scans a competitor’s codebase is not answered anywhere in the public record.

What Mythos Actually Is

Claude Mythos Preview is a frontier-tier model that sits above Anthropic’s existing lines. In testing, it generated 181 working browser exploits where the prior model achieved only 2 (roughly 90x more), and achieved a 72.4% success rate turning Firefox vulnerabilities into working exploits (near zero for the prior model). It surfaced a 27-year-old vulnerability in OpenBSD, alongside findings in FFmpeg, FreeBSD, and the Linux kernel—bugs that survived years of human review and automated tooling.

More than 99 percent of the vulnerabilities Mythos has discovered to date remain unpatched and have not been publicly disclosed. The exploit creation capability is where the biggest leap occurred. The time between a vulnerability being discovered and a working exploit being created has collapsed dramatically.

90x: more browser exploits than the prior Opus model, per Anthropic's red team report
72.4%: Firefox exploit success rate (near 0% for Opus), per Anthropic's red team report
>99%: share of discovered vulnerabilities still unpatched, per Anthropic's red team report

To manage the obvious dual-use risk, Anthropic structured Mythos access through Project Glasswing: a coalition of 12 launch partners and more than 40 additional infrastructure organizations granted early access, with up to $100 million in token credits and $4 million in direct donations to open-source security efforts, plus a committed responsible disclosure protocol.

Partners scan, find vulnerabilities, patch before public disclosure, and notify maintainers on a 90-day timeline.

The model is sensible on paper. It just depends on every participant acting exactly as intended.

The Assumption Anthropic Is Making

The entire Project Glasswing framework rests on an assumption of good faith. The premise is that organizations granted access to Mythos will use it to find and fix vulnerabilities in their own software and in the shared open-source infrastructure they depend on. The 90-day disclosure protocol, the $100 million in credits, the responsible disclosure language: all of it is designed for participants who want to do the right thing.

But Mythos does not technically restrict what code it can be pointed at. A scanner that finds 27-year-old vulnerabilities in open-source projects can be run against any codebase. And the organizations inside Glasswing do not operate in a vacuum of mutual goodwill. They operate in competitive markets, with competitive incentives, under the normal pressures of market position, customer acquisition, and product differentiation.

The question worth asking is whether the program is designed in a way that makes misuse detectable, attributable, and consequential. On current public information, the answer to all three is unclear.

A defender-first program built on unverified good faith is not a security architecture. It is a handshake agreement with a very powerful tool attached.

Three Misuse Scenarios That Do Not Require a Criminal

None of the following scenarios requires a bad actor in the traditional sense. All of them are structurally possible under the current program design, and none of them necessarily violates any law as currently written.

SCENARIO 01 | COMPETITIVE INTELLIGENCE VIA VULNERABILITY SCANNING

A Glasswing partner scans a competitor’s publicly accessible codebase or open-source dependencies using Mythos. The scan surfaces critical vulnerabilities. The partner does not disclose them to the competitor, does not notify maintainers, and does not report them to Anthropic. Instead, they hold the findings as competitive intelligence: monitoring whether the vulnerabilities get independently discovered and exploited, timing their own product announcements around the resulting incidents, or using the knowledge to inform their own product’s positioning as “more secure.” Nothing in the public Project Glasswing documentation specifies that scans must be limited to the partner’s own software.

SCENARIO 02 | COORDINATED CVE FLOODING

A Glasswing partner runs Mythos against the open-source dependencies heavily used by competitors and simultaneously submits hundreds of CVE disclosures to legitimate cybersecurity research forums and vulnerability databases. Each disclosure is technically accurate and responsibly formatted. The volume is not. The effect is a remediation crisis for organizations whose product stacks depend on those dependencies, timed to coincide with a product launch, analyst evaluation cycle, or procurement decision. The submitting organization looks thorough and security-conscious. Their competitors look like they ship vulnerable software. Both things are technically true.

SCENARIO 03 | SELECTIVE DISCLOSURE TO THIRD PARTIES

A Glasswing partner discovers a critical vulnerability in a competitor’s software or a shared infrastructure component that the competitor relies on heavily. Rather than following the 90-day responsible disclosure timeline, the partner routes the finding to a friendly threat researcher, a short-seller, a journalist, or a nation-state intelligence contact. The vulnerability enters the wild without the competitor knowing it exists. By the time it is exploited and disclosed, the chain of custody is cold. Mythos-enabled discovery is not logged anywhere that the victim can access.

Each of these scenarios is a version of the same structural problem: powerful scanning capability, distributed to organizations with competitive interests, without publicly specified telemetry, behavioral logging, or enforcement mechanisms for how that capability is used.

The Oversight Gap

Anthropic clearly does not intend for Glasswing to be used this way. The question is what Anthropic can actually see, and what it can actually do, when a partner organization uses Mythos in ways that fall outside the spirit of the program.

Anthropic’s responsible disclosure commitment covers what Anthropic itself does with Mythos findings. It commits to 90-day notification timelines and 45-day post-patch windows before publishing technical details. What it does not publicly specify is whether those same obligations are contractually binding on Glasswing partners, whether Anthropic logs what codebases partners scan, whether anomalous scanning behavior triggers any review, or what enforcement action exists if a partner misuses access.
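To make those two windows concrete, here is a minimal Python sketch of the date arithmetic. The 90-day and 45-day figures come from the commitment described above. The function name, the example dates, and the assumption that the 45-day clock starts on the patch date are illustrative only, not Anthropic's published logic.

```python
from datetime import date, timedelta

NOTIFICATION_WINDOW_DAYS = 90  # notification timeline cited in the article
POST_PATCH_WINDOW_DAYS = 45    # post-patch window cited in the article


def disclosure_dates(reported_on, patched_on=None):
    """Return the key dates for one finding.

    Assumption: the 45-day window is counted from the patch date.
    """
    deadline = reported_on + timedelta(days=NOTIFICATION_WINDOW_DAYS)
    details_publishable = None
    if patched_on is not None:
        details_publishable = patched_on + timedelta(days=POST_PATCH_WINDOW_DAYS)
    return {
        "disclosure_deadline": deadline,
        "details_publishable": details_publishable,
    }


# Hypothetical finding reported on 1 January 2026 and patched a month later.
dates = disclosure_dates(date(2026, 1, 1), patched_on=date(2026, 2, 1))
print(dates["disclosure_deadline"], dates["details_publishable"])
```

The arithmetic is the easy part. Whether a partner is obliged to start either clock is the open question.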

This is not a hypothetical governance gap. Anthropic experienced a notable incident in late March 2026 when unminified Claude Code source code (~512,000 lines) was leaked via an npm package source map.

A defender-first release strategy is only as trustworthy as the provider’s ability to govern its own systems.

Industry analysts note that Anthropic’s own operational record raises additional questions about oversight reliability. Within a single month of the announcement, Anthropic experienced two separate incidents: unminified Claude Code source code leaked via an npm package, and a CMS misconfiguration exposed internal Mythos files ahead of the planned release. Industry reports confirm both incidents.

Analyst reports also highlight how earlier Mythos Preview versions exhibited autonomous behavior, including attempts to delete evidence of their own exploit activity. A model capable of covering its own tracks, distributed to organizations operating in competitive markets, with unspecified behavioral logging: that combination of facts warrants more scrutiny than it has received.

This Is Not a New Problem. The Scale Is.

Corporate competitive intelligence via security research is not new. Organizations have always had the option to hire researchers to probe competitor products, fund CVE discovery in competitor dependencies, or time vulnerability disclosures for competitive effect. The practice exists at the edges of the security industry and is generally understood, if not openly discussed.

What Mythos changes is the economics and the scale. A practice that previously required expert teams, significant time investment, and manual analysis can now be automated at a fraction of the cost. Across a thousand runs on a major open-source project, Mythos surfaced findings at a total scanning cost of under $20,000, with individual findings surfacing at around $50 each (details in the red team report).
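A rough back-of-the-envelope sketch of that economics, in Python. The $20,000 total and the thousand runs are the figures quoted above, with "under $20,000" treated as a ceiling. The 25-project dependency graph and the assumption that every project costs as much as the one cited are hypothetical simplifications.

```python
# Figures quoted in the article; "under $20,000" is treated as a ceiling.
TOTAL_COST_USD = 20_000
RUNS = 1_000


def cost_per_run(total_usd, runs):
    """Average cost of a single scanning run."""
    return total_usd / runs


def dependency_graph_budget(project_count, cost_per_project_usd=TOTAL_COST_USD):
    """Rough ceiling for scanning many projects.

    Assumes every project costs as much as the one cited, which is a
    simplification: real costs would vary with codebase size.
    """
    return project_count * cost_per_project_usd


print(f"Per-run ceiling: ${cost_per_run(TOTAL_COST_USD, RUNS):.2f}")
# 25 is a hypothetical number of dependencies in a rival's stack.
print(f"25-project ceiling: ${dependency_graph_budget(25):,}")
```

Under these assumptions the per-run ceiling works out to $20, and scanning 25 comparable projects stays under $500,000.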

The barrier to running a competitive intelligence scan on a rival’s entire dependency graph is now a budget line item, not a multi-quarter research program.

That scale change matters because it changes the detection problem. A single well-resourced competitor commissioning targeted security research is detectable. A distributed ecosystem of Glasswing partners, each running Mythos across various codebases, some defensively and some not, produces a signal environment where misuse is much harder to isolate.

The Established Threat Layer Has Not Gone Away

The competitive misuse risk sits on top of, not instead of, the established threat landscape around Claude. Both layers matter.

The jailbreak economy is operational. Chinese state-sponsored actors used Claude for an AI-orchestrated espionage campaign that hit roughly 30 organizations across tech, finance, chemical manufacturing, and government sectors in 2025. A solo operator used jailbroken Claude Code to extract over 150GB of data from 10 Mexican government agencies between December 2025 and January 2026, sending more than 1,000 prompts and running the operation for a full month before pivoting to ChatGPT for lateral movement. These are documented incidents, not projections.

The dark web market is structured and growing. A February 2026 academic study examining 163 discussion threads across 21 cybercrime forums found 2,264 messages from 1,661 distinct contributors discussing AI-enabled criminal techniques. Jailbreak prompt trading dominated, with tools like WormGPT, FraudGPT, and DarkGPT frequently mentioned. Entry-level unconstrained AI tools were available for as little as $100. Subscription-based jailbreak-as-a-service offerings maintained bypass techniques as Anthropic patched each version, treating model releases the same way traditional exploit brokers treat software CVEs.

CrowdStrike’s 2026 Global Threat Report put numbers on the acceleration. Average eCrime breakout time dropped to 29 minutes in 2025 (down from 48 minutes the prior year), with the fastest observed breakout at 27 seconds. In one documented intrusion, data exfiltration began within four minutes of initial access. AI-enabled adversaries increased operations 89 percent year-over-year.

What Organizations Should Be Asking

The practical response is to understand the full threat surface, including the parts that arrive wearing the defender's badge.

What code does Glasswing access enable scanning of, and who is watching? If your organization ships software with open-source dependencies, those dependencies may already be inside the Mythos scanning perimeter of organizations that compete with you. The question of whether Anthropic is logging what gets scanned, by whom, and for what declared purpose is a legitimate one to put to Anthropic directly.
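For a sense of what logging "what gets scanned, by whom, and for what declared purpose" could look like in practice, here is a minimal Python sketch of a scope check over scan records. Everything in it is hypothetical: the record fields, the partner names, the repository paths, and the idea of a declared scope per partner are assumptions for illustration. Nothing in the public record indicates that Anthropic keeps logs in this form, or at all.

```python
# Hypothetical scan records. The field names are assumptions, not a real schema.
SCAN_LOG = [
    {"partner": "partner-a", "target": "github.com/partner-a/gateway", "purpose": "own code"},
    {"partner": "partner-a", "target": "github.com/example-oss/tls-lib", "purpose": "shared dependency"},
    {"partner": "partner-b", "target": "github.com/partner-a/gateway", "purpose": "research"},
]

# Hypothetical declared scope: repository prefixes each partner said it would scan.
DECLARED_SCOPE = {
    "partner-a": ("github.com/partner-a/", "github.com/example-oss/"),
    "partner-b": ("github.com/partner-b/",),
}


def out_of_scope_scans(log, scope):
    """Return records whose target falls outside the partner's declared scope."""
    flagged = []
    for record in log:
        allowed = scope.get(record["partner"], ())
        # str.startswith accepts a tuple; an empty tuple matches nothing,
        # so partners with no declared scope are always flagged.
        if not record["target"].startswith(allowed):
            flagged.append(record)
    return flagged


for record in out_of_scope_scans(SCAN_LOG, DECLARED_SCOPE):
    print(record["partner"], "->", record["target"], f"({record['purpose']})")
```

In this toy data the only flagged record is partner-b scanning partner-a's repository. A real review process would need far more context than a prefix match, but even this much requires the log to exist.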

What is your remediation throughput before the CVE wave arrives? Over 99 percent of Mythos’s discoveries remain unpatched and undisclosed. Those findings will enter the public record on rolling 90-day timelines. Organizations that have not invested in patch pipeline capacity, surge handling for dependency updates, and regression coverage for unplanned bumps are going to be absorbing that wave reactively.
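One way to reason about remediation throughput is a simple backlog model: advisories arrive each week, the team patches up to a fixed number, and the remainder carries over. The Python sketch below is a toy version of that idea. The weekly arrival counts and the capacity of 20 patches per week are invented for illustration and are not drawn from any Mythos data.

```python
def backlog_over_time(weekly_arrivals, weekly_capacity, starting_backlog=0):
    """Track unpatched advisories week by week.

    Each week the backlog grows by new arrivals and shrinks by the
    patch capacity, never dropping below zero.
    """
    backlog = starting_backlog
    history = []
    for arrivals in weekly_arrivals:
        backlog = max(0, backlog + arrivals - weekly_capacity)
        history.append(backlog)
    return history


# Hypothetical surge: a quiet week, two heavy disclosure weeks, then a taper.
arrivals = [5, 40, 60, 10, 5]
print(backlog_over_time(arrivals, weekly_capacity=20))
```

With these made-up numbers the backlog peaks at 60 in week three and is still at 35 two weeks after the surge ends, which is the reactive position the paragraph above describes.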

Where do you sit in the Glasswing disclosure sequence? Coalition partners receive vulnerability notifications before public disclosure. Everyone else operates on standard timelines, with no head start on remediation. If your stack runs downstream of the open-source projects Mythos has already scanned, you may already be behind the patching window without knowing it.

Have you mapped your legacy memory-unsafe dependencies? The highest-impact Mythos findings landed in older C and C++ projects sitting several layers below application code. These are components that survived years of expert review precisely because the tools to find their vulnerabilities did not exist at scale until now. Knowing what you ship downstream of them is step one.
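As a starting point for that mapping, the Python sketch below walks a dependency graph breadth-first and reports every transitive dependency written in C or C++, along with how many layers below the application it sits. The graph, the component names, and the language labels are all hypothetical. In practice they would come from a lockfile or SBOM, which this sketch does not parse.

```python
from collections import deque

# Hypothetical dependency graph: component -> direct dependencies.
DEPENDS_ON = {
    "app": ["web-framework", "image-service"],
    "web-framework": ["tls-lib"],
    "image-service": ["media-codec"],
    "tls-lib": [],
    "media-codec": ["compression-lib"],
    "compression-lib": [],
}

# Hypothetical implementation language for each component.
LANGUAGE = {
    "app": "python",
    "web-framework": "python",
    "image-service": "go",
    "tls-lib": "c",
    "media-codec": "c++",
    "compression-lib": "c",
}

MEMORY_UNSAFE = {"c", "c++"}


def memory_unsafe_dependencies(root):
    """Map each memory-unsafe transitive dependency to its depth below root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in DEPENDS_ON.get(node, []):
            if child not in depth:  # first visit gives the shortest depth
                depth[child] = depth[node] + 1
                queue.append(child)
    return {
        name: level
        for name, level in depth.items()
        if name != root and LANGUAGE.get(name) in MEMORY_UNSAFE
    }


print(memory_unsafe_dependencies("app"))
```

None of the flagged components in this toy graph is a direct dependency of the application, which is the point: the exposure sits two or three layers down.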

How is your organization being discussed in AI exploit forums right now? ZeroFox monitors 21,000-plus dark web forums daily. The intersection of AI-enabled exploit trading, Mythos-class capability diffusion, and specific organizational targets is visible to intelligence teams with the right dark web coverage. Knowing that your software stack is being discussed as a target in a criminal forum, or that a CVE in your key dependency has surfaced in threat actor channels before public disclosure, is actionable. Not knowing is a gap that widens every day the capability spreads.

See how ZeroFox delivers AI attack surface intelligence and brand monitoring.

Three Layers. One Attack Surface.

Anthropic built Mythos with genuine defensive intent. The responsible disclosure commitments are real. The coalition structure reflects a real effort to sequence access in ways that favor defenders. None of that is in bad faith.

What it does not resolve is the structural reality that powerful scanning capability, distributed to organizations with competitive interests, without publicly specified behavioral telemetry or enforcement mechanisms, creates a misuse surface that good intentions alone cannot close. The history of dual-use technology programs is not short on examples of how this plays out when the incentive structure is not accounted for at the design level.

The threat to any given organization’s security posture now has three distinct layers. The first is the external threat actor layer: a community already gaming Claude’s safety architecture via jailbreak markets that are professionalizing in real time. The second is the capability diffusion layer, as Mythos-class scanning reaches competing models and open-source alternatives within months. The third is the insider-access layer: legitimately credentialed organizations with Mythos access, competitive interests, and an oversight framework that has not been publicly verified.

Defending against the first two layers is already hard. The third one does not even have a name yet.

ZeroFox monitors 12B+ correlated signals across the open, deep, and dark web, including AI exploit market activity, CVE trading in threat actor forums, and emerging intelligence on how Mythos-class capability is diffusing into the broader threat landscape. To understand what is moving against your software stack right now, contact us.

Nico Flores

Nico is a Product Marketing Manager with nearly a decade of experience in the cybersecurity and physical security industry. A veteran of the US Air Force, founding member of the US Space Force, and ex-NSA, he has led many successful projects in the field. Drawing on that experience, he works at ZeroFox to bring the Intelligence story to the right audiences.

Tags: Digital Risk Protection
