Anthropic's Most Restricted AI Model Got Breached on Day One

Anthropic spent the last two weeks telling the world that Claude Mythos was too powerful to release. On the same day it was announced, a small group in a private Discord forum reportedly got in anyway.

According to a Bloomberg report published Tuesday, unauthorized users have been accessing Mythos regularly since April 7, the day Anthropic first unveiled Project Glasswing — the controlled-release program designed to keep the model out of the wrong hands. The story was subsequently covered by Reuters and TechCrunch, and has since become the central concern of the week in AI security.

Breached on Announcement Day

Mythos is the model Anthropic's own internal testing flagged as capable of autonomously identifying and exploiting software vulnerabilities at a level the company said could enable "dangerous cyberattacks," per Bloomberg. Rather than ship it publicly, Anthropic restricted access to a hand-picked group of partners — reportedly including Apple, AWS, Google, Microsoft, and Nvidia — through Project Glasswing, an initiative explicitly framed as a defensive cybersecurity program.

The premise of that structure was simple. Keep the model behind a very small set of doors. Give each partner access to use it against their own systems so they can find and patch holes before attackers do. Do not let it out.

The breach happened on day one. According to Bloomberg, a group in a private online forum gained access to Mythos on the same day Anthropic publicly announced the release-to-partners plan; the outlet's source backed up the claim with screenshots and a live demonstration of the software. The group has reportedly been using the model regularly ever since, though, according to the source, not for cybersecurity purposes.

How the Group Got In

The method was not a sophisticated hack. TechCrunch summarized what the group actually did: one person in the forum is currently employed at a third-party contractor that works for Anthropic, and that person's legitimate access credentials became the initial foothold. From there, the group used common open-source intelligence tooling — bots scanning unsecured platforms like GitHub for leaked configuration data — to find the rest of what they needed.

The group then "made an educated guess about the model's online location based on knowledge about the format Anthropic has used for other models," per TechCrunch's recap of the Bloomberg reporting. Part of what made that guess plausible was information reportedly surfaced by an earlier supply-chain attack on Mercor, an AI training and data startup that works with several top labs. The New York Post's coverage and summaries circulating on social platforms both tie the details (still-active contractor credentials plus naming-pattern recon) back to the Mercor incident.

The source told Bloomberg the group is "interested in playing around with new models, not wreaking havoc with them," as summarized by TechCrunch. According to the source, they have not run cybersecurity-related commands on Mythos. Instead, per the reporting, they have been using it for ordinary tasks such as building simple websites and experimenting, partly to avoid tripping detection by Anthropic.

Anthropic's Response

The company has not denied that something happened. "We're investigating a report claiming unauthorized access to Claude Mythos Preview through one of our third-party vendor environments," an Anthropic spokesperson told TechCrunch. The company added that it has found no evidence the activity affected its own core systems, and that the exposure so far appears contained to the third-party vendor environment.

Those are careful statements. They do not contest that an unauthorized group gained access. They contest how far into Anthropic itself that access extends.

Why This Is a Big Deal

This story is not, on its face, a disaster. The users reportedly have not turned Mythos on the banks and infrastructure targets Anthropic's own risk analysis warned about. Even so, Bloomberg's follow-on analysis makes clear that regulators and central banks, including those in Canada and the UK, are paying close attention precisely because Mythos is viewed as a genuine uplift to attacker capability against critical infrastructure.

The issue is what the incident proves about controlled releases. Anthropic designed Project Glasswing as a maximally restrictive rollout, the most controlled distribution of a frontier model attempted to date. The perimeter was reportedly broken by someone with legitimate contractor access at a third-party vendor, plus routine OSINT work. Not a zero-day. Not a novel exploit. Contractor credentials, a previously disclosed supply-chain breach at another vendor, and a smart guess about URL naming conventions.

| Layer | What It Was Supposed to Do | What Happened |
| --- | --- | --- |
| Project Glasswing partner list | Limit access to a small, vetted group of enterprise security teams | Partner list reportedly spans dozens of companies, expanding the attack surface |
| Third-party vendor environments | Run Mythos evaluations safely inside partner infrastructure | Contractor credentials inside one vendor became an entry point |
| URL / endpoint obscurity | Hide the model's live location from outside discovery | Guessable naming conventions plus prior leak data made the endpoint findable |

The Controlled-Release Model Is Under Pressure

Controlled-release programs like Glasswing are the policy answer that most AI labs have converged on for dual-use frontier capabilities. The logic: instead of gating everything or opening everything, pick partners, write contracts, manage the perimeter. The Mythos incident is the first high-profile test of whether that approach holds up in practice.

So far the honest answer is that the perimeter is harder to hold than the policy assumes. Every partner adds a supply chain. Every supply chain adds vendors. Every vendor adds contractors. According to Bloomberg, the source said the group also has access to several other unreleased Anthropic models, and to artifacts from other companies obtained through the same kind of contractor-level sharing. The Mythos breach is the one that made the headlines, but it does not appear to be an isolated incident.

Anthropic has not yet said what, if any, structural changes it will make to Glasswing. The company's public posture remains that Mythos is too dangerous for general release, which makes the next policy question an uncomfortable one. If restricted release cannot be held against a hobbyist group with a Discord and some contractor credentials, what does "restricted" actually mean for the next frontier model?

The Bottom Line

A small group in a private forum reportedly gained access to Claude Mythos on the same day Anthropic announced Project Glasswing, using legitimate contractor credentials, open-source intelligence tooling, and details surfaced by an earlier supply-chain attack on Mercor. Anthropic says it is investigating and has found no evidence that its own core systems were affected. The group has reportedly not used Mythos for cyberattacks. But the incident is the first public stress test of the controlled-release model that most frontier AI labs are relying on for dual-use capabilities — and on first contact, the perimeter did not hold. The harder policy question is not about this specific breach.

It is whether an access model that depends on dozens of partners, each with their own contractors and vendors, can realistically keep a model with serious offensive potential away from motivated outsiders.

AI-Generated Content
