A last thought on Mythos: How much is marketing and how much is reality?

Eagle-eyed readers might have caught the “claimed to be” qualification in my prior article (The Mythos Cascade). I put it there for a very good reason: the hype, mystery and FOMO created by Anthropic feel like a masterclass in marketing. You can’t miss out on Mythos, because everyone fears Mythos and therefore demands security from Mythos. Mythos’ actual capabilities are secondary to the story and the people evangelizing it. As my brilliant technologist friend Ram C Singh put it, it is the perfect Glomar* Trap.

Let’s dive in and pressure test some of the key claims.

Anthropic wasn't First, and isn't the Only Game in Town

The framing around Mythos treats autonomous vulnerability discovery as a watershed: the first AI model so capable that it finds decade-old bugs and can build exploits for them in a day. Sadly for Anthropic, that credit goes to Google: in late 2024, DeepMind (using Gemini 1.5 Pro) found a 20-year-old exploitable bug in SQLite. That same year, researchers at the University of Illinois showed that GPT-4 could autonomously exploit 87% of the known real-world vulnerabilities in their test set when given a CVE description, compared to 0% for every other model and open-source scanner tested.

Since mid-2025, AISLE (a model-agnostic autonomous security analyzer) has found 15 CVEs in OpenSSL, some with a near-maximum CVSS severity score of 9.8 out of 10 and some involving bugs dating back over 25 years. Beyond OpenSSL, it has found over 180 externally validated CVEs across 30+ infrastructure projects using a mixture of LLMs.

Just last week, Xint AI and Strig AI found exploitable bugs in some of the most popular software on Earth: the Linux kernel and the Apache web server. To keep the pain train going, GPT-5.5 matched Mythos in cybersecurity testing.

The difference with Mythos seems to be in its autonomy and the accuracy of its results. Other LLMs can find bugs, but their findings tend to have low relevance and exploitability, or require human intervention to confirm. Some project owners (e.g., curl) have shut down or publicly debated suspending their bug bounty programs because of the high ratio of AI-generated noise in the submissions. By generating a working exploit, Claude automatically validates that it has found an issue worth reporting.

Conclusion: Mythos is most definitely not the first or the only LLM to find and exploit zero-days, but its approach to proving it has found a real security issue stands alone.

Zero Days aren't the Only Way In

The conversation since Mythos’ announcement has focused almost entirely on zero-day exploits: who finds them first, who patches them fastest, and whether Glasswing’s head start is long enough.

While that conversation was happening, Anthropic itself had two significant security lapses in the same week. First, roughly 3,000 internal files were left in a publicly accessible data store. Then, five days later, a debug map file accidentally bundled into a routine Claude Code update exposed its proprietary source code. Neither incident involved a zero-day; both were caused by humans making ordinary mistakes.

To add insult to injury, Bloomberg has validated reports that a group of unauthorized individuals gained access to Mythos by guessing the URL and (mis-)using credentials from a launch partner. Had they not bragged to Bloomberg, none of this would have been known. The episode shows that Mythos’ access restrictions are laughably weak for a program that portends cybersecurity Armageddon.

Conclusion: Mythos can find vulnerabilities in code, but it cannot stop a developer from leaving a map file in a build configuration, an admin from misconfiguring a CMS or an architect from designing easily bypassed authorization. Nor does it protect against the insider threat, the credential phish or the contractor with too much access. These are the vectors that cause the majority of real breaches, and they exist completely outside the problem Mythos is designed to solve.

Even if Anthropic is the Best Right Now, That Won't Last

History offers a consistent pattern: once a capability benchmark is demonstrated, competitors reach it faster than pre-announcement estimates predicted. AlphaFold, GPT-4-level reasoning and autonomous code generation all followed that curve. Autonomous exploit chaining is unlikely to be different, especially now that we have proof that the threshold is crossable.

GPT-5.5 has already rivaled Mythos on AISI’s “capture the flag” and data extraction tests. It can’t build exploits yet, but that capability can’t be far behind.

At an estimated $20k for the type of scan needed to find the 10- and 20-year-old bugs that Anthropic likes to boast about, Mythos is also not cheap. These high costs (5x higher than Opus 4.6, Anthropic’s most expensive current model) are likely to attract competitors looking to capture part of the lucrative market before prices come down.

Conclusion: Even if Anthropic is exaggerating, Mythos will have competition that will deliver on its promises sooner rather than later.

So, are the claims real, are they myth, and does it even matter?

Probably all three. Mythos is unproven, limited and expensive. Its novel approach to validation and its compression of the CVE-to-working-exploit cycle are valid concerns. And no, it doesn’t matter: even if Mythos is vaporware, the functionality it promises will certainly become a reality, and soon.

Mythos is likely to be Anthropic’s cash cow until the competition catches up and drives costs down. At that point, SAST, DAST, IAST, etc. will be replaced with “AIST” as the de facto way of securing code throughout its lifecycle. Unlike Glomar and deep-sea mining, the economics make sense today (for the largest enterprises) and will only continue to improve, becoming accessible to a wider audience. Until then, and as I advised last week, inventory your codebase, compress your patch cycles and fund your security teams.

*For those unfamiliar with the reference, the Glomar Explorer was a ship built by Howard Hughes under the guise of mining the ocean floor (it was really meant to recover a sunken Soviet submarine). The press aggrandized the effort, interest in deep-sea mining skyrocketed and the stocks of companies related to the topic shot up. Competition increased, but in the end large-scale sea mining never took off because the economics simply weren’t there. Niches for the technology were found, however, and persist to this day.