Claude Opus 4.7: The Best Public Coding Model — And the First Built with the Brakes On

I’m going to be direct: Opus 4.7 is the best coding model generally available on the market right now. It’s not an opinion — it’s the benchmarks. But what makes this release interesting isn’t just the numbers. It’s what Anthropic chose not to put inside it.

Yesterday Anthropic launched Claude Opus 4.7, and the results in software engineering are the highest any public model has achieved. But this model exists in a very specific context: nine days ago, Anthropic announced Project Glasswing and revealed that Claude Mythos Preview — their most powerful model — is so capable in offensive cybersecurity that they decided not to release it to the public. Opus 4.7 is the first model built with that lesson incorporated.

This is what matters.


The Numbers That Matter

On SWE-bench Pro — the benchmark that measures the ability to solve real GitHub issues across multiple languages — Opus 4.7 reaches 64.3%. Opus 4.6 was at 53.4%. That’s a jump of nearly 11 points in a single version. For context: GPT-5.4 is at 57.7% and Gemini 3.1 Pro at 54.2%.

On SWE-bench Verified, the curated version of 500 human-validated issues, it goes from 80.8% to 87.6%. On Terminal-Bench 2.0, which measures command-line proficiency, it goes from 65.4% to 69.4%. On GPQA Diamond (scientific reasoning), it reaches 94.2% — practically tied with GPT-5.4 and Gemini 3.1 Pro on a benchmark that’s hitting saturation.

The data from early testers that struck me most: Cursor reports a jump from 58% to 70% on CursorBench. Notion reports +14% on multi-step workflows with a third of the tool errors. Rakuten reports that Opus 4.7 solves 3x more production tasks than Opus 4.6.

Hex, the analytics platform, summarized it well: “Low-effort Opus 4.7 is approximately equivalent to medium-effort Opus 4.6.” Same price, more capability per token.


The Context That Changes Everything: Mythos and Glasswing

This is where this release gets strategically interesting.

On April 7th, Anthropic announced that Claude Mythos Preview can find and exploit software vulnerabilities with a speed and sophistication that rivals the best human security researchers. The response was radical: they didn’t release it to the public. Instead, they created Project Glasswing — a $100 million initiative where only partners like AWS, Apple, Google, Microsoft, CrowdStrike and about 40 additional critical infrastructure organizations can use Mythos Preview to scan and secure their own code.

Opus 4.7 is the first model Anthropic released after that decision. And they say it explicitly: during training, they experimented with efforts to differentially reduce cyber capabilities. The model comes with safeguards that automatically detect and block requests indicating prohibited or high-risk uses in cybersecurity.

This is new. We’re not talking about a disclaimer in the terms of service. We’re talking about a model designed from training to be selectively less capable in a specific domain, while being significantly more capable in everything else.

For those of us who’ve followed AI governance for years, this is exactly what responsible scaling should look like in practice. Not “we self-regulate” as a marketing slogan. A model you don’t release and another you train differently because you learned from the first.


What’s New for Developers

Beyond the benchmarks, there are practical changes worth knowing about:

Xhigh effort level. Opus 4.7 adds a new level between high and max, giving finer control over reasoning depth without the full latency of max. Extended thinking with budget_tokens was eliminated — now it’s adaptive thinking that adjusts automatically.

Vision 3.3x more powerful. Image resolution jumped from 1.15 megapixels to 3.75 megapixels. This isn’t cosmetic — Solve Intelligence reports significant improvements in reading chemical structures and complex technical diagrams. For anyone working with technical documents, blueprints or interfaces, this matters.

Self-verification on long tasks. Opus 4.7 looks for ways to verify its own outputs before reporting results. Devin reports that the model works “coherently for hours” and persists through difficult problems instead of giving up. Warp confirms it solved a concurrency bug that Opus 4.6 couldn’t.

Best-in-class MCP. On MCP-Atlas, Opus 4.7 leads with 77.3%, up from 75.8% for Opus 4.6 and significantly better than GPT-5.4 (68.1%). If you’re building agents with tool-calling, this is the number that matters most.

The tokenizer catch. Opus 4.7 uses an updated tokenizer. The same input can map to between 1.0x and 1.35x more tokens than in Opus 4.6. The price per token hasn’t changed ($5/$25 per million), but the effective cost per request could go up slightly depending on your use case. It’s a detail that matters in production.


What This Means

The frontier model market is at an interesting moment. GPQA Diamond — the scientific reasoning benchmark — is practically saturated. The three top models (Opus 4.7, GPT-5.4, Gemini 3.1 Pro) are within 0.2 points of each other. Real differentiation is migrating to applied capabilities: autonomous coding, tool use, long multi-step tasks.

In that space, Opus 4.7 leads. But Gemini 3.1 Pro is at $2/$12 per million tokens with a 2M context window. If your use case is massive document processing and coding isn’t your priority, the tradeoffs are real. GPT-5.4 leads in computer use (75% on OSWorld) and professional knowledge. There’s no absolute winner — there are winners by use case.

What is new is the precedent that Opus 4.7 sets. It’s the first commercial model from a frontier company that was explicitly trained to be less capable in one domain (offensive cybersecurity) while maximizing capability everywhere else. If this works — and the benchmarks suggest it didn’t compromise anything in coding, reasoning or tool use — it’s a template others will have to follow or explain why they don’t.

For security professionals who do need Opus 4.7’s cyber capabilities for legitimate work (vulnerability research, pentesting, red-teaming), Anthropic created a Cyber Verification Program where they can apply for access.

Opus 4.7 is available today on all Claude products and via API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The model ID is claude-opus-4-7.


Have you tried Opus 4.7 yet? Do you notice the difference with Opus 4.6 in your daily workflow? Let us know in the comments. :speech_balloon: