DeepSeek V4 Pro dropped 75% and there's no going back: what every CTO needs to recalculate

On May 22, DeepSeek announced something that doesn’t happen often in enterprise software: a promotional discount became the permanent list price. What was presented as a limited offer until May 31 is now, simply, what DeepSeek V4 Pro costs — $0.435 per million input tokens and $0.87 per million output tokens. Previously they were $1.74 and $3.48, respectively.

This is no longer a promotion. It’s a repricing signal.


The numbers, unadorned

Here is the table up front, because the gap is wide enough that burying it in prose would be dishonest:

Model               Input ($/M tokens)   Output ($/M tokens)
DeepSeek V4 Pro     $0.435               $0.87
DeepSeek V4 Flash   $0.14                $0.28
Claude Opus 4.7     $15.00               $75.00
GPT-5.5             $5.00                $30.00

The comparison with Opus 4.7 is what’s going to dominate planning meetings at companies. At $15/$75, Opus 4.7 costs approximately 34× more on input and 86× more on output than V4 Pro. Against GPT-5.5, DeepSeek is about 11.5× cheaper on input and 34× cheaper on output.
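To make those multiples easy to re-derive when prices change, here is a minimal sketch that recomputes them from the table. The `PRICES` dictionary and the `price_ratio` helper are names I made up for illustration; the only inputs are the list prices quoted above.

```python
# Prices in USD per million tokens, copied from the table above.
PRICES = {
    "DeepSeek V4 Pro": (0.435, 0.87),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "Claude Opus 4.7": (15.00, 75.00),
    "GPT-5.5": (5.00, 30.00),
}


def price_ratio(model, baseline="DeepSeek V4 Pro"):
    """Return (input_ratio, output_ratio) of `model` relative to `baseline`."""
    model_in, model_out = PRICES[model]
    base_in, base_out = PRICES[baseline]
    return model_in / base_in, model_out / base_out


if __name__ == "__main__":
    for name in ("Claude Opus 4.7", "GPT-5.5"):
        ratio_in, ratio_out = price_ratio(name)
        print(f"{name}: {ratio_in:.1f}x on input, {ratio_out:.1f}x on output")
```

Swapping the `baseline` argument to "DeepSeek V4 Flash" gives the same comparison against the cheaper tier.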

There’s also a cache hit price that deserves its own line: $0.003625/M tokens, 1/120 of the non-cached input price. For any agent that rereads a system prompt or long context on each turn, this number changes the math dramatically. By comparison, Anthropic’s cache hit price is 1/10 of its input price, a much smaller discount.
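How much the cache price matters depends on your hit rate. As a rough sketch, the effective input price is a weighted average of the hit and miss prices. The function name `blended_input_price` and the sample hit rates are my own assumptions for illustration; the two prices are the V4 Pro figures quoted above.

```python
INPUT_MISS = 0.435     # USD per million input tokens on a cache miss
INPUT_HIT = 0.003625   # USD per million input tokens on a cache hit


def blended_input_price(hit_rate):
    """Effective USD per million input tokens for a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    return hit_rate * INPUT_HIT + (1.0 - hit_rate) * INPUT_MISS


if __name__ == "__main__":
    # Hypothetical hit rates; measure your own before trusting any of these.
    for rate in (0.0, 0.5, 0.9):
        print(f"hit rate {rate:.0%}: ${blended_input_price(rate):.4f} per M input tokens")
```

This treats every input token as either a full hit or a full miss, which is a simplification of how real requests mix cached prefixes with new content.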

Concrete example: a coding agent that processes 1 billion tokens per month (800M cache-miss input tokens plus 200M output tokens) costs approximately $522/month with DeepSeek V4 Pro (800 × $0.435 + 200 × $0.87). On GPT-5.5, the same workload runs around $10,000. On Claude Opus 4.7, at the $15/$75 list prices in the table above, it comes to about $27,000.
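If you want to plug in your own volumes, the arithmetic is simple enough to script. This is a sketch under the same assumptions as the example: every input token is a cache miss, and prices are the list prices from the table. The `monthly_cost` helper and `LIST_PRICES` dictionary are illustrative names, not part of any vendor SDK.

```python
# USD per million tokens (input, output), copied from the table above.
LIST_PRICES = {
    "DeepSeek V4 Pro": (0.435, 0.87),
    "Claude Opus 4.7": (15.00, 75.00),
    "GPT-5.5": (5.00, 30.00),
}


def monthly_cost(model, input_millions, output_millions):
    """Monthly cost in USD, assuming all input tokens are cache misses."""
    in_price, out_price = LIST_PRICES[model]
    return input_millions * in_price + output_millions * out_price


if __name__ == "__main__":
    # The workload from the example: 800M input tokens and 200M output tokens.
    for model in LIST_PRICES:
        print(f"{model}: ${monthly_cost(model, 800, 200):,.0f}/month")
```

Change the two volume arguments to match your own input/output split; agentic workloads tend to be far more input-heavy than chat.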


What model is this, really?

DeepSeek V4 Pro is not an economy-tier model with frontier marketing. It’s a Mixture of Experts architecture with 1.6 trillion parameters and 49 billion parameters activated per forward pass, with a context window of 1 million tokens. It scores 93.5 on LiveCodeBench and 80.6 on SWE-bench Verified, the coding benchmarks that practitioners actually use to make decisions.

The weights are under MIT license. You can self-host if you have the infrastructure, though it requires serious hardware, and at current API prices self-hosting is no longer the obvious way to save money.

One infrastructure detail worth mentioning: V4 Pro was optimized to run on Huawei Ascend accelerators, not exclusively on Nvidia hardware. This is, reportedly, one of the factors that gives DeepSeek the confidence to sustain low prices permanently as Huawei’s Ascend 950PR units scale during 2026.


Why “permanent” matters more than “75% discount”

A promotional discount is a pricing experiment. A permanent repricing is a strategic signal.

DeepSeek isn’t looking to recover margin later. It’s establishing a new baseline — one that makes the cost structures of closed-source frontier models increasingly difficult to justify at scale. The model is open-weight, the price is now fixed, and the performance in benchmarks is in the range where engineering teams can make serious evaluations.

For CTOs building AI-native products, the calculation changes: the question is no longer “can we afford frontier models at scale?” but “what exactly are we buying when we pay 30× more?”

This doesn’t mean closed-source models have nothing to offer. The advantages of Anthropic’s ecosystem — tool use, Artifacts, safety tuning, the Claude platform — remain relevant for specific workloads. But at the infrastructure level, at the cost-per-token level in high-volume processing pipelines, the gap is wide enough that doing nothing is also a decision.


What I’d evaluate right now

If I were reviewing an AI technical roadmap this week, here’s what I’d put on the agenda:

1. Audit your token consumption for your 3 most important agentic workflows. Most teams don’t know their actual token consumption per component. Instrument it. The cache hit rate on system prompts alone usually surprises people.

2. Run a benchmark against your own evals, not against industry benchmarks. LiveCodeBench matters, but your specific use case matters more. Run your evaluation suite against V4 Pro on DeepInfra or Together.ai before drawing conclusions.

3. Separate “frontier model” from “closed-source model”. V4 Pro is frontier-grade and open-weight. Until recently, those two things were treated as mutually exclusive. They’re not anymore.

4. Evaluate geopolitical risk honestly. DeepSeek is a Chinese lab. For some regulated industries, that’s a blocker regardless of price. For many enterprise use cases, it’s not. Know which category you’re in before running the benchmark.
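For the first item on that list, the instrumentation can start very small. Below is a minimal sketch of a per-component token tally. Everything in it (`Usage`, `TokenAudit`, the component names) is hypothetical, and it assumes you can read cache-miss input, cache-hit input, and output token counts from whatever usage data your provider returns with each response. Default prices are the V4 Pro figures quoted earlier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Usage:
    cache_miss_in: int = 0
    cache_hit_in: int = 0
    output: int = 0

    @property
    def cache_hit_rate(self):
        """Share of input tokens that were served from cache."""
        total_in = self.cache_miss_in + self.cache_hit_in
        return self.cache_hit_in / total_in if total_in else 0.0

    def cost_usd(self, miss_price=0.435, hit_price=0.003625, out_price=0.87):
        """Cost in USD; prices are per million tokens."""
        total = (self.cache_miss_in * miss_price
                 + self.cache_hit_in * hit_price
                 + self.output * out_price)
        return total / 1_000_000


class TokenAudit:
    """Accumulates token counts per workflow component."""

    def __init__(self):
        self.by_component = defaultdict(Usage)

    def record(self, component, cache_miss_in, cache_hit_in, output):
        usage = self.by_component[component]
        usage.cache_miss_in += cache_miss_in
        usage.cache_hit_in += cache_hit_in
        usage.output += output

    def report(self):
        """Return (component, usage) pairs, most expensive first."""
        return sorted(self.by_component.items(),
                      key=lambda item: item[1].cost_usd(), reverse=True)


if __name__ == "__main__":
    audit = TokenAudit()
    # Made-up numbers for two hypothetical components of an agent.
    audit.record("planner", cache_miss_in=1_000_000, cache_hit_in=3_000_000, output=500_000)
    audit.record("code_writer", cache_miss_in=4_000_000, cache_hit_in=0, output=2_000_000)
    for name, usage in audit.report():
        print(f"{name}: ${usage.cost_usd():.2f}, cache hit rate {usage.cache_hit_rate:.0%}")
```

Calling `record` once per model response, keyed by the component that made the call, is enough to see which part of a workflow dominates spend and where caching is or isn't kicking in.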

The repricing is already done. Whether you act on it is now a strategic decision, not a budget constraint.


Sources: Engadget · The Decoder · DeepInfra · TokenMix
