The full story on OpenAI's GPT-5


There were some early wins. Users were pleasantly surprised to learn that GPT-5’s base model is rolling out to free ChatGPT users, while Plus subscribers ($20/month) get higher usage limits. For developers, API pricing starts at an attractive $0.15/$1.50 per million input/output tokens for GPT-5 Nano, and goes up to $1.25/$10 for the full model.
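To get a feel for what those per-million-token prices mean in practice, here is a rough back-of-the-envelope sketch. It uses only the prices quoted above. The dictionary keys are display labels rather than official API model identifiers, and the token counts in the example are made-up workload numbers.

```python
# Prices in USD per million tokens (input, output), as quoted in this article.
# The keys are display labels for illustration, not official API model names.
PRICES_PER_MILLION = {
    "GPT-5 Nano": (0.15, 1.50),
    "GPT-5": (1.25, 10.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one workload."""
    input_price, output_price = PRICES_PER_MILLION[model]
    # Each price applies per 1,000,000 tokens, so scale the total down.
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens and 50k output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, input_tokens=200_000, output_tokens=50_000)
        print(f"{model}: ${cost:.4f}")
```

Because output tokens cost eight to ten times more than input tokens at these rates, long responses tend to dominate the bill.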

It has the benchmarks to back up its performance. Right off the bat, the model aced LMArena. Millions of users unknowingly put it through its paces and the results were definitive: GPT-5 ranked #1 across all major categories. It also scored 100% on AIME 2025 math problems (when using Python), 74.9% on SWE-Bench coding challenges, and 42% on Humanity’s Last Exam (slightly behind Grok 4 Heavy).

The jury’s still out on GPT-5’s coding promise. In live demos, the model built a complete language learning game from a simple prompt in minutes. Users have been taking it for a spin, with one claiming they solved some “very, very hard debugging prompts that were previously unsolved (by AI)”. Others have said they’d rather stick with Claude for their coding needs.

The model zeroes in on health applications: GPT-5 performed well on OpenAI’s HealthBench evals, scoring higher than any previous model. Justine Moore, partner at VC firm a16z, is “pleasantly surprised” by the company leaning into the fact that millions of users turn to ChatGPT to understand their diagnoses and treatment options. “With GPT-5, it’s no longer an ‘off-label’ application.”

But don’t call it AGI yet. GPT-5 still trails Grok 4 on the ARC-AGI-2 benchmark, coming in at just 9.9% to Grok 4’s 15.9%, as Elon Musk pointed out on X. After months of being billed as “a significant step on the path to AGI”, the model left some observers underwhelmed: OpenAI’s odds of having the best AI model by the end of August tanked from over 70% to under 20% on betting platform Polymarket after GPT-5 was announced.

Our early verdict: This might not be AGI, but that doesn’t mean it’s not a major step forward. We’re seeing significant improvements on tasks like writing and coding in our early tests, and we expect you’ll get good value out of the upgrade. We’ll report on new use cases and capabilities as we uncover them in our testing over the coming days.