OpenAI da vida a la IA conversacional con gpt-realtime y la API Realtime

FROM THE FRONTIER

630xauto
See OpenAI’s real-time voice features in action. Source: OpenAI

After months in beta, OpenAI is bringing out its advanced gpt-realtime voice model and Realtime API that can shift emotions, change tones, and even switch languages mid-sentence.

More than just words. Unlike older setups that stitched together separate models for speech and text, gpt-realtime runs everything through a single model, which reduces lag and improves flow. The API also comes with MCP server support, image input, and SIP phone calling support, which gives developers more tools to build voice agents that sound natural and respond quickly.

Some early use cases include a nutrition and fitness coaching app that uses the API to enable voice conversations with its AI coach, and a language learning app that leverages the API to enable a role-play feature for users to practice conversations.

You can learn more about the pricing and technical details of the Realtime API here.