OpenAI Releases Three New Realtime Voice Models for Developers
OpenAI has launched three realtime voice models focused on reasoning, translation, and transcription. The models are now accessible through the company's Realtime API, with GPT-Realtime-2 billed per token and the translation and transcription models billed per minute.

The first model, GPT-Realtime-2, introduces GPT-5-class reasoning capabilities to voice conversations. It manages complex requests, maintains natural dialogue flow, calls tools, processes corrections or interruptions, and generates contextually appropriate responses during live exchanges.
GPT-Realtime-Translate provides live speech translation across more than 70 input languages into 13 output languages. The model maintains pace with the speaker to deliver continuous translation during ongoing speech.
GPT-Realtime-Whisper offers low-latency streaming speech-to-text transcription. It converts spoken audio into text in real time, enabling immediate captions and meeting notes that update as conversation progresses.
Developers can try all three models in the Playground for testing, or integrate GPT-Realtime-2 into existing applications using Codex.
Pricing for the models varies by type. GPT-Realtime-2 costs $32 per million audio input tokens, with cached input tokens at $0.40 per million, and $64 per million audio output tokens. GPT-Realtime-Translate is priced at $0.034 per minute, while GPT-Realtime-Whisper is priced at $0.017 per minute.
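To make the pricing concrete, here is a minimal cost-estimation sketch in Python. The rates are the ones quoted above; the function names and the sample token and minute counts are hypothetical, and the sketch assumes cached input tokens are billed at the cached rate in place of the standard input rate. It does not call any OpenAI API.

```python
# Rates as quoted in the article (USD). Everything else here is illustrative.
REALTIME_2_INPUT_PER_M = 32.00         # audio input tokens, per million
REALTIME_2_CACHED_INPUT_PER_M = 0.40   # cached input tokens, per million
REALTIME_2_OUTPUT_PER_M = 64.00        # audio output tokens, per million
TRANSLATE_PER_MINUTE = 0.034           # GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017             # GPT-Realtime-Whisper


def realtime_2_cost(input_tokens: int, cached_input_tokens: int, output_tokens: int) -> float:
    """Estimate a GPT-Realtime-2 bill from token counts.

    Assumption: input_tokens excludes cached tokens, which are billed
    separately at the cached rate.
    """
    total = (
        input_tokens * REALTIME_2_INPUT_PER_M
        + cached_input_tokens * REALTIME_2_CACHED_INPUT_PER_M
        + output_tokens * REALTIME_2_OUTPUT_PER_M
    )
    return total / 1_000_000


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate a bill for the per-minute models (translation, transcription)."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical usage figures, chosen only to illustrate the arithmetic.
    voice = realtime_2_cost(input_tokens=100_000, cached_input_tokens=50_000, output_tokens=40_000)
    translate = per_minute_cost(60, TRANSLATE_PER_MINUTE)
    whisper = per_minute_cost(60, WHISPER_PER_MINUTE)
    print(f"GPT-Realtime-2:          ${voice:.2f}")
    print(f"Translate, 60 minutes:   ${translate:.2f}")
    print(f"Whisper, 60 minutes:     ${whisper:.2f}")
```

With these sample figures, the GPT-Realtime-2 session comes to roughly $5.78, an hour of translation to about $2.04, and an hour of transcription to about $1.02. Output tokens cost twice as much as input tokens, so long spoken responses are likely to dominate the bill for the reasoning model.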
Additional details on the models and current developer usage are available through OpenAI resources.