Key Points:
- OpenAI introduced GPT-Realtime-2, offering a 128,000-token context window and GPT-5-class reasoning for voice interactions.
- The new live translation model supports more than 70 input languages and 13 output languages, translating in real time as the speaker talks.
- Zillow saw call success rates jump from 69% to 95% during early testing of the new technology.
- The Realtime API includes active safety classifiers and supports European Union data residency requirements.
OpenAI just revealed three new audio models built specifically for real-time voice applications. The technology giant shared the news in a recent press release, detailing major upgrades to how artificial intelligence handles spoken conversations. Developers can now access GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper to build smarter voice assistants. The new tools aim to make talking to computers feel more natural and to cut down on frustrating audio delays.
The flagship model, GPT-Realtime-2, brings the biggest upgrades. OpenAI states this model features GPT-5-class reasoning capabilities specifically tuned for voice interactions. The developers also expanded the memory capacity for this version. The model now has a context window of 128,000 tokens, four times the previous limit of 32,000. This upgrade means the AI can follow much longer conversations before it starts losing track of what users said earlier.
Users can also decide how much effort the AI spends reasoning about its answers. The system provides adjustable reasoning levels that range from minimal effort for quick chats to extra-high processing for complex problems. The model backs up these new features with solid test scores. GPT-Realtime-2 scored 15.2% higher on the Big Bench Audio test when developers compared it directly with the older GPT-Realtime-1.5.
Several major companies got early access to test the technology and reported strong results. The real estate website Zillow integrated GPT-Realtime-2 into its call systems and saw an immediate difference in performance. Zillow reported a 26-percentage-point improvement in its overall call success rates. The company hit a 95% success rate with the new AI, leaving the 69% success rate of previous models far behind.
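A jump from 69% to 95% can be described in two ways, and the numbers differ. The short sketch below works through both using only the figures Zillow reported; the function names are illustrative and not part of any OpenAI or Zillow tooling.

```python
def point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute change, measured in percentage points."""
    return after_pct - before_pct


def relative_gain(before_pct: float, after_pct: float) -> float:
    """Change expressed as a percentage of the starting value."""
    return (after_pct - before_pct) / before_pct * 100


# Figures reported by Zillow: 69% before, 95% after.
print(point_gain(69, 95))               # absolute gain in percentage points
print(round(relative_gain(69, 95), 1))  # relative gain, in percent
```

The absolute gain is 26 percentage points, while the relative gain over the old success rate works out to roughly 38%.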
The second major release, GPT-Realtime-Translate, tackles the difficult problem of live language translation. This model takes spoken words in one language and speaks them back in another. It currently supports more than 70 input languages and outputs speech in 13 languages. Most impressively, the software translates speech in real time while keeping pace with the original speaker. That removes the awkward silent pauses that usually happen when using digital translators.
BolnaAI tested the translation features heavily and found the new system highly accurate. The company focused its testing on regional dialects and complex languages. BolnaAI reported a 12.5% drop in word error rates when testing the system across Hindi, Tamil, and Telugu. Accurate voice technology for these languages has often been hard for developers to find, which makes this double-digit improvement a significant win for international business communication.
The third release focuses strictly on converting spoken words into written text. OpenAI calls this model GPT-Realtime-Whisper. It delivers streaming speech-to-text transcription for developers who need to capture live conversations. The software transcribes words as people speak them, rather than waiting for a recording to finish. This tool helps businesses create instant captions for live events or generate quick text transcripts for important voice calls.
Developers access all three of these new tools directly through the OpenAI Realtime API. The company structured the pricing differently depending on the model developers choose. For the flagship GPT-Realtime-2 model, OpenAI charges $32 per 1 million audio input tokens. When the AI responds to the user, the cost increases to $64 per 1 million audio output tokens.
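For a rough sense of what token-based pricing means in practice, the sketch below estimates the audio cost of one session from the two GPT-Realtime-2 rates quoted above. The helper name and the example token counts are illustrative assumptions. The article does not say how many tokens a minute of audio consumes, and any charges beyond audio tokens are not covered here.

```python
from decimal import Decimal

# Rates quoted in the article for GPT-Realtime-2 (USD per 1 million audio tokens).
AUDIO_INPUT_PER_MILLION = Decimal("32")
AUDIO_OUTPUT_PER_MILLION = Decimal("64")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_audio_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate audio cost in USD for one session (hypothetical helper)."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * AUDIO_INPUT_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * AUDIO_OUTPUT_PER_MILLION
    return input_cost + output_cost


# Assumed example session: 50,000 audio tokens in, 25,000 audio tokens out.
cost = estimate_audio_cost(50_000, 25_000)
print(f"Estimated audio cost: ${cost:.2f}")
```

Because output tokens cost twice as much as input tokens, a session where the model talks more than it listens costs more than the reverse.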
The company offers a simpler pricing structure for the other two models. Developers pay for the translation and transcription tools by the minute rather than counting tokens. The GPT-Realtime-Translate model costs $0.034 per minute of use. Meanwhile, the streaming GPT-Realtime-Whisper model costs $0.017 per minute, half the translation rate. This minute-based pricing makes it easier for small businesses to estimate their daily costs.
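Per-minute pricing reduces the estimate to a single multiplication. The sketch below uses the two rates quoted above; the helper name and the daily minute count are illustrative assumptions, and it assumes usage is billed in whole minutes with no rounding rules, which the article does not specify.

```python
from decimal import Decimal

# Per-minute rates quoted in the article (USD).
PER_MINUTE_RATES = {
    "GPT-Realtime-Translate": Decimal("0.034"),
    "GPT-Realtime-Whisper": Decimal("0.017"),
}


def daily_cost(model: str, minutes_per_day: int) -> Decimal:
    """Estimate one day's cost in USD for a per-minute model (hypothetical helper)."""
    return PER_MINUTE_RATES[model] * minutes_per_day


# Assumed workload: 600 minutes (10 hours) of audio per day.
for model in PER_MINUTE_RATES:
    print(f"{model}: ${daily_cost(model, 600):.2f} per day")
```

At that assumed volume, translation comes to about $20.40 per day and transcription to about $10.20.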
OpenAI also built safety and privacy measures into the new application programming interface. The company stated that the service relies on active classifiers that monitor conversations. These classifiers operate in the background and block interactions that violate the official content guidelines. The new system also supports European Union data residency requirements, which helps European companies use the AI while meeting strict local privacy laws.
These three new models give software engineers powerful tools to build the next generation of voice applications. Companies can now create customer service bots that actually sound human, understand complex questions, and speak multiple languages without missing a beat. OpenAI continues to push the limits of artificial intelligence, forcing the entire technology industry to adapt to these rapid advancements in voice computing.