OpenAI updated its Realtime API today, which is currently in beta. The update adds new voices to the platform for speech-to-speech applications and cuts costs through prompt caching.
Beta users of the Realtime API now have five new voices they can use to build their applications. OpenAI showcased three of the new voices, Ash, Verse and the British-sounding Ballad, in a post on X.
Two Realtime API updates:
– You can now build speech-to-speech experiences with five new voices, which are more expressive and steerable.
– We are reducing the cost by using prompt caching. Cached text inputs are discounted by 50% and cached audio inputs are discounted… pic.twitter.com/jLzZDBrR7l
— OpenAI Developers (@OpenAIDevs) 30 October 2024
The company said in its API documentation that native speech-to-speech skips an intermediate text format, which means low latency and nuanced output.
However, OpenAI cautions that it cannot yet offer client-side authentication for the API because it is still in beta. It also said that there could be problems with processing real-time audio.
“Network conditions greatly affect real-time audio, and when network conditions are unpredictable it is difficult to reliably deliver audio from a client to a server at scale,” the company said.
OpenAI’s history with AI-powered speech and voices has been controversial. In March, it released Voice Engine, a voice-cloning platform meant to rival ElevenLabs, but limited access to only a few researchers. In May, after the company demoed GPT-4o and its voice mode, it stopped using one voice, Sky, after actress Scarlett Johansson commented on its similarity to her own voice.
In September, the company introduced ChatGPT Advanced Voice Mode for paying customers (those who use ChatGPT Plus, Enterprise, Teams and Edu) in the US.
Speech-to-speech AI would ideally let businesses generate real-time responses using voice. Suppose a customer calls a company’s customer service line. The speech-to-speech capability can pick up the person’s voice, understand what they are asking, and respond in an AI-generated voice with less latency. Speech-to-speech also lets users create voiceovers, in which the user speaks their own lines but the voice output is not theirs. One platform that offers this is Replica, and of course, ElevenLabs.
OpenAI released the Realtime API during its DevDay this month. The API is meant to speed up the building of voice assistants.
Reducing costs
Using speech-to-speech features, though, can be expensive.
When the Realtime API launched, pricing was $0.06 per minute of audio input and $0.24 per minute of audio output, which isn’t cheap. However, the company plans to lower Realtime API costs with prompt caching.
Cached text inputs will be discounted by 50%, and cached audio inputs will be discounted by 80%.
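To get a feel for what those numbers mean, here is a rough back-of-the-envelope sketch in Python. It uses only the per-minute rates and the 80% cached-audio discount quoted above. The function name `estimate_audio_cost` and the `cached_input_fraction` parameter are hypothetical, and the sketch assumes the discount can be applied directly to the per-minute input rate for whatever share of the input is served from cache. Actual billing is metered in tokens, so treat the result as an approximation, not OpenAI's billing formula.

```python
# Rates quoted in the article for the Realtime API at launch (USD per minute).
AUDIO_INPUT_PER_MIN = 0.06
AUDIO_OUTPUT_PER_MIN = 0.24

# Discount quoted in the article for cached audio inputs.
CACHED_AUDIO_DISCOUNT = 0.80


def estimate_audio_cost(input_min, output_min, cached_input_fraction=0.0):
    """Estimate the audio cost of a call in USD.

    ASSUMPTION: the cached-input discount is applied to the per-minute
    input rate for the cached share of the input. This is an
    illustration only; real billing is token-based.
    """
    if not 0.0 <= cached_input_fraction <= 1.0:
        raise ValueError("cached_input_fraction must be between 0 and 1")

    cached_min = input_min * cached_input_fraction
    uncached_min = input_min - cached_min

    # Uncached input pays the full rate; cached input pays the discounted rate.
    input_cost = (
        uncached_min * AUDIO_INPUT_PER_MIN
        + cached_min * AUDIO_INPUT_PER_MIN * (1.0 - CACHED_AUDIO_DISCOUNT)
    )
    # Output is not discounted by caching.
    output_cost = output_min * AUDIO_OUTPUT_PER_MIN
    return input_cost + output_cost


if __name__ == "__main__":
    no_cache = estimate_audio_cost(10, 10)
    half_cached = estimate_audio_cost(10, 10, cached_input_fraction=0.5)
    print(f"No caching:        ${no_cache:.2f}")
    print(f"Half input cached: ${half_cached:.2f}")
```

Under these assumptions, a call with 10 minutes of audio in each direction comes to about $3.00 with no caching and about $2.76 if half the input audio is cached. Output accounts for most of the bill, so input caching alone trims it only modestly.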
OpenAI also announced prompt caching during DevDay. The feature keeps frequently requested context and prompts in the model’s memory, which reduces the number of tokens the model has to process to produce a response. Lower input costs could encourage more interested developers to connect to the API.
OpenAI isn’t the only company to roll out prompt caching. Anthropic launched prompt caching for Claude 3.5 Sonnet in August.
Credit : venturebeat.com