AI-as-a-service provider AssemblyAI has a new speech recognition model called Universal-1. The company says the model was trained on more than 12.5 million hours of multilingual audio data and delivers strong speech-to-text accuracy in English, Spanish, French and German. It also claims that, compared with OpenAI’s Whisper Large-v3 model, Universal-1 reduces hallucinations by 30% on speech data and by 90% on ambient noise.
In a blog post, the company describes Universal-1 as “another milestone in our mission to provide accurate, faithful and reliable speech-to-text functionality across multiple languages, helping our customers and developers around the world build a variety of Speech AI applications.” In addition to better understanding the four main languages, the model can handle code-switching, transcribing multiple languages within a single audio file.
Universal-1 also supports improved timestamp estimation, which matters for audio and video editing and for call analysis. AssemblyAI claims the latest model is 13 percent better at this than its predecessor, Conformer-2. Speaker diarization has improved as a result, with a 14% improvement in concatenated minimum-permutation word error rate (cpWER) and a 71% improvement in speaker count estimation accuracy.
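For readers unfamiliar with cpWER, the metric can be sketched in a few lines of Python. This is a minimal illustration of the commonly used definition (concatenate each speaker's utterances, then take the lowest word error rate over all mappings between reference and hypothesis speakers), not AssemblyAI's evaluation code. The function names `edit_distance` and `cp_wer`, the speaker labels and the sample sentences are all hypothetical.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cp_wer(reference, hypothesis):
    """cpWER for dicts mapping speaker label -> list of utterances in time order."""
    # Concatenate every speaker's utterances into one word stream.
    ref_streams = [" ".join(utts).split() for utts in reference.values()]
    hyp_streams = [" ".join(utts).split() for utts in hypothesis.values()]

    # Pad with empty streams so both sides have the same number of speakers.
    n = max(len(ref_streams), len(hyp_streams))
    ref_streams += [[] for _ in range(n - len(ref_streams))]
    hyp_streams += [[] for _ in range(n - len(hyp_streams))]

    total_ref_words = sum(len(s) for s in ref_streams)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")

    # Try every speaker mapping and keep the one with the fewest errors.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(ref_streams, perm))
        for perm in permutations(hyp_streams)
    )
    return best_errors / total_ref_words


# Hypothetical example: labels differ between reference and hypothesis,
# and the hypothesis gets one word wrong ("is" instead of "are").
reference = {"A": ["hello there", "how are you"], "B": ["fine thanks"]}
hypothesis = {"spk0": ["fine thanks"], "spk1": ["hello there", "how is you"]}
print(f"cpWER: {cp_wer(reference, hypothesis):.3f}")
```

Because the score searches over speaker mappings, it penalizes both transcription errors and words attributed to the wrong speaker, which is why better timestamps and diarization can lower it. The brute-force search over permutations is only practical for a handful of speakers.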
Finally, parallel inference has been improved, reducing the processing time for long audio files. AssemblyAI claims Universal-1 handles this task five times faster than Whisper Large-v3. The company compared the two models on Nvidia Tesla T4 machines with 16 GB of VRAM. With a batch size of 64, Universal-1 transcribed one hour of audio in 21 seconds. Whisper Large-v3, running with a much smaller batch size of 24, took 107 seconds to finish the same task.
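The "five times faster" figure follows directly from the reported timings, as the short Python calculation below shows. It uses only the numbers quoted above; the helper names `speedup` and `real_time_factor` are illustrative, and the result says nothing about how the two models would compare at equal batch sizes.

```python
AUDIO_SECONDS = 60 * 60      # one hour of audio
UNIVERSAL_1_SECONDS = 21     # reported time, batch size 64
WHISPER_SECONDS = 107        # reported time, batch size 24


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def real_time_factor(audio_seconds, processing_seconds):
    """Seconds of audio transcribed per second of processing."""
    return audio_seconds / processing_seconds


print(f"Speedup: {speedup(WHISPER_SECONDS, UNIVERSAL_1_SECONDS):.1f}x")
print(f"Universal-1: {real_time_factor(AUDIO_SECONDS, UNIVERSAL_1_SECONDS):.1f}x real time")
print(f"Whisper Large-v3: {real_time_factor(AUDIO_SECONDS, WHISPER_SECONDS):.1f}x real time")
# Speedup: 5.1x
# Universal-1: 171.4x real time
# Whisper Large-v3: 33.6x real time
```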
The benefit of better speech-to-text models is that AI notetakers can generate more accurate, hallucination-free notes, identify action items, and organize metadata such as proper nouns, who is speaking, and timing information. Improved models will also help developers build applications such as AI-powered video editing workflows, telehealth platforms, automated clinical note entry, and claims processing, where accuracy is critical.
The Universal-1 model is available through AssemblyAI's API.
Credit: venturebeat.com