Everyone’s favorite chatbot can now look, listen, and speak. On Monday, OpenAI announced new multimodal capabilities for ChatGPT. Users can now have voice conversations with ChatGPT or share photos with it in real time.
Audio and multimodal features have become the next step in the fierce generative AI competition. Meta recently launched AudioCraft for generating music with AI, and both Google Bard and Microsoft Bing have deployed multimodal features for their chat experiences. Just last week, Amazon previewed a new version of Alexa that will be powered by its own LLM (large language model), and even Apple is experimenting with AI-generated voice through Personal Voice.
Voice capabilities will be available on iOS and Android. Like Alexa or Siri, you can tap to talk to ChatGPT, and it will speak back to you in one of five preferred voice options. Unlike existing voice assistants, ChatGPT is powered by more advanced LLMs, so what you'll hear is the same kind of conversational and creative responses that OpenAI's GPT-4 and GPT-3.5 are capable of producing with text. One example OpenAI shared in the announcement is generating a bedtime story from a voice prompt, so tired parents at the end of a long day can outsource their creativity to ChatGPT.
Image recognition is something that has been anticipated for a while, and it is now rolling out to ChatGPT in a user-friendly way. When GPT-4 was released last March, OpenAI demonstrated its ability to understand and interpret images and handwritten text. That capability will now be part of everyday ChatGPT usage. Users can upload a photo of something and ask ChatGPT about it, such as identifying a cloud or planning a meal based on a photo of the contents of your fridge. Image understanding will be available on all platforms.
As with any generative AI development, there are serious ethics and privacy concerns to consider. To mitigate the risks of audio deepfakes, OpenAI says it is only using its voice technology for the specific "voice chat" use case. In addition, the voices were created with voice actors the company has worked with directly. That said, the announcement doesn't mention whether users' voices will be used to train the model if they opt in to voice chat. As for ChatGPT's multimodal capabilities, OpenAI states that it has "taken technical measures to significantly limit ChatGPT's ability to analyze and make direct statements about people," because ChatGPT is not always accurate and these systems should respect individuals' privacy. But the real test of nefarious use won't be known until the features are released into the wild.
Voice chat and image features are coming in the next two weeks for ChatGPT Plus and Enterprise users, and soon after that for all other users.
Credit: mashable.com