AI voice features are coming to ChatGPT and Spotify
ChatGPT and Spotify are both getting new AI-based voice features. OpenAI's popular chatbot, ChatGPT, will now be able to accept voice commands as well as queries based on images. Spotify, a platform for listening to music and podcasts, will use OpenAI technology to translate podcasts into other languages in the voices of the shows' own hosts.
For ChatGPT, the update allows voice conversations with users of the Android and iOS apps, while image uploading will be possible on all platforms. Initially, these options will be available only to Plus and Enterprise subscribers; “regular” users will get the image-based features a little later.
If you are a ChatGPT subscriber, you first need to turn on voice chats in the app. You can do this by going to settings and then to “New Features”. Tapping the microphone button then lets you choose from five voices in which the artificial intelligence will talk to you.
OpenAI says voice conversations are powered by a new text-to-speech model that can generate “a human-like sound from just text or a few seconds of speech samples.” The company created the five voices with the help of professional actors it hired, while its speech recognition system, Whisper, converts the user's spoken words into text.
The image-based functions also sound impressive, judging by the description. OpenAI claims that, for example, you can show the chatbot a photo of your grill and ask why it is not working, or a picture of the food in your fridge so that it can plan meals you can prepare from it. There are also the standard math functions: you can photograph a math problem and have the chatbot solve it.
The models OpenAI uses for the image recognition functions are GPT-3.5 and GPT-4. To use these options, tap the photo icon to take a photo or select an existing image on your device; in the Android and iOS apps, tap the “plus” button first. If you want to focus on a certain part of the image, you can also use the drawing tool.
On its site, OpenAI noted that it has limited ChatGPT's ability to analyze and make direct statements about people who appear in images, given that “the bot is not always accurate and these systems should respect the privacy of individuals.”
It should also be noted that the bot is better at understanding English text in images than text in other languages. The company itself admits that ChatGPT “does not work” in other languages, especially those that use non-Roman scripts, so it advises users of those languages to avoid the new feature for now.
Spotify is also getting AI voice functions
Meanwhile, Spotify has teamed up with OpenAI to use its AI voice technology for a rather interesting purpose. The platform, which hosts numerous podcasts in addition to music, has announced a pilot tool for podcasters called Voice Translation.
The feature translates podcasts into other languages using the voices of the people who appear on the show.
Spotify says the tool will retain the speech characteristics of the original speakers when their voices are converted into other languages. Initially, Spotify will translate only selected English-language shows into a handful of languages. Spanish versions of some episodes of Armchair Expert and The Diary of a CEO with Steven Bartlett are already available, and French and German versions should follow soon.