OpenAI has unveiled a new voice assistant feature capable of reading facial expressions and translating spoken language in real time, reminiscent of the AI from the film “Her.” During a livestream demonstration on Monday, OpenAI engineers and CTO Mira Murati showcased the assistant’s new capabilities: they prompted it to be more expressive while telling a bedtime story, switched it to a robotic voice, and concluded with a singing voice. The assistant also interpreted what the phone’s camera was seeing, responded to on-screen visuals, and acted as a translator without needing to be prompted for each exchange.
The assistant’s voice closely resembled Scarlett Johansson’s character in “Her,” in which an AI assistant forms a relationship with a human. Following the event, OpenAI CEO Sam Altman cryptically posted “her” on X, an apparent nod to the film. In a briefing with The Verge, Murati said the assistant was not intentionally designed to mimic Johansson, recounting: “Someone asked me in the audience this exact same question, and then she said, ‘Ah, maybe the reason I didn’t recognize it from ChatGPT is because the voice has so much personality and tonality.’”
These enhancements mark a significant upgrade over ChatGPT’s current voice mode, which offers limited interaction, cannot be interrupted, and cannot respond to visual input. The new features will launch in a limited “alpha” release in the coming weeks, available first to ChatGPT Plus subscribers before a broader rollout.
The announcement follows a Bloomberg report suggesting OpenAI is close to a deal with Apple to integrate ChatGPT into the iPhone, potentially offering a more capable alternative to Siri. Asked about a possible partnership, Murati said, “We haven’t talked about any of the partnerships.”
Altman expressed his enthusiasm in a blog post after the livestream: “The new voice (and video) mode is the best computer interface I’ve ever used. It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change.”
These advancements not only expand what AI assistants can do but also bring their interactions closer to human conversation, potentially transforming how users engage with the technology.