OpenAI releases GPT-4o, its latest advance in artificial intelligence, designed to deliver real-time responses and to understand audio and video in addition to text. The new model marks a milestone in human-machine interaction, offering a more natural experience and accepting a variety of media as input.
According to OpenAI CEO Sam Altman, GPT-4o is “natively multimodal,” meaning it can generate content or interpret commands across voice, text, and images. Developers interested in exploring GPT-4o’s capabilities will have access to its API, which is offered at half the price and twice the speed of GPT-4 Turbo, Altman noted in a post on X.
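For developers, a first request against the new model could look something like the sketch below, which assumes the official openai Python client (`pip install openai`) and an `OPENAI_API_KEY` environment variable; the prompt itself is purely illustrative:

```python
# Minimal sketch of a GPT-4o API call with the official openai Python client.
# Only the "gpt-4o" model name comes from the announcement; the prompt is made up.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what GPT-4o can do in one sentence."},
    ],
)
print(response.choices[0].message.content)
```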
The new model emulates the experience of interacting with another human being with striking fidelity. Among its advantages, it stands out for being free to use, while GPT-4 remains available only on paid plans.
Regarding its availability, the company has announced: "GPT-4o's text and image capabilities are rolling out to ChatGPT starting today. We're making GPT-4o available for free to users, with up to 5x higher message limits for Plus users. We'll be releasing a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus soon."
ChatGPT can now also translate conversations in real time, much as a human interpreter would, facilitating communication and breaking down language barriers.
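As a sketch of how that interpreter behavior could be reproduced through the API, a system prompt can steer GPT-4o into a translation role; the prompt wording and the example sentence below are hypothetical:

```python
# Illustrative sketch: steering GPT-4o into an interpreter role via a system prompt.
# The prompts are assumptions for demonstration, not an official translation endpoint.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are an interpreter. Translate every user message "
                       "from Spanish to English, preserving tone.",
        },
        {"role": "user", "content": "¿A qué hora sale el próximo tren a Madrid?"},
    ],
)
print(response.choices[0].message.content)
# e.g. "What time does the next train to Madrid leave?"
```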
Moreover, simply by uploading a photo or image, the model can describe it in precise detail, going well beyond previous models. With nothing more than a mobile phone, it can analyze anything the camera captures.
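The same image-understanding flow is exposed through the API: a single message can mix text with an image reference. A minimal sketch, using a placeholder photo URL:

```python
# Sketch of an image-analysis request: a text question plus an image_url content
# part in one user message. The photo URL is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this photo."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```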
Availability begins today, although the rollout may not be uniform, so some users may have to wait a few days to try what GPT-4o offers. In addition, the voice mode built on GPT-4o will arrive in the coming weeks, although specific details have not yet been provided.