OpenAI releases GPT-4o - can describe images and video in real time
Enables interaction with, and translation of, video, audio, text, and images in near real time.
OpenAI, the company behind ChatGPT, has introduced an improved version of its AI language model: GPT-4o (where "o" stands for "omni"). Unlike earlier versions, which relied on separate submodels, GPT-4o handles all inputs in a single model, so it can draw on all available information at every step of analysis.
The result is a more natural interaction. The model accepts any combination of video, audio, images, and text as input and can generate any combination of text, audio, and images as output. It can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, which is roughly comparable to a human's response time in conversation.
On the GPT-4o page, you can see more of what the model is capable of, including demos of two GPT-4o instances interacting with each other, describing their surroundings from a mobile phone's video stream, and singing together.