OpenAI releases GPT-4o - can describe images & video in real-time

Enables near-instant interaction with, and translation of, video, audio, text, and images.

Calle Rosenqvist (calle@kamerabild.se)

Published 14 May 2024 - 09:18

OpenAI, the company behind ChatGPT, has introduced an improved version of its AI language model: GPT-4o, where the "o" stands for "omni". GPT-4o can draw on all available information at every step of its analysis, unlike earlier versions, which relied on separate submodels.

The result is a more natural interaction: the model accepts any combination of video, audio, images, and text as input and can generate any combination of text, audio, and images as output. It can respond in as little as 232 milliseconds, with an average of 320 milliseconds, roughly equivalent to human response time in conversation.

On OpenAI's GPT-4o page, you can see more of what the model is capable of, including two instances of GPT-4o interacting with each other, describing their surroundings from a mobile phone's video stream, and singing together.