New language model from OpenAI: GPT-4o
15 May 2024
Introduction
For the past few years, artificial intelligence has been on everyone's mind. It keeps amazing us with its capabilities: creating original works, solving complex problems, performing in-depth analysis, and much more. All of this is made possible by language models, which allow computers to understand and process natural human language.
Photo by Possessed Photography on Unsplash
The development of language models has been impressive. It began back in 1966 (that's right, scientists were already exploring artificial intelligence back then) with programs like ELIZA. In 2024, OpenAI presented its new GPT-4o language model, which stunned the world with its capabilities.
For more on the history of artificial intelligence, see our article ‘A Brief History of AI’.
New product: GPT-4o
OpenAI's GPT-4o is the latest masterpiece in the world of language models, setting new standards in text understanding and processing. But what makes it so special?
First of all, GPT-4o has an improved architecture: the ‘o’ stands for ‘omni’, and the model handles text, vision, and audio natively, which lets it process your queries with greater accuracy and speed. This means you will receive more precise and correct answers to your questions. But that's not all: its understanding of human language (not just context, but also mood and emotion) has been taken to a new, higher level. Try talking to the new model and you will be amazed!
It is important to note that GPT-4o's knowledge cutoff is October 2023, the point at which its training data ends, so it does not know about events after that date.
Photo by Markus Spiske on Unsplash
New capabilities
GPT-4o offers advanced functionality. Its launch demonstration made an incredible impression, showing new ways of interacting with artificial intelligence that can find a place in our everyday lives.
Image analysis
One of the key new features of GPT-4o is its ability to analyse and interpret images. The model can recognise the images you show it in real time and provide a detailed description of what it sees. This means you can upload a photo or screenshot and GPT-4o will instantly tell you what's in the image: the key objects, the colours, the emotions of the people in it, and much more.
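For developers, image understanding is also exposed through the OpenAI API. Below is a minimal sketch in Python, assuming the official openai package is installed, an OPENAI_API_KEY is set in the environment, and using an illustrative image URL:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask GPT-4o to describe an image by sending text and an image URL together.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image? List the key objects and colours."},
                # A local file can be sent instead as a base64 data URL.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```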
However, GPT-4o's capabilities are not limited to static images. It can also analyse video in real time: describing what is happening on screen, identifying actions, recognising faces and objects, and even determining the mood and emotions of the people it sees. This opens up new possibilities for using the model in security, medicine, entertainment, and many other areas where fast and accurate analysis of video content is required.
Demonstration of image analysis and live speech - link to the demonstration.
Content analysis from a link
GPT-4o can also analyse text and multimedia content available at a link. Provide the model with a URL and it can extract the information from the web page and give you a detailed analysis of its content. This feature is extremely useful for quickly pulling information from the web without having to read everything yourself. For example, the model can review a news article and give you the main points, or analyse a YouTube video and highlight its key moments.
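A note for developers: in ChatGPT the page is fetched behind the scenes, but the chat completions API does not browse the web itself, so the caller fetches the page and passes its text to the model. A rough sketch, assuming the requests, beautifulsoup4, and openai packages are installed, with an illustrative URL and truncation limit:

```python
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()

def summarise_page(url: str) -> str:
    # Download the page and strip it down to plain text.
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

    # Hand the extracted text to GPT-4o and ask for the main points.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Summarise the main points of the following web page."},
            {"role": "user", "content": text[:20000]},  # truncated to stay well within the context window
        ],
    )
    return response.choices[0].message.content

print(summarise_page("https://example.com/news-article"))
```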
File analysis
The new model also provides improved file analysis capabilities across multiple file formats. Users can upload documents, spreadsheets, and other file types for detailed analysis and extract useful information directly from their contents.
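In ChatGPT this works through the upload button; via the API, one simple approach for text-based formats is to read the file yourself and include its contents in the prompt. A sketch, with report.csv as a placeholder file name (binary formats like PDF or XLSX would need to be converted to text first):

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Read a text-based file (CSV, Markdown, plain text, ...) into the prompt.
contents = Path("report.csv").read_text(encoding="utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": f"Here is a CSV file:\n\n{contents}\n\nSummarise the data and point out any notable trends.",
        }
    ],
)
print(response.choices[0].message.content)
```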
Photo by Wesley Tingey on Unsplash
Improvements
OpenAI's GPT-4o is the new leader among language models, delivering significant improvements in speed, efficiency, and functionality. The model can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, which is close to the average human reaction time in a conversation.
GPT-4o significantly outperforms previous models in speech recognition and translation, as well as in image and video understanding, setting new records on multilingual and vision benchmarks.
Availability
OpenAI's GPT-4o is available through several channels. Users with a ChatGPT Plus subscription will have full access to the model's new features. The model is also available to developers via the API, which allows GPT-4o to be integrated into all kinds of applications and services.
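To give a sense of how lightweight the integration is, a basic text request through the official Python SDK takes only a few lines (again assuming the openai package is installed and an API key is set):

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# The model is selected simply by name in the API call.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "In one sentence, what is new in GPT-4o?"}],
)
print(response.choices[0].message.content)
```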
Users without a ChatGPT Plus subscription will have limited access to GPT-4o: they will be able to try its new features, but with certain restrictions. This policy allows all users to experience the benefits of the new model.
Photo by Solen Feyissa on Unsplash
Conclusions
The rapid development of artificial intelligence technologies, including language models, is opening up new horizons for humanity. OpenAI's GPT-4o demonstrates how far we have come in understanding and processing text, images, and video, providing unprecedented opportunities for various industries.
However, we should not forget about responsibility. It's important to use these powerful tools in an ethical and legal manner, considering potential risks and ensuring that data is protected.