GPT-4o: New OpenAI flagship model on Azure AI


GPT-4o's text and image capabilities are now starting to roll out in ChatGPT. OpenAI is making GPT-4o available in the free tier, and Plus subscribers get message limits up to five times higher. In the coming weeks, ChatGPT Plus will roll out an early version of a new Voice Mode powered by GPT-4o.

GPT-4 is OpenAI's most recent milestone in scaling deep learning. It is a large multimodal model that accepts text and image inputs and produces text outputs. While less capable than humans in many real-world scenarios, it exhibits human-level performance on various professional and academic benchmarks; for example, it passes a simulated bar exam with a score around the top 10% of test takers, whereas GPT-3.5 scored around the bottom 10%. After six months of iteratively aligning GPT-4 using lessons from ChatGPT and their adversarial testing programme, OpenAI attained their best-ever results on factuality, steerability, and refusing to go outside of guardrails.

Over a two-year period, OpenAI rebuilt their deep learning stack and, together with Azure, co-designed a supercomputer from the ground up for their workload. A year earlier, OpenAI trained GPT-3.5 as a first "test run" of the system, finding and fixing bugs and strengthening their theoretical foundations along the way. As a result, the GPT-4 training run was remarkably stable, making it the company's first large model whose training outcome it could predict with precision. With a focus on reliable scaling, OpenAI aims to keep refining this method of anticipating and preparing for future capabilities ahead of time, which is critical for safety.

GPT-4's text input capability will soon be available via ChatGPT and the API (with a waitlist). To prepare image input for wider availability, OpenAI is collaborating closely with a single partner. OpenAI is also open-sourcing OpenAI Evals, their framework for automated evaluation of AI model performance, so that anyone can report shortcomings in the models and help guide further improvements.

Capabilities

GPT-4o (the "o" stands for "omni") is a step towards significantly more natural human-computer interaction: it accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image as output. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and on code, with significant improvement on text in non-English languages, while being much faster and 50% cheaper in the API. GPT-4o is especially stronger than previous models at vision and audio understanding.

Prior to GPT-4o, you could use Voice Mode to converse with ChatGPT with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). To accomplish this, Voice Mode used a pipeline of three separate models: a simple model transcribes audio to text, GPT-3.5 or GPT-4 takes text in and outputs text, and a third simple model converts that text back to audio. This process means that GPT-4, the main source of intelligence, loses a great deal of information: it cannot directly observe tone, multiple speakers, or background noise, and it cannot output laughter, singing, or expressed emotion.
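To make that information loss concrete, here is a minimal sketch of how such a three-model pipeline could be wired together with the OpenAI Python client. This is an illustrative reconstruction, not OpenAI's actual production pipeline; the file names are placeholders, and it assumes the openai package is installed with an API key in the environment.

    # Conceptual sketch of the pre-GPT-4o Voice Mode pipeline:
    # audio -> text -> GPT-4 -> text -> audio, losing information at each hop.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # 1. A simple model transcribes the user's audio to plain text.
    #    Tone, speaker identity, and background sounds are discarded here.
    with open("user_question.mp3", "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    # 2. GPT-4, the source of intelligence, only ever sees the text.
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": transcript.text}],
    )

    # 3. A third simple model converts the reply text back to speech.
    #    It cannot laugh, sing, or convey emotion the text does not encode.
    speech = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=reply.choices[0].message.content,
    )
    speech.write_to_file("assistant_reply.mp3")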


With GPT-4o, OpenAI instead trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is their first model combining all of these modalities, OpenAI has only begun to explore what it can do and where its limits lie.

Model evaluations

As measured on traditional benchmarks, GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, while setting new high-water marks on multilingual, audio, and vision capabilities.

Language tokenization

These 20 languages were chosen as representative of the new tokenizer's compression across different language families; each sample below is the same greeting.

Gujarati 4.4x fewer tokens (from 145 to 33)

હેલો, મારું નામ જીપીટી-4o છે. હું એક નવા પ્રકારનું ભાષા મોડલ છું. તમને મળીને સારું લાગ્યું!

Telugu 3.5x fewer tokens (from 159 to 45)

నమస్కారము, నా పేరు జీపీటీ-4o. నేను ఒక్క కొత్త రకమైన భాషా మోడల్ ని. మిమ్మల్ని కలిసినందుకు సంతోషం!

Tamil 3.3x fewer tokens (from 116 to 35)

வணக்கம், என் பெயர் ஜிபிடி-4o. நான் ஒரு புதிய வகை மொழி மாடல். உங்களை சந்தித்ததில் மகிழ்ச்சி!

Marathi 2.9x fewer tokens (from 96 to 33)

नमस्कार, माझे नाव जीपीटी-4o आहे| मी एक नवीन प्रकारची भाषा मॉडेल आहे| तुम्हाला भेटून आनंद झाला!

Hindi 2.9x fewer tokens (from 90 to 31)

नमस्ते, मेरा नाम जीपीटी-4o है। मैं एक नए प्रकार का भाषा मॉडल हूँ। आपसे मिलकर अच्छा लगा!

Urdu 2.5x fewer tokens (from 82 to 33)

ہیلو، میرا نام جی پی ٹی-4o ہے۔ میں ایک نئے قسم کا زبان ماڈل ہوں، آپ سے مل کر اچھا لگا!

Arabic 2.0x fewer tokens (from 53 to 26)

مرحبًا، اسمي جي بي تي-4o. أنا نوع جديد من نموذج اللغة، سررت بلقائك!

Persian 1.9x fewer tokens (from 61 to 32)

سلام، اسم من جی پی تی-۴او است. من یک نوع جدیدی از مدل زبانی هستم، از ملاقات شما خوشبختم!

Russian 1.7x fewer tokens (from 39 to 23)

Привет, меня зовут GPT-4o. Я — новая языковая модель, приятно познакомиться!

Korean 1.7x fewer tokens (from 45 to 27)

안녕하세요, 제 이름은 GPT-4o입니다. 저는 새로운 유형의 언어 모델입니다, 만나서 반갑습니다!

Vietnamese 1.5x fewer tokens (from 46 to 30)

Xin chào, tên tôi là GPT-4o. Tôi là một loại mô hình ngôn ngữ mới, rất vui được gặp bạn!

Chinese 1.4x fewer tokens (from 34 to 24)

你好,我的名字是GPT-4o。我是一种新型的语言模型,很高兴见到你!

Japanese 1.4x fewer tokens (from 37 to 26)

こんにちわ、私の名前はGPT−4oです。私は新しいタイプの言語モデルです、初めまして

Turkish 1.3x fewer tokens (from 39 to 30)

Merhaba, benim adım GPT-4o. Ben yeni bir dil modeli türüyüm, tanıştığımıza memnun oldum!

Italian 1.2x fewer tokens (from 34 to 28)

Ciao, mi chiamo GPT-4o. Sono un nuovo tipo di modello linguistico, è un piacere conoscerti!

German 1.2x fewer tokens (from 34 to 29)

Hallo, mein Name ist GPT-4o. Ich bin ein neues KI-Sprachmodell. Es ist schön, dich kennenzulernen.

Spanish 1.1x fewer tokens (from 29 to 26)

Hola, me llamo GPT-4o. Soy un nuevo tipo de modelo de lenguaje, ¡es un placer conocerte!

Portuguese 1.1x fewer tokens (from 30 to 27)

Olá, meu nome é GPT-4o. Sou um novo tipo de modelo de linguagem, é um prazer conhecê-lo!

French 1.1x fewer tokens (from 31 to 28)

Bonjour, je m’appelle GPT-4o. Je suis un nouveau type de modèle de langage, c’est un plaisir de vous rencontrer!

English 1.1x fewer tokens (from 27 to 24)

Hello, my name is GPT-4o. I’m a new type of language model, it’s nice to meet you!
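These ratios can be checked with OpenAI's open-source tiktoken library, which ships both the GPT-4/GPT-4 Turbo tokenizer (cl100k_base) and the GPT-4o tokenizer (o200k_base). A minimal sketch, assuming tiktoken is installed (pip install tiktoken); the two sample strings are taken from the list above:

    # Compare token counts under the GPT-4 tokenizer (cl100k_base)
    # and the new GPT-4o tokenizer (o200k_base).
    import tiktoken

    gpt4_enc = tiktoken.get_encoding("cl100k_base")   # GPT-4 / GPT-4 Turbo
    gpt4o_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

    samples = {
        "Hindi": "नमस्ते, मेरा नाम जीपीटी-4o है। मैं एक नए प्रकार का भाषा मॉडल हूँ। आपसे मिलकर अच्छा लगा!",
        "English": "Hello, my name is GPT-4o. I'm a new type of language model, it's nice to meet you!",
    }

    for language, text in samples.items():
        before = len(gpt4_enc.encode(text))
        after = len(gpt4o_enc.encode(text))
        print(f"{language}: {before} -> {after} tokens ({before / after:.1f}x fewer)")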

Model availability

GPT-4o is OpenAI's latest step in pushing the boundaries of deep learning, this time in the direction of practical usability. Over the last two years, they have worked hard on efficiency improvements at every layer of the stack. As a first result of this work, OpenAI is able to offer a GPT-4 level model to a much broader audience. GPT-4o's capabilities will be rolled out iteratively, with extended red team access starting immediately.

Developers can access GPT-4o in the API as a text and vision model. Compared to GPT-4 Turbo, GPT-4o is half the price, twice as fast, and has rate limits five times higher. OpenAI plans to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.
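As an illustration of the text-and-vision API access described above, here is a minimal sketch using the OpenAI Python client; the prompt and image URL are placeholders:

    # Minimal text + vision request to GPT-4o through the OpenAI API.
    # Assumes the openai package and an OPENAI_API_KEY environment variable.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe what is happening in this image."},
                    {"type": "image_url",
                     "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder
                ],
            }
        ],
    )
    print(response.choices[0].message.content)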

With GPT-4o, OpenAI, the company behind ChatGPT, has developed its latest large language model. It is distinguished by its multimodal processing: it can take in and respond to text, images, and audio. GPT-4o's key features and limitations are summarised below.

Key features:

Multimodality: The most significant aspect of GPT-4o is its multimodality: it can process and respond to text, images, and audio. For example, you could give it a picture and ask it to write a poem about it, or give it an audio clip and ask it to describe the conversation.

Improved performance: OpenAI claims that GPT-4o outperforms its predecessors in several areas, including text generation, audio processing, image recognition, and the interpretation of complex text.
Safety and limitations:


Emphasis on safety: OpenAI prioritises safety by scrutinising training data and implementing safety protocols. Risk evaluations and external testing have also been conducted to identify potential issues such as bias or manipulation.

Restricted availability: GPT-4o's text and image input/output functions are currently accessible only through OpenAI's API; a follow-up release with audio support may come later.

Concerns

Specific skills: It is unclear how far GPT-4o's multimodal reasoning extends, or which challenging audio problems remain beyond its capabilities.

Long-term effects: It's too soon to predict what applications or drawbacks GPT-4o might have.

Microsoft is pleased to announce the launch of GPT-4o, OpenAI's newest flagship model, on Azure AI. By combining text, vision, and audio capabilities, this cutting-edge multimodal model raises the bar for conversational and creative AI experiences. GPT-4o is currently available in preview in the Azure OpenAI Service, handling text and image inputs.

A breakthrough for generative AI in Azure OpenAI Service

GPT-4o marks a shift in how AI models interact with multimodal inputs: text, visuals, and audio are seamlessly integrated to create a more dynamic and engaging user experience.

Key elements of the launch: Easy access and what to expect

Azure OpenAI Service customers can now explore GPT-4o's capabilities through a preview playground in Azure OpenAI Studio, available in two US regions. This initial release emphasises text and image inputs, paving the way for audio and video capabilities to follow.

Efficiency and cost-effectiveness

GPT-4o was engineered with speed and efficiency in mind: its ability to handle complex queries with fewer resources can translate into lower costs and better performance.

Potential applications to explore with GPT-4o

There are numerous advantages for businesses in a variety of industries when implementing GPT-4o:

  • Enhanced customer service: By integrating diverse data inputs, GPT-4o enables richer, more dynamic customer support conversations.
  • Advanced analytics: Use GPT-4o's ability to process and analyse varied data types to enhance decision-making and uncover deeper insights.
  • Content innovation: Apply GPT-4o's generative capabilities to create engaging, diverse content formats that appeal to a broad range of consumer tastes.

Looking ahead: GPT-4o at Microsoft Build 2024

At Microsoft Build 2024, Microsoft will share further details about GPT-4o and other Azure AI advances to help developers fully realise the potential of generative AI.

Getting started with Azure OpenAI Service


To begin using GPT-4o with Azure OpenAI Service, take the following steps (a minimal code sketch follows the list):

  1. Check out GPT-4o in the Azure OpenAI Service Chat Playground preview.
  2. Fill out this form to request access to Azure OpenAI Services if you don't already have it.
  3. Learn more about the latest enhancements made to the Azure OpenAI Service.
  4. Learn more about Azure's responsible AI tooling, such as Azure AI Content Safety.
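Once access is granted and a GPT-4o deployment has been created, a first call from Python might look like the sketch below. The endpoint, key, API version, and deployment name are placeholders for your own resource's values; this assumes the openai Python package (version 1.x) is installed:

    # Minimal first call to a GPT-4o deployment on Azure OpenAI Service.
    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",  # use an API version your resource supports
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # the name of your Azure deployment, not the raw model id
        messages=[{"role": "user", "content": "Say hello from Azure OpenAI."}],
    )
    print(response.choices[0].message.content)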

News source: GPT-4o
