Introducing Gemini 1.5
This is an exciting time for AI. New advances in the field could make AI more helpful for billions of people in the coming years. Since launching Gemini 1.0, Google has been testing, refining, and enhancing its capabilities.
Gemini 1.5: Google’s next-generation model
Gemini 1.5 delivers dramatically improved performance. It reflects a step change in Google’s approach, building on research and engineering innovations across nearly every part of foundation model development and infrastructure. That includes a new Mixture-of-Experts (MoE) architecture that makes Gemini 1.5 more efficient to train and serve.
The first Gemini 1.5 model released for early testing is Gemini 1.5 Pro. It is a mid-size multimodal model optimized for scaling across a wide range of tasks, and it performs at a similar level to Gemini 1.0 Ultra, Google’s largest model to date. It also introduces a breakthrough experimental feature in long-context understanding.
Gemini 1.5 Pro comes with a standard context window of 128,000 tokens. A limited group of developers and enterprise customers can now try a context window of up to 1 million tokens through a private preview in AI Studio and Vertex AI.
As Google rolls out the full 1 million token context window, it is working on optimizations to improve latency, reduce computational requirements, and enhance the user experience. Google is excited for people to try this breakthrough capability.
Continued advances in Google’s next-generation models will open new possibilities for people, developers, and enterprises to create, discover, and build with AI.
Highly efficient architecture
Gemini 1.5 builds on Google’s leading research on Transformer and MoE architectures. Whereas a traditional Transformer functions as one large neural network, an MoE model is divided into smaller “expert” neural networks.
Depending on the type of input, MoE models learn to selectively activate only the most relevant expert pathways in their neural network. This specialization greatly improves the model’s efficiency. Google was an early pioneer of the MoE technique for deep learning through research such as Sparsely-Gated MoE, GShard-Transformer, Switch-Transformer, M4, and more.
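The routing idea described above can be sketched in a few lines of Python. This is only a toy illustration of top-k gating, not Gemini's actual architecture, which the article does not detail. The names `top_k_route`, `moe_forward`, `experts`, and `gate` are hypothetical, and the experts and gate weights are hand-set rather than learned.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_scores, k=2):
    """Pick the k highest-scoring experts and normalize their weights."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, gate, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    routes = top_k_route(gate(x), k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)  # experts that were not selected are never called
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in routes]

# Hypothetical toy experts: each one just scales its input differently.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]

def gate(x):
    # Assumption: a fixed, hand-set gating matrix. A real gate is learned.
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    return [sum(w * v for w, v in zip(row, x)) for row in W]

out, used = moe_forward([2.0, 1.0], experts, gate, k=2)
print(used)  # experts 0 and 1 score highest for this input
```

Only two of the four experts run for this input, which is where the efficiency gain comes from: compute per input grows with k, not with the total number of experts.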
Google’s latest model architecture innovations help Gemini 1.5 learn complex tasks faster while maintaining quality, and make it more efficient to train and serve. These efficiencies are helping Google’s teams iterate, train, and deliver more advanced versions of Gemini faster than ever, and further optimizations are in progress.
More context, better help
An AI model’s “context window” is made up of tokens, the building blocks the model uses to process information. Tokens can be whole parts or subsections of words, images, video, audio, or code. The larger the context window, the more information a model can take in and process in a single prompt, which makes its output more consistent, relevant, and useful.
Through a series of machine learning innovations, Google has increased 1.5 Pro’s context window capacity far beyond the 32,000 tokens of Gemini 1.0. The model can now run up to 1 million tokens in production.
This means Gemini 1.5 Pro can process vast amounts of information in one go, including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words. In its research, Google also successfully tested up to 10 million tokens.
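To get a feel for these numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the ratio implied by the figures above (about 700,000 words per 1 million tokens), which is only a rule of thumb. Real token counts depend on the tokenizer and the content. The dictionary keys and function names are hypothetical labels, not official model identifiers.

```python
# Context window sizes quoted in this article (in tokens).
CONTEXT_WINDOWS = {
    "gemini-1.0": 32_000,
    "gemini-1.5-pro-standard": 128_000,
    "gemini-1.5-pro-preview": 1_000_000,
}

# Assumption: ratio taken from the article's "700,000 words in 1 million tokens".
WORDS_REFERENCE = 700_000
TOKENS_REFERENCE = 1_000_000

def estimate_tokens(word_count):
    """Estimate tokens from a word count, rounding up (integer math only)."""
    return -(-word_count * TOKENS_REFERENCE // WORDS_REFERENCE)

def fits(word_count, window_tokens):
    """Return True if the estimated token count fits in the given window."""
    return estimate_tokens(word_count) <= window_tokens

for name, window in CONTEXT_WINDOWS.items():
    print(name, fits(100_000, window))
```

Under this estimate, a 100,000-word document needs roughly 143,000 tokens, so it would overflow the standard 128,000-token window but fit comfortably in the 1 million token preview.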
Complex reasoning about massive data
Gemini 1.5 Pro efficiently analyzes, classifies, and summarizes large amounts of content per prompt. It can reason about conversations, events, and details in the 402-page Apollo 11 transcripts.
Gemini 1.5 Pro can comprehend, reason, and identify intriguing details in Apollo 11’s 402-page transcripts.
Better cross-modal reasoning
1.5 Pro can perform highly sophisticated understanding and reasoning tasks across modalities, including video. When given a 44-minute silent Buster Keaton film, the model can accurately analyze plot points and events and even reason about small details that could easily be missed.
Gemini 1.5 Pro can identify a scene in a 44-minute silent Buster Keaton film when given a simple line drawing of a real-life object as reference.
Relevant problem-solving with longer code
1.5 Pro can perform more relevant problem-solving across longer blocks of code. When given a prompt with more than 100,000 lines of code, it can reason across examples, suggest helpful modifications, and explain how different parts of the code work.
Gemini 1.5 Pro can reason across 100,000 lines of code, offering helpful solutions, modifications, and explanations.
Improved performance
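When tested on a comprehensive panel of text, code, image, audio, and video evaluations, 1.5 Pro outperforms 1.0 Pro on 87% of the benchmarks Google uses for developing its large language models. Compared with 1.0 Ultra on the same benchmarks, it performs at a broadly similar level.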
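Gemini 1.5 Pro maintains high performance even as its context window grows. In the Needle In A Haystack (NIAH) evaluation, where a small piece of text containing a particular fact or statement is deliberately placed within a long block of text, 1.5 Pro found the embedded text 99% of the time in blocks of data as long as 1 million tokens.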
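The shape of a needle-in-a-haystack test can be sketched as follows. This is a minimal illustration, not Google's evaluation harness. `build_haystack`, `toy_model`, and `run_niah` are hypothetical names, and `toy_model` is a plain string search standing in for a real model call, so its score says nothing about any actual model.

```python
def build_haystack(filler, needle, depth):
    """Insert `needle` into a list of filler sentences at a relative depth in [0, 1]."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    position = round(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])

def toy_model(prompt, keyword):
    """Stand-in for a real model: return the first sentence containing `keyword`."""
    for sentence in prompt.split(". "):
        if keyword in sentence:
            return sentence
    return ""

def run_niah(filler, needle, answer, keyword, depths, model=toy_model):
    """Return the fraction of depths at which the model's reply contains `answer`."""
    hits = 0
    for depth in depths:
        prompt = build_haystack(filler, needle, depth)
        if answer in model(prompt, keyword):
            hits += 1
    return hits / len(depths)

filler = ["The sky was clear that day."] * 1000
needle = "The secret code is 7421."
depths = [0.0, 0.25, 0.5, 0.75, 1.0]
recall = run_niah(filler, needle, "7421", "secret code", depths)
print(recall)
```

A real evaluation would swap `toy_model` for a call to the model under test and vary both the haystack length and the needle depth, since retrieval quality can depend on where in the context the fact sits.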
Gemini 1.5 Pro also shows impressive “in-context learning” skills, meaning it can learn a new skill from information given in a long prompt without additional fine-tuning. The Machine Translation from One Book (MTOB) benchmark measures how well a model learns from information it has not seen before. When given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a level similar to a person learning from the same content.
Since 1.5 Pro’s long context window is unique among large-scale models, Google is constantly creating new evaluations and benchmarks to test its novel capabilities.
Thorough ethics and safety testing
In line with Google’s AI Principles and safety policies, its models undergo rigorous ethics and safety tests. Google then applies the findings from this testing to its governance processes, model development, and evaluations to keep improving its AI systems.
Google teams have refined 1.0 Ultra since December to make it safer for a wider release. They also conducted novel safety risk research and developed red-teaming methods to test for various harms.
Google has prepared 1.5 Pro for responsible deployment by conducting extensive content safety and representational harm testing, as they did for Gemini 1.0. They will continue this testing. Beyond this, they are creating tests for 1.5 Pro’s novel long-context capabilities.
Build and test Gemini models
Google aims to responsibly bring new Gemini models to billions of people, developers, and businesses worldwide.
Developers and enterprise customers can preview 1.5 Pro in AI Studio and Vertex AI starting today.
When the model is ready for a wider release, Google will introduce 1.5 Pro with a standard 128,000 token context window. Google also plans to introduce pricing tiers that start at the standard 128,000 token context window and scale up to 1 million tokens as the model improves.
Early testers can try the 1 million token context window for free, but expect longer latency. Significant speed improvements are coming.
Developers can sign up in AI Studio to test 1.5 Pro, while enterprise customers can contact their Vertex AI account team.
FAQ
Gemini 1.5 Release Date
Google is releasing Gemini 1.5 in stages, starting with Gemini 1.5 Pro.
Here is the breakdown:
In February 2024, Google announced Gemini 1.5 Pro and made it available in private preview to select developers and enterprise customers. It performs at a level comparable to Gemini 1.0 Ultra, Google’s largest model at the time, and comes with a standard context window of 128,000 tokens.
Google intends to release Gemini 1.5 for wider use later in 2024 (date to be announced). The capabilities of that wider release and its exact date are still unknown.