Google’s new Gemini AI beats GPT-4 in 30 of 32 tests

But will the difference be enough to matter in real life?

Tech giant Google has finally unveiled its much-hyped Gemini AI, a series of generative AI models it claims are its “largest and most capable” to date. 

“This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company,” said Google CEO Sundar Pichai. 

Multimodal AI: Generative AIs are algorithms trained to create original content in response to user prompts. OpenAI’s first iteration of ChatGPT, for example, can understand and produce human-like text, while its DALL-E 2 system can generate images based on text prompts. 

While those systems understand and generate just one type of content, a multimodal generative AI can work with several — in September, OpenAI announced a multimodal version of ChatGPT that could understand image, voice, and text inputs.

“Its capabilities are state-of-the-art in nearly every domain.”

Demis Hassabis

The Gemini era: According to Google, multimodal AIs are traditionally created by combining separate, specialized models into one program, but it took a different approach with its Gemini AI, training it to be multimodal from the start.

“This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models — and its capabilities are state-of-the-art in nearly every domain,” wrote Demis Hassabis, CEO and cofounder of Google DeepMind.

In addition to being highly capable, Google says the Gemini AI is also its “most flexible” model. This has allowed the company to create three different sizes of the AI: Ultra, Nano, and Pro. 

  • Gemini Ultra is the most powerful model, designed for complex tasks. According to Google, it’s the first generative AI model to outperform human experts on the MMLU, a benchmark assessing knowledge across 57 subjects. Google is currently soliciting feedback on Ultra from select users, but expects to make it widely available in 2024.
  • Gemini Nano is the least capable model, but it’s small and efficient enough to run locally on smartphones. Google has already made it available on its Pixel 8 Pro — owners of that smartphone can use the AI to summarize audio recordings or generate responses to WhatsApp messages.
  • Gemini Pro, meanwhile, falls between Nano and Ultra in terms of capabilities and size. Google has integrated an English-language version of that model into its ChatGPT-like Bard, which will reportedly get an Ultra upgrade in 2024.

The big picture: Like the rest of the tech industry, Google has been racing to catch up with OpenAI in the generative AI space ever since the release of ChatGPT in 2022, and it’s been hyping the Gemini AI for months as the tech that will put it ahead. 

While Gemini did outperform OpenAI’s GPT-4 on 30 of 32 benchmarks tested (including the MMLU), the difference was often just a percentage point or two — meaning Google may be ahead, but only by a little and only compared to an AI model that’s been out for 9 months already.

“It’s clear that Gemini is a very sophisticated AI system … [but] it’s not obvious to me that Gemini is actually substantially more capable than GPT-4,” Melanie Mitchell, an AI researcher at the Santa Fe Institute in New Mexico, told MIT Technology Review.

We’d love to hear from you! If you have a comment about this article or if you have a tip for a future Freethink story, please email us at [email protected].

Related
New AI detection tool measures how “surprising” word choices are
A new AI detection tool that measure how “surprising” text is reportedly delivers far fewer false positives than existing options.
DeepMind’s AI could accelerate drug discovery
A new study suggests that AlphaFold, DeepMind’s AI tool for predicting protein structures, could be useful for drug discovery after all.
10 must-see technologies from CES 2024
From super-hyped AI assistants to apps that translate babies’ cries, CES 2024 has given us a glimpse at the tech of tomorrow, today.
Data poisoning: how artists are sabotaging AI to take revenge on image generators
Artists unhappy with their work being used by generative AI have are using “data poisoning” to mess with the algorithm.
Microsoft launches Copilot Pro for “power users”
Microsoft has launched Copilot Pro, a premium subscription service that makes its AI companion accessible to more people in more contexts.
Up Next
A black and white photo of the advice columnist known as 'Dear Abby' with generative text collage elements.
Subscribe to Freethink for more great stories