Several new AI capabilities showcased at Google I/O 2024

May 14, 2024

Get ready for a future where your glasses whisper directions in your ear and anyone can become a video director with just their voice.

These are just a taste of what was teased at Google I/O '24, which centered on the company's next generation of AI-powered products. After OpenAI's GPT-4o launch yesterday, which captured the tech sector's attention and imagination, this was Google's chance to steal back some of the limelight.

Google CEO Sundar Pichai kicked off the event by highlighting the company's decade-long commitment to AI research and development. It was a subtle reminder that, amidst this AI gold rush where OpenAI and other players are innovating rapidly, we shouldn't count Google out. The company still has a lot to offer.

The Power of Gemini: Understanding the World Through Multiple Lenses

Imagine a world where AI takes not just your words as input, but also your code, photos, and even videos. This is the vision behind Gemini, a multimodal foundation model that can process information in a variety of formats.

Coupled with these multimodal features are very long context windows: currently 1 million tokens, with 2 million tokens becoming available soon. This long-context capability allows Gemini to follow complex conversations and give answers grounded in content that may span hundreds of pages. Perhaps even more exciting, Google also showed models that generate higher-quality images (Imagen 3), work alongside musicians (Music AI Sandbox), and produce video content from text prompts (Veo).

Search is Still Evolving:

Gemini is powering a new feature in Search called AI Overviews. This allows you to ask complex, multi-step questions, even using photos and video to initiate your search. Imagine asking "What record player should I buy?" while showing a picture of your existing audio setup! AI Overviews will analyze the web and provide recommendations based on your specific needs.

Google Photos is also getting an upgrade with Ask Photos. This lets you search your photo library using natural language. Need to find a picture of your car's license plate? No problem! Ask Photos can infer which photos show your car and pull up the relevant image. Google's demo showed how this feature will not just find individual photos, but also be able to compile a series of your photos on the fly to answer a complex question.

And for those who rely on Google Workspace to get things done, Gemini is bringing new levels of efficiency. You'll be able to, for example, summarize a week's worth of emails in seconds, or have AI automatically analyze attachments and draft a reply. It will even be able to take multiple steps on your behalf, given a specific objective.

A Glimpse into the Future of AI:

Google's Project Astra, unveiled at I/O, hints at a future supported by AI assistants. The company demonstrated an intelligent system that can reason, plan, and even understand your surroundings.

Did we just get our first look at a new pair of Google glasses?

These are some of the highlights from the 2024 Google I/O developer conference. Stay tuned for further details on these innovations and how they'll shape the future of tech.