Earlier this month Google hosted Google I/O 2022, its annual developer conference, where it announces major software, hardware, and platform updates across its entire product portfolio. At the opening keynote on Day One, held in Mountain View, California (and livestreamed globally), Sundar Pichai, Google’s CEO, took to the stage to announce a series of innovations across Google’s Pixel phones, earbuds, and tablets. Google even previewed AR glasses that will be able to perform real-time translation (although there were no details on whether or when these would be available to consumers).

Whilst these consumer tech innovations are all exciting, Google’s latest AI announcements are arguably the most transformative. Google’s overarching mission is “to organise the world’s information and make it universally accessible and useful”. Its recent announcements will help it better understand, organise, and present information, and will allow it to integrate increasingly sophisticated capabilities across its products, from Search and YouTube to Workspace and Maps.


Over the past few years, Google has introduced a number of features that allow people to search more seamlessly and intuitively. In 2017 it introduced Google Lens, an image recognition technology that allows users to search for what they see. Later, in 2021, Google introduced Multitask Unified Model (MUM), an AI model that combines Natural Language Understanding (NLU) and computer vision to understand information across different formats, such as pictures and webpages, simultaneously.

These capabilities were showcased at this year’s I/O when Google introduced Multisearch. Multisearch allows users to combine visual and traditional text-based search to ask questions about things they see (think finding buildings that look similar to one in a photo you took, but asking Google to only return results for buildings built in Paris before 2005). Google states that this is just the beginning of Multisearch, and it expects to roll out updates later this year that combine visual and query-based search with geolocation to return only locally relevant results. Think scanning a photo of a shirt you like and asking “Where can I buy this near me?”


Last year Google launched auto-generated chapters, which combine some of the latest advancements in NLU, computer vision, and multimodal deep learning from DeepMind to simultaneously analyse text, audio, and visual information. This sophisticated algorithm lets Google accurately generate chapters for online videos automatically, so that you can more easily jump to the part of the video you are most interested in. Google plans to increase the number of auto-chaptered videos on YouTube from 8 million to over 80 million over the next 12 months, as it works to make information (increasingly communicated via online video) easier to navigate and search.

Google is also using speech recognition algorithms to automatically generate subtitles and video transcriptions on YouTube. It is taking this one step further by combining its speech recognition algorithms with its state-of-the-art machine translation models to auto-translate and transcribe videos on YouTube. It is starting with 16 languages but aims to roll this feature out across many more languages in the coming months and years. Eventually it hopes that viewers will be able to watch videos in any language and have them auto-translated and subtitled in real time.


Just as Google is supercharging Search and YouTube with advanced AI, it is also bringing new AI-powered features to its suite of Workspace products. Again building on its latest research in NLU, it is releasing automated summarisation, which (as the name suggests) is able to parse, understand, and summarise documents automatically. That’s not all: Google plans to roll out automated summarisation to more Workspace products in the coming months, and is working to integrate auto-transcription and auto-summarisation into Google Meet. These advanced features will allow you to easily catch up on documents, emails, chats, and eventually meetings without having to read or watch everything that you missed.

Google Maps and Google Earth 

Google Maps and Google Earth help us make sense of the physical and geospatial information that is all around us, whether that is helping us get from A to B or helping climate researchers monitor and analyse changes in our environment using satellite imagery. 

Many of us use Google Maps regularly and take its rich search and navigation experiences for granted, but these experiences are not equally rich and immersive everywhere in the world. Google has mapped an impressive 1.6 billion buildings and 60 million kilometres of road to date, but has previously struggled to map remote and rural areas. Advances in computer vision now allow it to detect buildings at scale from satellite imagery, giving it a more detailed picture of even these areas. Using these techniques, it has increased the number of buildings on Google Maps in Africa fivefold since July 2020 (to 300 million), and has doubled the number of mapped buildings in India and Indonesia this year.

Furthermore, using the latest advancements in AI/ML and 3D mapping technologies such as LiDAR, Google is fusing billions of aerial and street-level images to build immersive, high-fidelity representations of places around the world. This so-called immersive view is coming to Google Maps and will allow users to explore the world like never before.

The AI revolution 

Andrew Ng once said, “Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI will transform in the next several years.” This transformation is already underway. AI/ML already underpins many of the tools and technologies that we interact with on a daily basis, from recommendation engines on Netflix to Siri and Google Assistant. As Google and other technology companies continue to push the boundaries of AI research, digital products and services will become increasingly sophisticated, and will deliver more seamless, intuitive, and personalised experiences across both our personal and professional lives.

In the next few weeks, we’ll publish a follow-up blog post looking at how advancements in NLU and computer vision look set to transform media and content generation. In the meantime, do get in touch if you’d like to discuss.