Generative AI has already shown great potential in robotics, where applications include natural language interaction, robot learning, no-code programming, and even design. This week, Google’s DeepMind Robotics team is showcasing another potential sweet spot between the two: navigation.
In the paper “Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs,” the team demonstrates how it used Google Gemini 1.5 Pro to teach a robot to respond to commands and navigate around an office. Naturally, DeepMind used some of the Everyday Robots units that have been sitting around since Google shut down that project amid widespread layoffs last year.
In a series of videos accompanying the project, a DeepMind employee begins with a smart-assistant-esque “OK, robot,” and then commands the system to perform various tasks around the 9,000-square-foot office space.
In one example, a Google employee asks the robot to take him somewhere he can draw a picture. “OK,” the robot replies, wearing a jaunty yellow bow tie. “Hold on. I’m thinking with Gemini…” The robot then guides the human to a wall of whiteboards. In a second video, another person instructs the robot to follow the directions on a whiteboard.
A simple map shows the robot how to get to the “blue area.” Again, the robot thinks for a moment, then takes a long walk to what turns out to be a robotics testing area. “I followed the instructions on the whiteboard very well,” the robot announces with a level of confidence most humans can only dream of.
Before the videos were shot, the robot was familiarized with the space using what the team calls “Multimodal Instruction Navigation with Demonstration Tours (MINT).” In practice, this means walking the robot around the office while pointing out different landmarks by voice. Next, the team used a hierarchical Vision-Language-Action (VLA) navigation policy, which the paper describes as combining “the environment understanding and common sense reasoning power” of long-context vision-language models with a low-level navigation policy based on topological graphs. Together, these processes let the robot respond to written and drawn commands, as well as gestures.
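For readers who want a concrete picture of that two-level design, here is a toy Python sketch. It is not DeepMind's code: the tour data, the function names (`build_topological_graph`, `pick_goal_frame`, `shortest_path`), and the keyword match standing in for Gemini are all hypothetical. It only illustrates the general idea: frames from a demonstration tour become nodes in a topological graph, a high-level step picks a goal frame from the user's request, and a low-level step plans a route through the graph. Linking only consecutive tour frames is a simplifying assumption made here for brevity.

```python
from collections import deque

# Hypothetical demonstration tour: (frame_id, verbal note given at that spot, "" if none).
TOUR = [
    (0, "entrance"),
    (1, ""),
    (2, "whiteboards"),
    (3, ""),
    (4, "robot testing area"),
]


def build_topological_graph(frame_ids, extra_edges=()):
    """Link consecutive tour frames (plus optional shortcuts) as undirected edges.

    Assumption: real systems would connect frames by spatial proximity; here
    we only use tour order and any hand-supplied extra edges.
    """
    graph = {f: set() for f in frame_ids}
    for a, b in zip(frame_ids, frame_ids[1:]):
        graph[a].add(b)
        graph[b].add(a)
    for a, b in extra_edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def pick_goal_frame(instruction, tour):
    """Hypothetical stand-in for the high-level VLM step.

    A long-context VLM would reason over the whole tour; this placeholder just
    returns the first frame whose verbal note appears in the instruction.
    """
    text = instruction.lower()
    for frame_id, note in tour:
        if note and note in text:
            return frame_id
    return None


def shortest_path(graph, start, goal):
    """Low-level step: breadth-first search for the fewest-hop route."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through predecessors
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable


if __name__ == "__main__":
    frames = [f for f, _ in TOUR]
    graph = build_topological_graph(frames)
    goal = pick_goal_frame("Take me to the whiteboards", TOUR)
    print(goal, shortest_path(graph, 0, goal))  # 2 [0, 1, 2]
```

Splitting the problem this way keeps the expensive reasoning step (choosing where to go) separate from the cheap, repeatable one (getting there), which is the broad intuition behind pairing a large model with a graph-based planner.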
Google says the robots had a success rate of around 90% over more than 50 interactions with employees.