Google DeepMind has achieved a significant leap in robotics by developing a new AI model, Gemini Robotics On-Device, that operates directly on robotic devices, eliminating the need for constant internet connectivity. This breakthrough promises to revolutionize how robots function in various real-world scenarios, especially where low latency and network independence are crucial.
Gemini Robotics On-Device is an optimized version of the Gemini Robotics VLA (vision-language-action) model, initially launched in March, which brings Gemini 2.0's multimodal reasoning and real-world understanding into the physical world. The on-device model is engineered to run locally, ensuring robust performance in environments with limited or no network connectivity and providing the low-latency inference vital for time-sensitive applications. This localized operation addresses a key limitation of cloud-based robotic systems, which can suffer from latency and reliability problems under unstable network conditions.
This new AI model is designed for general-purpose dexterity and rapid task adaptation. It serves as a foundational robotics model, especially for bi-arm robots, and is engineered to require minimal computational resources. The model demonstrates strong visual, semantic, and behavioral generalization across various testing scenarios. It enables robots to follow natural language instructions and complete highly dexterous tasks, such as unzipping bags or folding clothes.
One of the key advantages of Gemini Robotics On-Device is its adaptability. DeepMind claims that the model can quickly adjust to new tasks with as few as 50 to 100 demonstrations. This capability allows the robot to generalize its foundational knowledge to new tasks effectively. The model has been successfully adapted to various robotic platforms, including the bi-arm Franka FR3 robot and the Apollo humanoid robot by Apptronik, even though it was initially trained on ALOHA robots. This demonstrates its versatility across different hardware configurations. On the FR3 robot, the AI model followed general-purpose instructions, handled previously unseen objects and scenes, and executed industrial belt-assembly tasks requiring precision and dexterity.
Safety is a top priority for DeepMind. Gemini Robotics On-Device includes comprehensive safety measures, developed in collaboration with experts and policymakers, to minimize potential risks. The model is part of a larger safety-first initiative overseen by DeepMind's Responsible Development & Innovation (ReDI) team and the Responsibility & Safety Council. These groups ensure that every stage, from instruction processing to physical action, undergoes thorough testing to prevent unsafe behaviors. DeepMind recommends running safety benchmarks and red-teaming exercises before deploying the model in real-world use. Semantic safety filters connected to the Live API screen instructions for ambiguous or unsafe requests, while a low-level safety controller cross-checks torque limits, collision cones, and velocity caps.
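The division of labor described above, where a low-level controller enforces hard physical limits no matter what the model commands, can be illustrated with a minimal sketch. All names and limit values here are hypothetical and are not part of DeepMind's actual stack:

```python
# Hypothetical sketch of a low-level safety check that clamps commanded
# joint velocities and torques to a safety envelope before they reach
# the actuators. The limits and names are illustrative only.

from dataclasses import dataclass


@dataclass
class JointLimits:
    max_velocity: float  # rad/s
    max_torque: float    # N*m


def clamp_command(velocity: float, torque: float,
                  limits: JointLimits) -> tuple[float, float]:
    """Clamp a single joint command to its safety envelope."""
    safe_velocity = max(-limits.max_velocity,
                        min(velocity, limits.max_velocity))
    safe_torque = max(-limits.max_torque,
                      min(torque, limits.max_torque))
    return safe_velocity, safe_torque


# Example: a command exceeding both caps is reduced to the envelope.
limits = JointLimits(max_velocity=1.5, max_torque=40.0)
print(clamp_command(2.0, -55.0, limits))  # (1.5, -40.0)
```

The point of the design is that this layer runs independently of the policy model: even if the high-level model issues an out-of-range command, the controller bounds it before it reaches the hardware.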
To help developers evaluate Gemini Robotics On-Device, DeepMind is releasing a Gemini Robotics SDK. The SDK supports testing within the MuJoCo physics simulator and allows quick adaptation to new domains. Developers can access it by enrolling in the trusted tester program, which lets them train and evaluate the model both in simulation and on real robots before real-world deployment.
Gemini Robotics On-Device represents a significant step forward in the development of AI systems designed to control real-world robots. Its ability to operate independently of a network, adapt to new tasks quickly, and generalize across different robotic embodiments makes it a valuable tool for applications ranging from industrial automation to home service robots. As DeepMind continues to refine and expand the capabilities of Gemini Robotics, the future of robotics looks increasingly intelligent and adaptable.