Google's Ironwood TPU for AI Inference Debuts at Cloud Next 2025

At Google Cloud Next 2025, Google unveiled Ironwood, its seventh-generation Tensor Processing Unit (TPU). Google describes it as its most powerful and scalable custom AI accelerator to date. It is designed explicitly for inference workloads, a strategic shift toward supporting the growing demands of generative AI and "thinking models."

For over a decade, TPUs have been the backbone of Google's AI infrastructure, powering both training and serving workloads. Ironwood builds on this legacy with greater capability and energy efficiency, and it is purpose-built to run inference for modern AI models at scale.

Ironwood is designed to support the next phase of generative AI, addressing the substantial computational and communication demands of these advanced models. It does so by scaling up to 9,216 liquid-cooled chips per pod, linked by a high-bandwidth Inter-Chip Interconnect (ICI) network and drawing nearly 10 MW of power. At that scale, a pod delivers 42.5 exaFLOPS of peak compute, which Google says surpasses the world's largest supercomputers. The architecture is a key component of Google Cloud's AI Hypercomputer, which optimizes both hardware and software to tackle demanding AI workloads.
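To put the pod-level figures in perspective, the short Python sketch below divides them by the chip count. It is back-of-the-envelope arithmetic only: the function names are illustrative, "nearly 10 MW" is rounded up to 10 MW, and the power result is a pod-wide average per chip (an assumption that treats all power as evenly shared), not a published per-chip rating.

```python
# Rough per-chip figures derived from the pod-level numbers quoted in the article.
POD_CHIPS = 9_216        # chips in a full-scale configuration (from the article)
POD_EXAFLOPS = 42.5      # peak compute at that scale (from the article)
POD_POWER_MW = 10.0      # "nearly 10 MW", rounded up here (assumption)


def per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Peak compute per chip in petaFLOPS (1 exaFLOPS = 1,000 petaFLOPS)."""
    return pod_exaflops * 1_000 / chips


def per_chip_watts(pod_megawatts: float, chips: int) -> float:
    """Average power per chip in watts, assuming power is shared evenly."""
    return pod_megawatts * 1_000_000 / chips


if __name__ == "__main__":
    print(f"{per_chip_petaflops(POD_EXAFLOPS, POD_CHIPS):.2f} PFLOPS per chip")
    print(f"{per_chip_watts(POD_POWER_MW, POD_CHIPS):.0f} W per chip (pod average)")
```

Under these assumptions, each chip contributes roughly 4.6 petaFLOPS of peak compute, and the pod averages a little over 1 kW per chip.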

One of the key features of Ironwood is its support for "thinking models," including Large Language Models (LLMs) and Mixture of Experts (MoE) models. These models require massive parallel processing and efficient memory access, and Ironwood is designed to minimize data movement and latency during complex tensor manipulations. Ironwood also features an enhanced SparseCore, increased High-Bandwidth Memory (HBM) capacity and bandwidth, and improved ICI networking.

Developers can leverage Google's Pathways software stack to harness the combined computing power of thousands of Ironwood TPUs. Pathways, developed by Google DeepMind, is a distributed runtime environment that enables dynamic scaling of inference workloads. It includes features like disaggregated serving, which allows independent scaling of the prefill and decode stages of inference, resulting in ultra-low latency and high throughput. Google Kubernetes Engine (GKE) also provides new inference capabilities, including GenAI-aware scaling and load balancing, which can reduce serving costs, decrease tail latency, and increase throughput.
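The idea behind disaggregated serving can be illustrated with a small capacity-planning sketch in Python. This is not the Pathways or GKE API: the `Workload` class, the `size_pools` function, and every throughput number below are hypothetical, chosen only to show why sizing the prefill and decode pools separately is useful.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_s: float   # incoming request rate
    prompt_tokens: int      # tokens processed during prefill, per request
    output_tokens: int      # tokens generated during decode, per request


def replicas_needed(tokens_per_s: float, replica_tokens_per_s: float) -> int:
    """Smallest replica count that covers the token demand (at least one)."""
    return max(1, math.ceil(tokens_per_s / replica_tokens_per_s))


def size_pools(w: Workload, prefill_tps: float, decode_tps: float) -> tuple[int, int]:
    """Size the prefill and decode pools independently of each other."""
    prefill = replicas_needed(w.requests_per_s * w.prompt_tokens, prefill_tps)
    decode = replicas_needed(w.requests_per_s * w.output_tokens, decode_tps)
    return prefill, decode


if __name__ == "__main__":
    # Hypothetical per-replica throughputs, in tokens per second.
    PREFILL_TPS, DECODE_TPS = 40_000, 2_500

    long_prompt = Workload(requests_per_s=50, prompt_tokens=2_000, output_tokens=300)
    long_output = Workload(requests_per_s=50, prompt_tokens=500, output_tokens=1_500)

    print(size_pools(long_prompt, PREFILL_TPS, DECODE_TPS))  # (3, 6)
    print(size_pools(long_output, PREFILL_TPS, DECODE_TPS))  # (1, 30)
```

With the same request rate, the long-output workload needs far more decode replicas but fewer prefill replicas. A serving system that scales the two stages together would have to over-provision one of them.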

The introduction of Ironwood also marks a broader shift in AI infrastructure development: from AI models that provide real-time information for people to interpret, to AI systems that proactively generate insights and interpret data themselves. In this "age of inference," as Google calls it, AI agents retrieve and generate data collaboratively to deliver actionable insights and answers.

Compared to the previous generation, Trillium, Ironwood offers five times the peak compute capacity and six times the HBM capacity, while also being twice as power-efficient. These gains allow Google Cloud customers to run demanding AI workloads faster and with lower energy consumption.

With Ironwood, Google Cloud aims to provide its customers with an AI-optimized platform that offers leading price, performance, and precision. The platform combines advanced infrastructure, world-class models, and a robust developer platform in Vertex AI, along with a comprehensive suite of tools for building multi-agent systems.


Written By
Rohan Sharma is a seasoned tech news writer with a keen knack for identifying and analyzing emerging technologies. He's highly sought-after in tech journalism due to his unique ability to distill complex technical information into concise and engaging narratives. Rohan consistently makes intricate topics accessible, providing readers with clear, insightful perspectives on the cutting edge of innovation.
© 2025 TechScoop360