Google's Ironwood TPU for AI Inference Debuts at Cloud Next 2025

At Google Cloud Next 2025, Google unveiled Ironwood, its seventh-generation Tensor Processing Unit (TPU) and a significant leap forward in AI inference technology. The new chip is Google's most powerful and scalable custom AI accelerator to date, designed specifically for inference workloads, and it signals a strategic shift toward supporting the growing demands of generative AI and "thinking models."

For over a decade, TPUs have been the backbone of Google's AI infrastructure, powering both training and serving workloads. Ironwood builds on that legacy with greater capability and energy efficiency to handle the complexities of modern AI models, and it is purpose-built to run inference at scale, enabling faster and more efficient processing of AI tasks.

Ironwood is designed to support the next phase of generative AI, addressing the substantial computational and communication demands of these advanced models. It achieves this by scaling up to 9,216 liquid-cooled chips interconnected through a high-bandwidth Inter-Chip Interconnect (ICI) network, consuming nearly 10 MW of power. This massive scale allows Ironwood to deliver 42.5 exaFLOPS of compute, surpassing the capabilities of the world's largest supercomputers. The architecture is a key component of Google Cloud's AI Hypercomputer, which optimizes both hardware and software to tackle demanding AI workloads.
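To put those pod-level figures in per-chip terms, a quick back-of-the-envelope calculation helps. The short Python sketch below assumes the 42.5 exaFLOPS and roughly 10 MW numbers both describe a full 9,216-chip pod at peak; the per-chip values it prints are derived estimates, not figures quoted in the announcement.

    # Derive rough per-chip figures from the quoted pod-level numbers.
    # Assumption: 42.5 exaFLOPS and ~10 MW both refer to a full 9,216-chip pod.
    pod_flops = 42.5e18          # 42.5 exaFLOPS of peak compute
    pod_power_watts = 10e6       # "nearly 10 MW" of power draw
    pod_chips = 9_216

    per_chip_flops = pod_flops / pod_chips
    per_chip_power = pod_power_watts / pod_chips

    print(f"~{per_chip_flops / 1e15:.2f} PFLOPS per chip")   # ~4.61 PFLOPS
    print(f"~{per_chip_power / 1e3:.2f} kW per chip")        # ~1.09 kW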

One of Ironwood's key features is its support for "thinking models," including Large Language Models (LLMs) and Mixture-of-Experts (MoE) models. These models require massive parallel processing and efficient memory access, and Ironwood is designed to minimize data movement and latency during large-scale tensor manipulations. It also features an enhanced SparseCore, increased High-Bandwidth Memory (HBM) capacity and bandwidth, and improved ICI networking.
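To make that workload concrete, the sketch below shows the top-k expert routing at the heart of a Mixture-of-Experts layer. The shapes and values are purely illustrative and nothing here is Ironwood-specific; the point is the irregular gather-and-scatter of token activations across experts, which is the kind of sparse data movement and memory traffic that features like an enhanced SparseCore and larger, faster HBM are aimed at.

    # Illustrative Mixture-of-Experts (MoE) top-k routing in plain NumPy.
    # Every token's activations are gathered by, and scattered back from, a
    # different subset of experts; all shapes below are toy values.
    import numpy as np

    rng = np.random.default_rng(0)

    tokens, d_model = 8, 16          # toy batch of token activations
    num_experts, top_k = 4, 2        # each token is routed to its 2 best experts

    x = rng.normal(size=(tokens, d_model))
    router_w = rng.normal(size=(d_model, num_experts))
    expert_w = rng.normal(size=(num_experts, d_model, d_model))

    logits = x @ router_w                              # router score per expert
    top = np.argsort(-logits, axis=1)[:, :top_k]       # chosen experts per token
    gates = np.take_along_axis(logits, top, axis=1)
    gates = np.exp(gates) / np.exp(gates).sum(axis=1, keepdims=True)

    out = np.zeros_like(x)
    for e in range(num_experts):
        # Gather only the tokens routed to expert e: sparse, irregular data
        # movement rather than one dense matrix multiply over the whole batch.
        tok, slot = np.nonzero(top == e)
        if tok.size:
            out[tok] += gates[tok, slot, None] * (x[tok] @ expert_w[e])

    print(out.shape)    # (8, 16): same shape as the input activations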

Developers can leverage Google's Pathways software stack to harness the combined computing power of thousands of Ironwood TPUs. Pathways, developed by Google DeepMind, is a distributed runtime environment that enables dynamic scaling of inference workloads. It includes features like disaggregated serving, which allows independent scaling of the prefill and decode stages of inference, resulting in ultra-low latency and high throughput. Google Kubernetes Engine (GKE) also provides new inference capabilities, including GenAI-aware scaling and load balancing, which can reduce serving costs, decrease tail latency, and increase throughput.
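Disaggregated serving is easiest to see as two stages with very different resource profiles. The sketch below is a conceptual illustration only: the function and class names are hypothetical and are not part of the Pathways or GKE APIs, but they show why the prompt-processing (prefill) stage and the token-by-token (decode) stage can run on separate pools and scale independently.

    # Conceptual sketch of disaggregated serving: prefill builds the KV cache
    # from the full prompt, decode then generates tokens one step at a time.
    # Names and types here are hypothetical; this is not a Pathways/GKE API.
    from dataclasses import dataclass

    @dataclass
    class KVCache:
        prompt_len: int              # stand-in for per-layer key/value tensors

    def prefill(prompt_tokens: list[int]) -> KVCache:
        # Compute-bound: one large batched pass over the whole prompt.
        # In a disaggregated setup this runs on a pool sized for throughput.
        return KVCache(prompt_len=len(prompt_tokens))

    def decode(cache: KVCache, max_new_tokens: int) -> list[int]:
        # Memory-bandwidth-bound: one token per step, reusing the KV cache.
        # This pool is scaled separately to keep per-token latency low.
        generated = []
        for step in range(max_new_tokens):
            generated.append((cache.prompt_len + step) % 50_000)  # dummy token id
        return generated

    cache = prefill(list(range(1_024)))            # e.g. a 1,024-token prompt
    print(len(decode(cache, max_new_tokens=8)))    # 8 generated tokens

Because the two stages bottleneck on different resources, scaling them independently keeps decode latency low without over-provisioning prefill capacity.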

The introduction of Ironwood also marks a broader shift in AI infrastructure development: from responsive AI models that retrieve real-time information for people to interpret, toward proactive systems that generate insights and interpretation on their own. In this "age of inference," AI agents retrieve and generate data and work together to deliver actionable insights and answers.

Compared to the previous generation, Trillium, Ironwood offers five times more peak compute capacity and six times the high-bandwidth memory capacity, while also being twice as power-efficient. This increase in performance and efficiency allows Google Cloud customers to tackle demanding AI workloads with greater speed and lower energy consumption.
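Combining those ratios with the pod figures above gives a rough sense of the generational jump. The sketch below only manipulates numbers already quoted in this article (5x peak compute, 2x performance per watt, and the derived ~4.6 PFLOPS per chip), so its outputs are illustrative estimates rather than published Trillium specifications.

    # Rough implied comparison with Trillium from the ratios quoted above.
    ironwood_pflops_per_chip = 42.5e18 / 9_216 / 1e15          # ~4.61, derived earlier
    trillium_pflops_per_chip = ironwood_pflops_per_chip / 5    # 5x less peak compute

    # 5x the compute at 2x the performance per watt implies roughly 2.5x the
    # power per chip at peak, relative to Trillium.
    relative_power_at_peak = 5 / 2

    print(f"Implied Trillium peak: ~{trillium_pflops_per_chip:.2f} PFLOPS per chip")
    print(f"Ironwood power per chip at peak vs Trillium: ~{relative_power_at_peak:.1f}x")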

With Ironwood, Google Cloud aims to give its customers a comprehensive, AI-optimized platform that leads on price, performance, and precision. It pairs advanced infrastructure and world-class models with a robust developer platform in Vertex AI, offering a full suite of tools for building multi-agent systems.


Rohan Sharma is a seasoned tech news writer with a knack for identifying and analyzing emerging technologies. He possesses a unique ability to distill complex technical information into concise and engaging narratives, making him a highly sought-after contributor in the tech journalism landscape.

