Google's Ironwood TPU for AI Inference Debuts at Cloud Next 2025

At Google Cloud Next 2025, Google unveiled Ironwood, its seventh-generation Tensor Processing Unit (TPU) and a significant leap forward in AI inference technology. The new chip is Google's most powerful and scalable custom AI accelerator to date, designed specifically for inference workloads, and it signals a strategic shift toward supporting the growing demands of generative AI and "thinking models."

For over a decade, TPUs have been the backbone of Google's AI infrastructure, powering both training and serving workloads. Ironwood builds on that legacy with greater capability and energy efficiency to handle the complexities of modern AI models, and it is purpose-built to run inference at scale, enabling faster and more efficient processing of AI tasks.

Ironwood is designed to support the next phase of generative AI, addressing the substantial computational and communication demands of these advanced models. It achieves this by scaling up to 9,216 liquid-cooled chips interconnected through a high-bandwidth Inter-Chip Interconnect (ICI) network, consuming nearly 10 MW of power. This massive scale allows Ironwood to deliver 42.5 exaFLOPS of compute, surpassing the capabilities of the world's largest supercomputers. The architecture is a key component of Google Cloud's AI Hypercomputer, which optimizes both hardware and software to tackle demanding AI workloads.
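To put those pod-level figures in per-chip terms, a quick back-of-the-envelope calculation helps. The short Python sketch below assumes the 42.5 exaFLOPS and roughly 10 MW numbers both describe a full 9,216-chip pod at peak; the per-chip values it prints are derived estimates, not figures quoted in the announcement.

    # Derive rough per-chip figures from the quoted pod-level numbers.
    # Assumption: 42.5 exaFLOPS and ~10 MW both refer to a full 9,216-chip pod.
    pod_flops = 42.5e18          # 42.5 exaFLOPS of peak compute
    pod_power_watts = 10e6       # "nearly 10 MW" of power draw
    pod_chips = 9_216

    per_chip_flops = pod_flops / pod_chips
    per_chip_power = pod_power_watts / pod_chips

    print(f"~{per_chip_flops / 1e15:.2f} PFLOPS per chip")   # ~4.61 PFLOPS
    print(f"~{per_chip_power / 1e3:.2f} kW per chip")        # ~1.09 kW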

One of Ironwood's key features is its support for "thinking models," including Large Language Models (LLMs) and Mixture-of-Experts (MoE) models. These models require massive parallel processing and efficient memory access, and Ironwood is designed to minimize data movement and latency during large-scale tensor manipulations. It also features an enhanced SparseCore, increased High-Bandwidth Memory (HBM) capacity and bandwidth, and improved ICI networking.
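To make that workload concrete, the sketch below shows the top-k expert routing at the heart of a Mixture-of-Experts layer. The shapes and values are purely illustrative and nothing here is Ironwood-specific; the point is the irregular gather-and-scatter of token activations across experts, which is the kind of sparse data movement and memory traffic that features like an enhanced SparseCore and larger, faster HBM are aimed at.

    # Illustrative Mixture-of-Experts (MoE) top-k routing in plain NumPy.
    # Every token's activations are gathered by, and scattered back from, a
    # different subset of experts; all shapes below are toy values.
    import numpy as np

    rng = np.random.default_rng(0)

    tokens, d_model = 8, 16          # toy batch of token activations
    num_experts, top_k = 4, 2        # each token is routed to its 2 best experts

    x = rng.normal(size=(tokens, d_model))
    router_w = rng.normal(size=(d_model, num_experts))
    expert_w = rng.normal(size=(num_experts, d_model, d_model))

    logits = x @ router_w                              # router score per expert
    top = np.argsort(-logits, axis=1)[:, :top_k]       # chosen experts per token
    gates = np.take_along_axis(logits, top, axis=1)
    gates = np.exp(gates) / np.exp(gates).sum(axis=1, keepdims=True)

    out = np.zeros_like(x)
    for e in range(num_experts):
        # Gather only the tokens routed to expert e: sparse, irregular data
        # movement rather than one dense matrix multiply over the whole batch.
        tok, slot = np.nonzero(top == e)
        if tok.size:
            out[tok] += gates[tok, slot, None] * (x[tok] @ expert_w[e])

    print(out.shape)    # (8, 16): same shape as the input activations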

Developers can leverage Google's Pathways software stack to harness the combined computing power of thousands of Ironwood TPUs. Pathways, developed by Google DeepMind, is a distributed runtime environment that enables dynamic scaling of inference workloads. It includes features like disaggregated serving, which allows independent scaling of the prefill and decode stages of inference, resulting in ultra-low latency and high throughput. Google Kubernetes Engine (GKE) also provides new inference capabilities, including GenAI-aware scaling and load balancing, which can reduce serving costs, decrease tail latency, and increase throughput.
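Disaggregated serving is easiest to see as two stages with very different resource profiles. The sketch below is a conceptual illustration only: the function and class names are hypothetical and are not part of the Pathways or GKE APIs, but they show why the prompt-processing (prefill) stage and the token-by-token (decode) stage can run on separate pools and scale independently.

    # Conceptual sketch of disaggregated serving: prefill builds the KV cache
    # from the full prompt, decode then generates tokens one step at a time.
    # Names and types here are hypothetical; this is not a Pathways/GKE API.
    from dataclasses import dataclass

    @dataclass
    class KVCache:
        prompt_len: int              # stand-in for per-layer key/value tensors

    def prefill(prompt_tokens: list[int]) -> KVCache:
        # Compute-bound: one large batched pass over the whole prompt.
        # In a disaggregated setup this runs on a pool sized for throughput.
        return KVCache(prompt_len=len(prompt_tokens))

    def decode(cache: KVCache, max_new_tokens: int) -> list[int]:
        # Memory-bandwidth-bound: one token per step, reusing the KV cache.
        # This pool is scaled separately to keep per-token latency low.
        generated = []
        for step in range(max_new_tokens):
            generated.append((cache.prompt_len + step) % 50_000)  # dummy token id
        return generated

    cache = prefill(list(range(1_024)))            # e.g. a 1,024-token prompt
    print(len(decode(cache, max_new_tokens=8)))    # 8 generated tokens

Because the two stages bottleneck on different resources, scaling them independently keeps decode latency low without over-provisioning prefill capacity.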

The introduction of Ironwood also marks a broader shift in AI infrastructure development: from responsive AI models that retrieve real-time information for people to interpret, toward proactive systems that generate insights and interpretation on their own. In this "age of inference," AI agents retrieve and generate data and work together to deliver actionable insights and answers.

Compared to the previous generation, Trillium, Ironwood offers five times more peak compute capacity and six times the high-bandwidth memory capacity, while also being twice as power-efficient. This increase in performance and efficiency allows Google Cloud customers to tackle demanding AI workloads with greater speed and lower energy consumption.
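Combining those ratios with the pod figures above gives a rough sense of the generational jump. The sketch below only manipulates numbers already quoted in this article (5x peak compute, 2x performance per watt, and the derived ~4.6 PFLOPS per chip), so its outputs are illustrative estimates rather than published Trillium specifications.

    # Rough implied comparison with Trillium from the ratios quoted above.
    ironwood_pflops_per_chip = 42.5e18 / 9_216 / 1e15          # ~4.61, derived earlier
    trillium_pflops_per_chip = ironwood_pflops_per_chip / 5    # 5x less peak compute

    # 5x the compute at 2x the performance per watt implies roughly 2.5x the
    # power per chip at peak, relative to Trillium.
    relative_power_at_peak = 5 / 2

    print(f"Implied Trillium peak: ~{trillium_pflops_per_chip:.2f} PFLOPS per chip")
    print(f"Ironwood power per chip at peak vs Trillium: ~{relative_power_at_peak:.1f}x")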

With Ironwood, Google Cloud aims to give its customers a comprehensive, AI-optimized platform that leads on price, performance, and precision. It pairs advanced infrastructure and world-class models with a robust developer platform in Vertex AI, offering a full suite of tools for building multi-agent systems.


Rohan Sharma is a seasoned tech news writer with a knack for identifying and analyzing emerging technologies. He possesses a unique ability to distill complex technical information into concise and engaging narratives, making him a highly sought-after contributor in the tech journalism landscape.

