A Tale of Two Tensor Processors
Google has officially unveiled its eighth generation of Tensor Processing Units, but this time it is not a single chip. Instead, the company has split its custom AI hardware into two distinct variants. The TPU 8t is built exclusively for the massive computational demands of training frontier models. The TPU 8i is designed specifically for the inference phase, where models generate responses and power autonomous agents.
This bifurcation reflects Google’s belief that the industry is entering the “agentic era.” The company argues that running specialized agents requires a fundamentally different hardware architecture than training a monolithic model. By separating the workloads, Google aims to maximize efficiency across the entire AI lifecycle, allowing cloud customers to pay only for the capability they actually need.
Performance and Efficiency at Scale
For training, the TPU 8t pods pack 9,600 chips with two petabytes of shared high-bandwidth memory, delivering 121 exaFLOPS of FP4 compute per pod. Google claims linear scaling that can link up to one million chips into a single logical cluster, a feat designed to shrink training timelines from months to weeks. The company also reports a "goodpute" rate of 97 percent, meaning the chips spend almost all of their time on meaningful computation rather than waiting on memory or recovering from faults.
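To make the "goodpute" figure concrete, here is a minimal sketch of how utilization translates into delivered compute, using the article's numbers (the formula is just the standard effective-throughput calculation, not anything Google-specific):

```python
# Effective training throughput: peak compute scaled by "goodpute",
# the fraction of time chips spend on useful work (figures from the article).
peak_eflops_per_pod = 121   # FP4 exaFLOPS per 9,600-chip pod
goodpute = 0.97             # 97% of time on meaningful computation

effective_eflops = peak_eflops_per_pod * goodpute
print(f"{effective_eflops:.1f} effective FP4 exaFLOPS per pod")  # 117.4
```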
On the inference side, the TPU 8i triples on-chip SRAM to 384 MB, keeping larger key-value (KV) caches local and speeding up models with long context windows. These chips run in pods of 1,152 units and are the first Google accelerators to rely solely on the custom Arm-based Axion CPU as host, with one CPU per two TPUs. Google claims this full-stack Arm approach delivers twice the performance per watt of the previous Ironwood generation.
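A rough back-of-the-envelope calculation shows why KV-cache size dominates long-context inference and why on-chip SRAM acts as a working-set cache rather than holding the whole thing. The model dimensions below are illustrative assumptions, not TPU 8i or any specific model's specs:

```python
# Rough KV-cache sizing: per token, each transformer layer stores one
# key vector and one value vector per KV head (hence the factor of 2).
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bytes_per_elem):
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical mid-sized model: 32 layers, 8 KV heads of dim 128,
# 32k-token context, 1-byte (FP8) cache entries.
size = kv_cache_bytes(32, 8, 128, context_len=32_768, bytes_per_elem=1)
print(f"{size / 2**20:.0f} MiB")  # 2048 MiB
```

Even this modest configuration produces a 2 GiB cache for a single 32k-token sequence, far beyond 384 MB of SRAM, so the benefit comes from keeping more of the hot portion of the cache on-chip rather than fetching it from HBM.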
Infrastructure Designed for a Cloud Native Future
Beyond the chips themselves, Google has co-designed its data centers around the new TPUs. Integrating networking directly onto the compute die and optimizing pod layouts has reportedly increased compute per unit of electricity sixfold. To handle the intense heat, the company deployed fourth-generation liquid cooling with actively controlled valves that adjust water flow to match real-time workload demands.
Both the TPU 8t and TPU 8i support standard developer frameworks such as JAX, MaxText, PyTorch, SGLang, and vLLM, so third-party developers can adopt the new hardware without rearchitecting their pipelines. While Nvidia saw a brief 1.5 percent stock dip on the news, Google's announcement signals its commitment to building end-to-end AI infrastructure that challenges the dominant GPU paradigm.
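The portability claim is easiest to see in JAX, one of the frameworks named above: the same jitted function compiles for whichever backend is present, so code written on a CPU runs unchanged on a TPU. The attention-score function below is a generic illustration, not code from Google's announcement:

```python
# A minimal JAX sketch: @jax.jit compiles this function for whatever
# accelerator backend is available (CPU, GPU, or TPU) with no code changes.
import jax
import jax.numpy as jnp

@jax.jit
def attention_scores(q, k):
    # Scaled dot-product attention scores, the core op of inference.
    return jax.nn.softmax(q @ k.T / jnp.sqrt(q.shape[-1]))

q = jnp.ones((4, 64))   # 4 query vectors of dimension 64
k = jnp.ones((8, 64))   # 8 key vectors of dimension 64
scores = attention_scores(q, k)
print(scores.shape, jax.default_backend())  # (4, 8) plus the active backend
```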
Source: Ars Technica
