A Tale of Two Tensor Processors
Google has officially unveiled its eighth generation of Tensor Processing Units, but this time it is not a single chip. Instead, the company has split its custom AI hardware into two distinct variants. The TPU 8t is built exclusively for the massive computational demands of training frontier models. The TPU 8i is designed specifically for the inference phase, where models generate responses and power autonomous agents.
This bifurcation reflects Google’s belief that the industry is entering the “agentic era.” The company argues that running specialized agents requires a fundamentally different hardware architecture than training a monolithic model. By separating the workloads, Google aims to maximize efficiency across the entire AI lifecycle, allowing cloud customers to pay only for the capability they actually need.
Performance and Efficiency at Scale
For training, the TPU 8t pods pack 9,600 chips with two petabytes of shared high-bandwidth memory, delivering 121 exaFLOPS of FP4 compute per pod. Google claims linear scaling that can link up to one million chips into a single logical cluster, a feat designed to shrink training timelines from months to weeks. The company also reports a "goodpute" rate of 97 percent, meaning the chips spend almost all of their time on meaningful computation rather than waiting on memory or recovering from faults.
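To make the "goodpute" figure concrete, here is a minimal sketch of how utilization translates into delivered compute, using the article's numbers (the formula is just the standard effective-throughput calculation, not anything Google-specific):

```python
# Effective training throughput: peak compute scaled by "goodpute",
# the fraction of time chips spend on useful work (figures from the article).
peak_eflops_per_pod = 121   # FP4 exaFLOPS per 9,600-chip pod
goodpute = 0.97             # 97% of time on meaningful computation

effective_eflops = peak_eflops_per_pod * goodpute
print(f"{effective_eflops:.1f} effective FP4 exaFLOPS per pod")  # 117.4
```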
On the inference side, the TPU 8i triples on-chip SRAM to 384 MB, keeping larger key-value (KV) caches local and speeding up models with long context windows. These chips run in pods of 1,152 units and are the first Google accelerators to rely solely on the custom Arm-based Axion CPU as host, with one CPU per two TPUs. Google claims this full-stack Arm approach delivers twice the performance per watt of the previous Ironwood generation.
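A rough back-of-the-envelope calculation shows why KV-cache size dominates long-context inference and why on-chip SRAM acts as a working-set cache rather than holding the whole thing. The model dimensions below are illustrative assumptions, not TPU 8i or any specific model's specs:

```python
# Rough KV-cache sizing: per token, each transformer layer stores one
# key vector and one value vector per KV head (hence the factor of 2).
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bytes_per_elem):
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical mid-sized model: 32 layers, 8 KV heads of dim 128,
# 32k-token context, 1-byte (FP8) cache entries.
size = kv_cache_bytes(32, 8, 128, context_len=32_768, bytes_per_elem=1)
print(f"{size / 2**20:.0f} MiB")  # 2048 MiB
```

Even this modest configuration produces a 2 GiB cache for a single 32k-token sequence, far beyond 384 MB of SRAM, so the benefit comes from keeping more of the hot portion of the cache on-chip rather than fetching it from HBM.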
Infrastructure Designed for a Cloud Native Future
Beyond the chips themselves, Google has co-designed its data centers around the new TPUs. Integrating networking directly onto the compute die and optimizing pod layouts has reportedly increased compute per unit of electricity sixfold. To handle the intense heat, the company deployed fourth-generation liquid cooling with actively controlled valves that adjust water flow to match real-time workload demands.
Both the TPU 8t and TPU 8i support standard developer frameworks such as JAX, MaxText, PyTorch, SGLang, and vLLM, so third-party developers can adopt the new hardware without rearchitecting their pipelines. While Nvidia saw a brief 1.5 percent stock dip on the news, Google's announcement signals its commitment to building end-to-end AI infrastructure that challenges the dominant GPU paradigm.
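The portability claim is easiest to see in JAX, one of the frameworks named above: the same jitted function compiles for whichever backend is present, so code written on a CPU runs unchanged on a TPU. The attention-score function below is a generic illustration, not code from Google's announcement:

```python
# A minimal JAX sketch: @jax.jit compiles this function for whatever
# accelerator backend is available (CPU, GPU, or TPU) with no code changes.
import jax
import jax.numpy as jnp

@jax.jit
def attention_scores(q, k):
    # Scaled dot-product attention scores, the core op of inference.
    return jax.nn.softmax(q @ k.T / jnp.sqrt(q.shape[-1]))

q = jnp.ones((4, 64))   # 4 query vectors of dimension 64
k = jnp.ones((8, 64))   # 8 key vectors of dimension 64
scores = attention_scores(q, k)
print(scores.shape, jax.default_backend())  # (4, 8) plus the active backend
```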
Source: Ars Technica
