Google Cloud Next opens Wednesday in Las Vegas, and Google is using the conference to do something it has rarely done in its decade of TPU development: explicitly split its custom chip roadmap into separate training and inference product lines, each with its own design partner. The bet, according to a Bloomberg feature published Monday via the Los Angeles Times, is to stake Google's AI economics on beating Nvidia where Nvidia is most vulnerable.

The announcement lands at a moment when Nvidia's inference dominance is being questioned for the first time in years.

The Headline from Cloud Next

Google Chief Scientist Jeff Dean told Bloomberg that as AI demand grows, "it now becomes sensible to specialize chips more for training or more for inference workloads," per the LA Times. Amin Vahdat, who oversees Google's AI infrastructure and chip work, declined to confirm specific inference-chip plans but said more would likely be shared "in the relatively near future."

The TPU v8 family is expected to feature prominently during the conference's Wednesday-through-Friday keynote program, based on media and supply-chain reporting summarized on X by semiconductor analyst Dan Nystedt. Two variants are on the roadmap: TPUv8t "Sunfish" for training, designed with Broadcom, and TPUv8i "Zebrafish" for inference, designed with MediaTek. Both target TSMC's 2-nanometer process node, with production planned for late 2027.

The split matters. Until this generation, Google had resisted separate chip designs for training and inference. Partha Ranganathan, a vice president and engineering fellow at Google, told Bloomberg the company weighed the idea in its early days before ultimately deciding against it. "The battleground is shifting towards inference," Gartner analyst Chirag Dekate told Bloomberg, describing why Google is now changing course.

Sunfish and Zebrafish: Two Chips, One Strategy

The two-chip approach lets Google optimize each silicon path for the workload it actually serves. Training chips need peak floating-point performance and high-bandwidth memory to keep thousands of accelerators fed during weeks-long model training runs. Inference chips need low latency per query, high throughput at batch sizes that match real user traffic, and cost efficiency per token served.
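A back-of-envelope sketch makes that split concrete. The Python below uses purely illustrative assumptions; the chip hourly cost, tokens per second, training FLOP budget, and cluster size are not Google or Nvidia figures. The point is only that training cost is a one-time function of total compute, while serving cost accrues on every token and scales with traffic.

```python
# Back-of-envelope: why training and inference reward different chip designs.
# All numbers below are illustrative assumptions, not vendor figures.

# --- Training: a one-time cost driven by total FLOPs and sustained throughput ---
train_flops = 1e25            # assumed FLOPs to train a frontier-scale model
chip_peak_flops = 4.6e15      # assumed peak FLOP/s per accelerator
utilization = 0.4             # assumed sustained fraction of peak
n_train_chips = 8192          # assumed training cluster size

train_seconds = train_flops / (chip_peak_flops * utilization * n_train_chips)
print(f"Training time: ~{train_seconds / 86400:.1f} days on {n_train_chips} chips")

# --- Inference: a recurring cost driven by tokens served per chip-hour ---
chip_hour_cost = 2.50         # assumed all-in cost per accelerator-hour (USD)
tokens_per_second = 5000      # assumed sustained decode throughput per chip

cost_per_million_tokens = chip_hour_cost / (tokens_per_second * 3600) * 1e6
daily_tokens = 5e12           # assumed daily token volume across products
daily_serving_cost = daily_tokens / 1e6 * cost_per_million_tokens
print(f"Serving cost: ~${cost_per_million_tokens:.3f} per million tokens, "
      f"~${daily_serving_cost:,.0f} per day at {daily_tokens:.0e} tokens/day")
```

Under those assumptions, a few percent more tokens per chip-hour translates directly into a lower cost per million tokens, which is the lever an inference-tuned part like Zebrafish is designed to pull.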

MediaTek's involvement on Zebrafish is not new. The company already designed the I/O modules and peripheral components on Google's seventh-generation Ironwood TPU, where its designs run 20 to 30 percent cheaper than alternatives, according to The Next Web's detailed breakdown of Google's chip roadmap. Extending MediaTek's role to the full Zebrafish chip signals Google is willing to accept a cost-optimized variant that may trade peak performance for lower per-query economics.

The current generation already tells part of the story. Ironwood, which Google calls "the first Google TPU for the age of inference," is now generally available to Google Cloud customers. It delivers 10x the peak performance of the TPU v5p, offers 192 GB of HBM3E memory per chip with 7.2 TB per second of bandwidth, and scales to 9,216 liquid-cooled chips in a single superpod producing a combined 42.5 exaflops of FP8 compute, per The Next Web's reporting. Google plans to produce millions of Ironwood units this year.
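The per-chip figures implied by those pod totals are easy to sanity-check; the short sketch below simply divides the numbers cited above.

```python
# Sanity check on the Ironwood superpod figures cited above.
pod_chips = 9216
pod_fp8_exaflops = 42.5
hbm_per_chip_gb = 192
hbm_bw_per_chip_tbps = 7.2

per_chip_fp8_pflops = pod_fp8_exaflops * 1e3 / pod_chips   # exaFLOPs -> petaFLOPs
pod_hbm_pb = pod_chips * hbm_per_chip_gb / 1e6             # GB -> PB
pod_hbm_bw_pbps = pod_chips * hbm_bw_per_chip_tbps / 1e3   # TB/s -> PB/s

print(f"~{per_chip_fp8_pflops:.1f} PFLOPS of FP8 per chip")
print(f"~{pod_hbm_pb:.2f} PB of HBM3E and ~{pod_hbm_bw_pbps:.1f} PB/s "
      f"of aggregate memory bandwidth per pod")
```

That works out to roughly 4.6 petaflops of FP8 per chip and about 1.8 petabytes of HBM3E per superpod.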

The Four-Partner Supply Chain

What distinguishes this announcement from a typical chip refresh is the supply chain architecture behind it. Google is now building with four distinct partners, each handling different segments of the product line, as detailed in The Next Web's coverage.

  • Broadcom: high-performance training chips, covered by a long-term TPU agreement running through 2031 (TPU v8 "Sunfish")
  • MediaTek: the cost-optimized inference variant (TPU v8 "Zebrafish")
  • Marvell: in talks for a memory processing unit plus an additional inference TPU (not yet signed)
  • Intel: Xeon CPUs and custom infrastructure processing units for Google's data centers (networking and general-purpose compute)

TSMC fabricates every chip regardless of which partner designed it. The four-partner structure gives Google negotiating leverage — each vendor knows the others exist — and reduces the strategic risk of depending on any single supplier. Nvidia, by contrast, designs its own GPUs in-house and has minimal overlap with hyperscaler chip programs.

Nvidia is not sitting still. In March, it invested $2 billion in Marvell, per Yahoo Finance summarizing Bloomberg reporting, and launched NVLink Fusion to integrate custom chips from third parties with its own interconnect fabric. The move ensures Nvidia retains a position in racks where its GPUs are supplemented or replaced by ASICs from companies like Marvell.

Why Inference Is Where the Money Is Now

Training a frontier model is a singular, intensive event. Inference is continuous and scales with every user, every query, and every product that incorporates AI. Google serves billions of AI-augmented search queries, Gemini conversations, and Cloud AI API calls daily. At that scale, the cost per inference — not the cost of the initial training run — determines the economics of the entire AI business.

The custom ASIC market is growing faster than the GPU market as a result. TrendForce projects custom chip sales will increase 45 percent in 2026, compared with 16 percent growth in GPU shipments, according to The Next Web. The market is expected to reach $118 billion by 2033. Google's TPU shipments are projected at 4.3 million units in 2026, scaling to more than 35 million by 2028.
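Taken at face value, those shipment projections imply that annual TPU volume nearly triples each year; the short calculation below uses only the figures cited above.

```python
# Implied annual growth from the shipment projections cited above.
units_2026, units_2028 = 4.3e6, 35e6
annual_multiple = (units_2028 / units_2026) ** 0.5   # two years between data points
print(f"~{annual_multiple:.1f}x per year, i.e. ~{annual_multiple - 1:.0%} annual growth")
```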

The external demand is real. Anthropic signed a deal in October for access to as many as 1 million TPUs, plus a separate agreement with Broadcom that will bring it roughly 3.5 gigawatts of next-generation TPU-based compute starting in 2027, per the LA Times. Meta signed a multi-billion-dollar TPU access deal and just received its first significant supply to test. Citadel Securities is scheduled to present at Cloud Next on how TPUs let it train models faster than its previous GPU setup. G42, the Abu Dhabi technology conglomerate, has held multiple discussions with Google about using TPUs as well.

What This Means for Nvidia

Nvidia CEO Jensen Huang has not conceded the inference market. In a recent podcast interview cited by Bloomberg via the LA Times, he stressed the advantages of Nvidia's chips, saying they can do "a whole bunch of applications" that "you can't do with TPUs." Last month, Nvidia began selling its own inference-focused chip based on technology from a reported $20 billion licensing deal with Groq.

The challenge Nvidia faces is not that any single Google chip will outperform its GPUs on every workload. It is that Google is building a system in which multiple custom chips, each optimized for a specific workload and cost point, collectively reduce the share of Google's AI compute that runs on Nvidia hardware. Google serves more inference volume than any other company in the world. Every percentage point of that volume that moves to TPUs is revenue that does not land at Nvidia.

Demis Hassabis, CEO of Google DeepMind, framed customer demand directly to Bloomberg: "A lot of people would like to run on both," he said, referring to TPUs and GPUs, adding that interest in TPUs is "particularly high from leading AI labs," per the LA Times.

Supply Constraints and Open Questions

The near-term ceiling on Google's chip push is supply, not design. One startup executive told Bloomberg anonymously that their company's use of TPUs has been limited by availability, complaining that Google had effectively given all its chips to Anthropic. Hassabis acknowledged the prioritization: "Mostly we're sort of favoring what supply we do have to the more elite teams who obviously are the ones that could maybe take the most advantage out of what the TPUs do best," he told Bloomberg.

Several questions remain open going into the conference:

  • Whether Google will formally announce the TPU v8 generation at Cloud Next this week, or hold those details for a later event
  • The specific pricing structure for Ironwood on Google Cloud versus renting equivalent Nvidia capacity
  • Whether the Marvell inference TPU talks produce a signed contract
  • How Google will balance TPU allocation between its own AI services, Anthropic, Meta, and smaller customers
  • Whether on-premises TPU deployments for enterprise customers, which Google is piloting with Anthropic, expand more broadly