NVIDIA Announces Blackwell GPUs with 208 Billion Transistors, including GB200 System Supporting 72 Blackwell GPUs and 13.5 TB of HBM3e Memory


Image: NVIDIA

NVIDIA has officially launched its next-generation Blackwell platform, and with it comes a set of new and elaborate hardware for fueling the next stage of the AI craze, including systems the company says are powerful enough to enable “trillion-parameter-scale AI models.” These include the GB200 NVL72, a new exascale computer that can deliver up to 1,440 PFLOPS of FP4 AI performance and 3,240 TFLOPS of FP64 compute thanks in part to its 72 Blackwell GPUs—new GPUs built on TSMC’s 4NP process that pack 208 billion transistors.

NVIDIA on how the Blackwell platform comprises six technologies:

  • World’s Most Powerful Chip — Packed with 208 billion transistors, Blackwell-architecture GPUs are manufactured using a custom-built 4NP TSMC process, with two reticle-limit GPU dies connected by a 10 TB/s chip-to-chip link into a single, unified GPU.
  • Second-Generation Transformer Engine — Fueled by new micro-tensor scaling support and NVIDIA’s advanced dynamic range management algorithms integrated into NVIDIA TensorRT-LLM and NeMo Megatron frameworks, Blackwell will support double the compute and model sizes with new 4-bit floating point AI inference capabilities.
  • Fifth-Generation NVLink — To accelerate performance for multitrillion-parameter and mixture-of-experts AI models, the latest iteration of NVIDIA NVLink delivers groundbreaking 1.8TB/s bidirectional throughput per GPU, ensuring seamless high-speed communication among up to 576 GPUs for the most complex LLMs.
  • RAS Engine — Blackwell-powered GPUs include a dedicated engine for reliability, availability and serviceability. Additionally, the Blackwell architecture adds capabilities at the chip level to utilize AI-based preventative maintenance to run diagnostics and forecast reliability issues. This maximizes system uptime and improves resiliency for massive-scale AI deployments to run uninterrupted for weeks or even months at a time and to reduce operating costs.
  • Secure AI — Advanced confidential computing capabilities protect AI models and customer data without compromising performance, with support for new native interface encryption protocols, which are critical for privacy-sensitive industries like healthcare and financial services.
  • Decompression Engine — A dedicated decompression engine supports the latest formats, accelerating database queries to deliver the highest performance in data analytics and data science. In the coming years, data processing, on which companies spend tens of billions of dollars annually, will be increasingly GPU-accelerated.
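The second-generation Transformer Engine item above rests on micro-tensor scaling: quantizing in small blocks, each with its own scale factor, so an outlier in one block does not destroy precision everywhere else. NVIDIA has not published the exact Blackwell FP4 format, but the idea can be sketched with a hypothetical int4 stand-in (the block size of 32 and the symmetric [-7, 7] grid here are assumptions, not NVIDIA's spec):

```python
import numpy as np

def quantize_4bit_blocks(x, block=32):
    """Quantize a 1-D tensor to 4-bit integers with one scale per block.

    Illustrative stand-in for FP4 with micro-tensor scaling; the block
    size and int4 grid are assumptions for demonstration only.
    """
    x = x.reshape(-1, block)
    # Per-block scale maps each block's max magnitude onto the int4 range [-7, 7].
    scales = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(x / scales), -7, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    """Recover an approximation of the original tensor."""
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
x = rng.normal(size=1024).astype(np.float32)
q, s = quantize_4bit_blocks(x)
x_hat = dequantize(q, s)
print("max abs error:", np.abs(x - x_hat).max())
```

Because each block's worst-case rounding error is half that block's scale, fine-grained scaling keeps the 4-bit representation usable despite having only 16 quantization levels.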

HGX B200/B100 specs:

| | HGX B200 | HGX B100 |
| --- | --- | --- |
| GPUs | HGX B200 8-GPU | HGX B100 8-GPU |
| Form factor | 8x NVIDIA B200 SXM | 8x NVIDIA B100 SXM |
| HPC and AI compute (FP64/TF32/FP16/FP8/FP4) | 40 TF / 18 PF / 36 PF / 72 PF / 144 PF | 30 TF / 14 PF / 28 PF / 56 PF / 112 PF |
| Memory | Up to 1.4 TB | Up to 1.4 TB |
| NVIDIA NVLink | Fifth generation | Fifth generation |
| NVIDIA NVSwitch | Fourth generation | Fourth generation |
| NVSwitch GPU-to-GPU bandwidth | 1.8 TB/s | 1.8 TB/s |
| Total aggregate bandwidth | 14.4 TB/s | 14.4 TB/s |
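The aggregate-bandwidth row follows directly from the per-GPU NVLink figure; a quick arithmetic check:

```python
# The 14.4 TB/s aggregate figure in the HGX table is the fifth-generation
# NVLink bandwidth (1.8 TB/s per GPU, bidirectional) summed across the
# eight GPUs on the baseboard.
gpus = 8
nvlink_per_gpu_tb_s = 1.8
aggregate_tb_s = gpus * nvlink_per_gpu_tb_s
print(f"{aggregate_tb_s:.1f} TB/s")  # 14.4 TB/s
```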

GB200 Series specs:

| | GB200 NVL72 | GB200 Grace Blackwell Superchip |
| --- | --- | --- |
| Configuration | 36 Grace CPUs : 72 Blackwell GPUs | 1 Grace CPU : 2 Blackwell GPUs |
| FP4 Tensor Core | 1,440 PFLOPS | 40 PFLOPS |
| FP8/FP6 Tensor Core | 720 PFLOPS | 20 PFLOPS |
| INT8 Tensor Core | 720 POPS | 20 POPS |
| FP16/BF16 Tensor Core | 360 PFLOPS | 10 PFLOPS |
| TF32 Tensor Core | 180 PFLOPS | 5 PFLOPS |
| FP64 Tensor Core | 3,240 TFLOPS | 90 TFLOPS |
| GPU Memory / Bandwidth | Up to 13.5 TB HBM3e / 576 TB/s | Up to 384 GB HBM3e / 16 TB/s |
| NVLink Bandwidth | 130 TB/s | 3.6 TB/s |
| CPU Core Count | 2,592 Arm Neoverse V2 cores | 72 Arm Neoverse V2 cores |
| CPU Memory / Bandwidth | Up to 17 TB LPDDR5X / Up to 18.4 TB/s | Up to 480 GB LPDDR5X / Up to 512 GB/s |
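Since the NVL72 is built from 36 Grace Blackwell Superchips, most of the headline numbers in its column are simply the superchip column scaled by 36 — a quick sanity check against the table:

```python
# GB200 NVL72 = 36 Grace Blackwell Superchips, so its headline specs are
# (mostly) the per-superchip figures from the table scaled by 36.
superchip = {
    "fp4_pflops": 40,      # FP4 Tensor Core
    "fp64_tflops": 90,     # FP64 Tensor Core
    "blackwell_gpus": 2,
    "grace_cpu_cores": 72,
}
n_superchips = 36
nvl72 = {k: v * n_superchips for k, v in superchip.items()}
print(nvl72)
# {'fp4_pflops': 1440, 'fp64_tflops': 3240, 'blackwell_gpus': 72, 'grace_cpu_cores': 2592}
```

The memory row scales the same way once units are accounted for: 36 × 384 GB = 13,824 GB, which is the quoted 13.5 TB when divided by 1,024.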

NVIDIA on the GB200 NVL72:

It combines 36 Grace Blackwell Superchips, which include 72 Blackwell GPUs and 36 Grace CPUs interconnected by fifth-generation NVLink. Additionally, GB200 NVL72 includes NVIDIA BlueField-3 data processing units to enable cloud network acceleration, composable storage, zero-trust security and GPU compute elasticity in hyperscale AI clouds. The GB200 NVL72 provides up to a 30x performance increase compared to the same number of NVIDIA H100 Tensor Core GPUs for LLM inference workloads, and reduces cost and energy consumption by up to 25x.

Jensen Huang, founder and CEO of NVIDIA, said:

For three decades we’ve pursued accelerated computing, with the goal of enabling transformative breakthroughs like deep learning and AI. Generative AI is the defining technology of our time. Blackwell is the engine to power this new industrial revolution. Working with the most dynamic companies in the world, we will realize the promise of AI for every industry.

Source


Tsing Mui
News poster at The FPS Review.
