Image: NVIDIA

Is it hot in here, or is it just me? NVIDIA has announced its first GPU based on the new Hopper graphics architecture, the H100, and its specifications sheet is absolutely wild. In addition to 30/60 teraFLOPS of FP64/FP32 performance and support for the latest technologies such as HBM3 and PCIe Gen 5, the GPU, as part of the dual-slot, NVLink-based SXM configuration, will draw a pretty incredible 700 watts of power. The previous generation’s Ampere-based A100 40GB SXM and A100 80GB SXM have 400 W listed as their TDPs. Built on TSMC’s 4N process, the NVIDIA H100 will be available in Q3 2022.

Product Specifications

| Form Factor | H100 SXM | H100 PCIe |
| --- | --- | --- |
| FP64 | 30 teraFLOPS | 24 teraFLOPS |
| FP64 Tensor Core | 60 teraFLOPS | 48 teraFLOPS |
| FP32 | 60 teraFLOPS | 48 teraFLOPS |
| TF32 Tensor Core | 1,000 teraFLOPS* / 500 teraFLOPS | 800 teraFLOPS* / 400 teraFLOPS |
| BFLOAT16 Tensor Core | 2,000 teraFLOPS* / 1,000 teraFLOPS | 1,600 teraFLOPS* / 800 teraFLOPS |
| FP16 Tensor Core | 2,000 teraFLOPS* / 1,000 teraFLOPS | 1,600 teraFLOPS* / 800 teraFLOPS |
| FP8 Tensor Core | 4,000 teraFLOPS* / 2,000 teraFLOPS | 3,200 teraFLOPS* / 1,600 teraFLOPS |
| INT8 Tensor Core | 4,000 TOPS* / 2,000 TOPS | 3,200 TOPS* / 1,600 TOPS |
| GPU memory | 80GB | 80GB |
| GPU memory bandwidth | 3TB/s | 2TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
| Max thermal design power (TDP) | 700W | 350W |
| Multi-Instance GPUs | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 10GB each |
| Form factor | SXM | PCIe |
| Interconnect | NVLink: 900GB/s, PCIe Gen5: 128GB/s | NVLink: 600GB/s, PCIe Gen5: 128GB/s |

*With sparsity

NVIDIA Announces Hopper Architecture, the Next Generation of Accelerated Computing (NVIDIA)

World’s Most Advanced Chip — Built with 80 billion transistors using a cutting-edge TSMC 4N process designed for NVIDIA’s accelerated compute needs, H100 features major advances to accelerate AI, HPC, memory bandwidth, interconnect and communication, including nearly 5 terabytes per second of external connectivity. H100 is the first GPU to support PCIe Gen5 and the first to utilize HBM3, enabling 3TB/s of memory bandwidth. Twenty H100 GPUs can sustain the equivalent of the entire world’s internet traffic, making it possible for customers to deliver advanced recommender systems and large language models running inference on data in real time.

New Transformer Engine — Now the standard model choice for natural language processing, the Transformer is one of the most important deep learning models ever invented. The H100 accelerator’s Transformer Engine is built to speed up these networks as much as 6x versus the previous generation without losing accuracy.
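The operation the Transformer Engine is built to accelerate is scaled dot-product attention, the core computation of every Transformer layer. As an illustration only (the Transformer Engine itself runs this in hardware, dynamically mixing FP8 and FP16 precision), a minimal NumPy sketch looks like this:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V -- the core op of a Transformer layer."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # query/key similarity
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                            # weighted mix of values

# Toy example: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Large language models spend most of their time in matrix multiplies like the ones above, which is why NVIDIA's claimed speedup comes from running them at lower precision on the Tensor Cores without losing accuracy.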

2nd-Generation Secure Multi-Instance GPU — MIG technology allows a single GPU to be partitioned into seven smaller, fully isolated instances to handle different types of jobs. The Hopper architecture extends MIG capabilities by up to 7x over the previous generation by offering secure multitenant configurations in cloud environments across each GPU instance.

Confidential Computing — H100 is the world’s first accelerator with confidential computing capabilities to protect AI models and customer data while they are being processed. Customers can also apply confidential computing to federated learning for privacy-sensitive industries like healthcare and financial services, as well as on shared cloud infrastructures.

4th-Generation NVIDIA NVLink — To accelerate the largest AI models, NVLink combines with a new external NVLink Switch to extend NVLink as a scale-up network beyond the server, connecting up to 256 H100 GPUs at 9x higher bandwidth versus the previous generation using NVIDIA HDR Quantum InfiniBand.

DPX Instructions — New DPX instructions accelerate dynamic programming — used in a broad range of algorithms, including route optimization and genomics — by up to 40x compared with CPUs and up to 7x compared with previous-generation GPUs. This includes the Floyd-Warshall algorithm to find optimal routes for autonomous robot fleets in dynamic warehouse environments, and the Smith-Waterman algorithm used in sequence alignment for DNA and protein classification and folding.
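Floyd-Warshall, one of the dynamic-programming algorithms NVIDIA cites, is short enough to sketch in plain Python. This is an illustrative CPU version of the algorithm, not the DPX-accelerated kernel:

```python
import math

def floyd_warshall(dist):
    """All-pairs shortest paths via dynamic programming.

    dist is an n x n matrix where dist[i][j] is the edge weight
    from i to j (math.inf if no edge). Updates dist in place.
    """
    n = len(dist)
    for k in range(n):               # allow paths through intermediate node k
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

INF = math.inf
graph = [
    [0,   5,   INF, 10],
    [INF, 0,   3,   INF],
    [INF, INF, 0,   1],
    [INF, INF, INF, 0],
]
result = floyd_warshall(graph)
print(result[0][3])  # 9: the route 0 -> 1 -> 2 -> 3 beats the direct edge of 10
```

The inner relaxation step (`min` of a sum against a running best) is exactly the kind of recurrence the DPX instructions are designed to fuse and accelerate.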



8 comments

  1. Yeah this is not meant for home use. And $12k is generous. lol
  2. 12 K is for the cheap version. I looked at building out some vmware hosts with these and they were recommending 4 cards per host for like 60k+ per host. It was bonkers. Makes the laptops we would have seem cheap... if you figure 9 hosts... at 60k per.. is that cheaper than 2000 laptops? hummm... (not entire company just a subset.) 560k just to give accelerated VDIs. Or.. 1200 per for 2000 laptops. Yea cheaper to do the hosts... lol. But it FEELS more expensive.

    Unless I had a team of engineers that needed that kind of processing power I can't see using these for VDI. We have a stack of Tesla cards in a few hosts designated for processing super high resolution medical imaging systems. I want to say they were something like $25k each.
