The NVIDIA Ada Lovelace architecture powers the GeForce RTX 40 Series graphics cards. It all starts with the manufacturing process: GeForce RTX 40 Series GPUs are built on TSMC’s 4N process, a node customized for NVIDIA. In AD102, the GPU that powers the GeForce RTX 4090, this allows 76.3 billion transistors on a die that is smaller than the previous generation’s. The full AD102 includes 12 Graphics Processing Clusters (GPCs), 72 Texture Processing Clusters (TPCs), 144 Streaming Multiprocessors (SMs), 18,432 CUDA Cores, 144 RT Cores, 576 Tensor Cores, and 576 Texture Units. Note that these are not the RTX 4090’s specs: the RTX 4090 uses a cut-down AD102 with one GPC disabled. We’ll go over the RTX 4090 specifically on the next page.
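The full-AD102 totals quoted above are internally consistent, and dividing each total by the level above it recovers the per-unit ratios. The following is a minimal sketch of that arithmetic; the variable names are illustrative only and do not come from any NVIDIA tool or API.

```python
# Full AD102 totals as quoted in the text above.
gpcs = 12
tpcs = 72
sms = 144
cuda_cores = 18_432
rt_cores = 144
tensor_cores = 576
texture_units = 576

# Dividing each total by the number of parent units gives the per-unit ratio.
per_unit = {
    "TPCs per GPC": tpcs // gpcs,
    "SMs per TPC": sms // tpcs,
    "CUDA Cores per SM": cuda_cores // sms,
    "RT Cores per SM": rt_cores // sms,
    "Tensor Cores per SM": tensor_cores // sms,
    "Texture Units per SM": texture_units // sms,
}

for label, value in per_unit.items():
    print(f"{label}: {value}")
```

Every division comes out to a whole number, which is what you would expect if the chip is built from identical GPCs, TPCs, and SMs.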
Each GPC includes a dedicated raster engine, two raster operations (ROP) partitions, each containing eight individual ROP units, and six TPCs, each of which includes one PolyMorph engine and two SMs. Each SM contains 128 CUDA Cores, one third-generation RT Core, four fourth-generation Tensor Cores, four Texture Units, a 256KB register file, and 128KB of L1 cache. Total L1 cache, L2 cache, and register file capacity have all increased in the Ada architecture. With 128KB of L1 cache in each of its 144 SMs, the full AD102 has a total of 18,432KB of L1 cache, and the L2 cache has grown to 98,304KB, which is 16x the L2 capacity of GA102.
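The cache totals follow from the per-SM figures and the 144-SM count of the full AD102. The sketch below reproduces that arithmetic. Two of its values are derived rather than quoted in the text: the register file total, and the GA102 L2 size, which is only what the 16x claim implies. The variable names are illustrative.

```python
KB_PER_MB = 1024

# Figures quoted in the text for the full AD102.
sms = 144
l1_per_sm_kb = 128
regfile_per_sm_kb = 256
l2_total_kb = 98_304
l2_gain_over_ga102 = 16  # the "16x" claim from the text

# Derived totals.
l1_total_kb = sms * l1_per_sm_kb            # matches the quoted L1 total
regfile_total_kb = sms * regfile_per_sm_kb  # derived, not quoted in the text
ga102_l2_kb = l2_total_kb // l2_gain_over_ga102  # implied by the 16x claim

print(f"Total L1 cache:      {l1_total_kb:,} KB ({l1_total_kb / KB_PER_MB:g} MB)")
print(f"Total register file: {regfile_total_kb:,} KB ({regfile_total_kb / KB_PER_MB:g} MB)")
print(f"AD102 L2 cache:      {l2_total_kb:,} KB ({l2_total_kb / KB_PER_MB:g} MB)")
print(f"Implied GA102 L2:    {ga102_l2_kb:,} KB ({ga102_l2_kb / KB_PER_MB:g} MB)")
```

Expressed in megabytes, the full AD102 has 18 MB of L1 cache and 96 MB of L2 cache, and the 16x figure implies a 6 MB L2 cache on GA102.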