The NVIDIA GeForce RTX 40 Series is here, and today we have a full review of the GeForce RTX 4090 Founders Edition video card. The GeForce RTX 40 Series ushers in NVIDIA’s new Ada Lovelace architecture, and a new suite of high-end GPUs in the GeForce RTX 4090 and RTX 4080-class of video cards. Today we will focus on one, the GeForce RTX 4090, specifically NVIDIA’s Founders Edition of this GPU.
On September 20th, 2022 NVIDIA announced the GeForce RTX 40 Series of graphics cards to the world in a live stream. These new GPUs will be powered by NVIDIA’s 3rd Gen RTX architecture and introduce some new features and technology. We will give you a high-level brief of the technology, and a full performance review of the GeForce RTX 4090 Founders Edition video card.
Three new video cards were announced: the GeForce RTX 4090 with an MSRP of $1,599, the GeForce RTX 4080 16GB with an MSRP of $1,199, and the GeForce RTX 4080 12GB with an MSRP of $899. The GeForce RTX 4090 will be available on October 12th and the two RTX 4080s in November. Some buzzwords you will hear in relation to the new NVIDIA Ada architecture are 3rd Generation RT Cores, 4th Generation Tensor Cores, DLSS 3, NVIDIA RTX Remix, NVIDIA Omniverse, Shader Execution Reordering, AV1 dual encoders, NVIDIA Reflex, NVIDIA Broadcast, NVIDIA Studio, Optical Flow Accelerators, and Frame Generation.
NVIDIA Ada Lovelace Architecture
The NVIDIA Ada Lovelace architecture is what powers the GeForce RTX 40 Series graphics cards. It all starts with the new manufacturing process: GeForce RTX 40 Series GPUs are manufactured on 4N, a custom process developed by TSMC and NVIDIA. This allows AD102, the GPU powering the GeForce RTX 4090, to pack 76.3 billion transistors into a die smaller than the previous generation’s. The full AD102 includes 12 Graphics Processing Clusters (GPCs), 72 Texture Processing Clusters (TPCs), 144 Streaming Multiprocessors (SMs), 18,432 CUDA Cores, 144 RT Cores, 576 Tensor Cores, and 576 Texture Units. Note that these are not the RTX 4090’s specs; the RTX 4090 uses a cut-down AD102 with part of the chip disabled, and we’ll go over the RTX 4090 specifically on the next page.
Each GPC includes a dedicated raster engine, two raster operations (ROP) partitions with eight individual ROP units each, and six TPCs, each of which includes one PolyMorph engine and two SMs. Each SM contains 128 CUDA Cores, one third-generation RT Core, four fourth-generation Tensor Cores, four Texture Units, a 256KB register file, and 128KB of L1 cache. L1 cache, L2 cache, and register file sizes have all been increased in the Ada architecture. With 128KB of L1 cache per SM, the full AD102 has a total of 18,432KB of L1 cache, and the L2 cache has been increased to 98,304KB, which is 16x the L2 cache of GA102.
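The full-die totals quoted above follow directly from the per-unit counts. As a quick sanity check, here is a minimal Python sketch that multiplies them out. The constant and function names are our own, and the GA102 L2 figure is only what the 16x claim implies, not a number measured here.

```python
# Per-unit counts quoted in the text for the full AD102 die.
GPCS = 12
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
CUDA_CORES_PER_SM = 128
RT_CORES_PER_SM = 1
TENSOR_CORES_PER_SM = 4
TEXTURE_UNITS_PER_SM = 4
L1_KB_PER_SM = 128
L2_KB_AD102 = 98_304
L2_GAIN_OVER_GA102 = 16  # the "16x" claim from the text


def full_ad102_totals():
    """Multiply the per-unit counts out to whole-die totals."""
    tpcs = GPCS * TPCS_PER_GPC
    sms = tpcs * SMS_PER_TPC
    return {
        "tpcs": tpcs,
        "sms": sms,
        "cuda_cores": sms * CUDA_CORES_PER_SM,
        "rt_cores": sms * RT_CORES_PER_SM,
        "tensor_cores": sms * TENSOR_CORES_PER_SM,
        "texture_units": sms * TEXTURE_UNITS_PER_SM,
        "l1_kb": sms * L1_KB_PER_SM,
        # GA102 L2 size implied by the 16x claim (derived, not measured).
        "implied_ga102_l2_kb": L2_KB_AD102 // L2_GAIN_OVER_GA102,
    }


if __name__ == "__main__":
    for name, value in full_ad102_totals().items():
        print(f"{name}: {value:,}")
```

Running it reproduces the 72 TPCs, 144 SMs, 18,432 CUDA Cores, 576 Tensor Cores, and 18,432KB of L1 cache listed for the full AD102.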
RT Cores (3rd Generation)
The new 3rd Generation RT Core in the Ada architecture adds two new units, the Opacity Micromap Engine and the Displaced Micro-Mesh Engine, on top of the features of the last two generations. The Opacity Micromap Engine evaluates Opacity Micromaps, which are used to accelerate alpha traversal. The Displaced Micro-Mesh Engine generates meshes of micro-triangles to ray trace geometrically complex objects and environments with less BVH build time and lower storage costs. Also, ray-triangle intersection testing is two times faster this generation.
Shader Execution Reordering (SER)
Another new technology in the NVIDIA Ada architecture is Shader Execution Reordering (SER), which is akin to out-of-order execution in CPUs. SER improves the efficiency of RT shader execution by reordering shader work on the fly for better execution and data locality. Optimizations in the SM and memory system help make this thread reordering efficient. SER can provide up to a 2x performance improvement for RT shaders with high levels of divergence.
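To get a feel for why reordering helps divergent work, here is a toy Python model. It is not NVIDIA’s algorithm or API: the 32-thread group size, the interleaved workload, and the use of a plain sort as the "reordering" step are all assumptions made for illustration. It counts how many different shaders each group of threads has to run, before and after grouping like with like.

```python
WARP_SIZE = 32  # assumption: threads execute together in groups of 32


def shader_passes(shader_ids, warp_size=WARP_SIZE):
    """Sum, over all thread groups, the number of distinct shaders in each group.

    A group whose threads need several different shaders has to run them
    one after another, so fewer distinct shaders per group is better.
    """
    total = 0
    for start in range(0, len(shader_ids), warp_size):
        group = shader_ids[start:start + warp_size]
        total += len(set(group))
    return total


def reorder_by_shader(shader_ids):
    """Toy stand-in for reordering: put threads with the same shader together."""
    return sorted(shader_ids)


# Hypothetical workload: 128 rays that hit 4 materials in an interleaved pattern.
hits = [i % 4 for i in range(128)]

before = shader_passes(hits)
after = shader_passes(reorder_by_shader(hits))
print(f"shader passes before reordering: {before}")
print(f"shader passes after reordering:  {after}")
```

In this made-up workload, every group starts out needing all four shaders, and after sorting each group needs only one. Real workloads will not divide this cleanly, and the reordering itself has a cost that this model ignores.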
SER is controlled by the application through a small API that developers can apply based on the workload. Game developers have to program for the feature in their ray tracing code for it to improve performance; it is not automatic in hardware. Right now, no games support it; the first will be a new Cyberpunk 2077 patch that introduces a more intense Ray Tracing mode. NVIDIA is working with Microsoft to extend the standard graphics APIs with SER. SER could be a big improvement for Ray Tracing, but we’ll have to see games actually use it.
Tensor Cores (4th Generation) and DLSS 3
The new 4th Generation Tensor Cores in the Ada architecture add new functionality, primarily the Hopper FP8 Transformer Engine, and they help power DLSS 3. The simplest way to describe DLSS 3 is that it adds Frame Generation, which creates synthesized frames between existing frames to improve frame rate and provide a smoother experience. Instead of just upscaling, which is what DLSS has done until now, DLSS 3 creates entirely new frames that didn’t exist before, bypassing the rendering and CPU pipeline. It does this with a method known as Optical Flow Acceleration, working together with the Tensor Cores.
This is very important to know: the Optical Flow Accelerator (OFA) was already present in the last generation, Ampere, so it is not what is new here. Last-generation hardware could already do Optical Flow Estimation. What’s new is that Ada Lovelace is much faster at it, and more accurate. It is this boost in Optical Flow Acceleration performance that allows Ada Lovelace to do Frame Generation. Technically, it could be done on the previous generation, but it might not improve the gameplay experience and could produce poor image quality.
NVIDIA claims that DLSS 3 Frame Generation can improve performance by an additional 2x over DLSS 2. DLSS 3 Frame Generation can also improve performance in cases where the GPU is bottlenecked by the CPU, because it bypasses the rendering engine and CPU and instead generates and inserts new frames. This improves framerate, but at the cost of increased latency. To counteract the added latency, NVIDIA uses its Reflex technology. NVIDIA Reflex basically removes the render queue from the pipeline, helping to reduce latency. NVIDIA Reflex is therefore a necessity for DLSS 3 Frame Generation; otherwise the latency would be noticeable.
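The frame-rate arithmetic behind Frame Generation can be sketched in a few lines of Python. This is only a model of the frame ordering, assuming exactly one generated frame between each pair of rendered frames; the function names are ours, and the placeholder strings stand in for frames that the real technique would synthesize.

```python
def interleave_generated(rendered):
    """Insert one placeholder generated frame between each pair of rendered frames."""
    out = []
    for prev, nxt in zip(rendered, rendered[1:]):
        out.append(prev)
        # The in-between frame depends on BOTH neighbours, so in this model
        # it cannot be shown until the later rendered frame exists.
        out.append(f"G({prev},{nxt})")
    if rendered:
        out.append(rendered[-1])
    return out


def displayed_fps(rendered_fps, generated_per_rendered=1):
    """Displayed frame rate when extra frames are added per rendered frame."""
    return rendered_fps * (1 + generated_per_rendered)


if __name__ == "__main__":
    print(interleave_generated(["R0", "R1", "R2"]))
    print(displayed_fps(60))  # a CPU-bound 60 FPS game shown at 120 FPS
```

Note that the game engine and CPU still only produce the `R` frames, which is why this can help in CPU-bound cases. The comment inside the loop also hints at where extra latency can come from in this model: the newest rendered frame has to wait for the generated frame to be shown first.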
“If a game already has DLSS 2 Super Resolution, upgrading to DLSS 3 is a very simple process and will make both Super Resolution and Frame Generation available. DLSS 3 leverages the same integration points as DLSS 2 (color buffer, depth buffer, engine motion vectors, and output buffers) and NVIDIA Reflex, making upgrades from these existing SDKs easy via our DLSS 3 Streamline plugin. DLSS 3 is also coming to the world’s most popular game engines, including Unity, Unreal Engine, and Frostbite Engine, making it simple for games based on these engines to flip DLSS 3 on.” - NVIDIA
DLSS 3 only works with DX12 and GeForce RTX 40 Series GPUs. You also need to have the Hardware Accelerated GPU Scheduling feature enabled in Windows 10 and 11, which it should be by default. Here are the dates for some games that will be publicly available with DLSS 3 soon.
- SUPER PEOPLE: Early Access Available at 10 PM PT tonight with DLSS 3
- Loopmancer: Updates with DLSS 3 on October 12
- Justice ‘Fuyun Court’: New Graphics Showcase Available On October 12 with DLSS 3
- Microsoft Flight Simulator: Launches in beta for Xbox Insider program members on October 17 with DLSS 3
- A Plague Tale: Requiem: Launches October 18 with DLSS 3
DLSS 3 has support from many game developers with more than 35 games and applications announcing support including:
A Plague Tale: Requiem, Atomic Heart, Black Myth: Wukong, Bright Memory: Infinite, Chernobylite, Conqueror’s Blade, Cyberpunk 2077, Dakar Desert Rally, Deliver Us Mars, Destroy All Humans! 2 – Reprobed, Dying Light 2 Stay Human, F1 22, F.I.S.T.: Forged In Shadow Torch, Frostbite Engine, HITMAN 3, Hogwarts Legacy, ICARUS, Jurassic World Evolution 2, Justice, Loopmancer, Marauders, Marvel’s Spider-Man Remastered, Microsoft Flight Simulator, Midnight Ghost Hunt, Mount & Blade II: Bannerlord, Naraka: Bladepoint, NVIDIA Omniverse, NVIDIA Racer RTX, PERISH, Portal With RTX, Ripout, S.T.A.L.K.E.R 2: Heart of Chornobyl, Scathe, SUPER PEOPLE, Sword and Fairy 7, SYNCED, The Lord of the Rings: Gollum, The Witcher 3: Wild Hunt, THRONE AND LIBERTY, Tower of Fantasy, Unity, Unreal Engine 4 & 5, Warhammer 40,000: Darktide
Dual AV1 Encoders
An exciting new feature of the Ada Lovelace architecture is the inclusion of dual 8th-generation dedicated hardware video encoders. The new 8th-generation NVENC video encoders now support AV1 encoding. NVIDIA didn’t settle for a single encoder either; it put two of them on this architecture. This will take streaming and video content to a new level. RTX 40 Series GPUs with 12GB of memory or more will have these dual NVENC encoders. They will enable video encoding at 8K/60 for professional video editing, or four 4K/60 streams. Ada GPUs also have the 5th-generation hardware decoder known as NVDEC, which supports hardware-accelerated video decoding of MPEG-2, VC-1, H.264, H.265, VP8, VP9, and AV1.
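The "8K/60 or four 4K/60" figures line up if you look at raw pixel throughput. Here is a short Python check, assuming 8K means 7680x4320 and 4K means 3840x2160 (the article does not spell out the resolutions). It only compares pixel rates and says nothing about how the encoders actually split the work.

```python
def pixels_per_second(width, height, fps):
    """Raw pixel throughput of a video stream."""
    return width * height * fps


# Assumed resolutions: 8K UHD and 4K UHD, both at 60 frames per second.
eight_k_60 = pixels_per_second(7680, 4320, 60)
four_k_60 = pixels_per_second(3840, 2160, 60)

print(f"8K/60: {eight_k_60:,} pixels/s")
print(f"4K/60: {four_k_60:,} pixels/s")
print(f"4K/60 streams per 8K/60 stream: {eight_k_60 // four_k_60}")
```

One 8K/60 stream carries exactly the same number of pixels per second as four 4K/60 streams, so the two figures describe the same throughput.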