Nvidia Ada Lovelace architectural overview

Nvidia Ada Lovelace and GeForce RTX 40-Series: Everything We Know

Nvidia’s Ada architecture and GeForce RTX 40-series graphics cards are slated to begin arriving on October 12, starting with the GeForce RTX 4090 and RTX 4080. That’s two years after the Nvidia Ampere architecture and basically right on schedule given the slowing down (or if you prefer, death) of Moore’s ‘Law,’ and it’s good news as the best graphics cards are in need of some new competition.

With the Nvidia hack earlier this year, we had a good amount of information on what to expect, and Nvidia has now confirmed most of the details on the first RTX 40-series cards. We’ve collected everything into this central hub detailing everything we know and expect from Nvidia’s Ada architecture and the RTX 40-series family.

There are still plenty of rumors swirling around, but we now have a much better idea of what to expect from the Ada Lovelace architecture. Nvidia detailed its data center Hopper H100 GPU, and much like with the Volta V100 and Ampere A100, the consumer products will have rather different configurations.

We know when the RTX 4090 will launch. If Nvidia follows a similar release schedule as in the past, we can expect the rest of the RTX 40-series to trickle out over the next year. RTX 4080 16GB and 12GB models will probably arrive in November, or perhaps late October, RTX 4070 will arrive in early 2023, and RTX 4060 and 4050 will come later next year. Let’s start with the high level overview of the specs and rumored specs for the Ada series of GPUs.

GeForce RTX 40-Series Specs and Speculation
| Graphics Card | RTX 4090 | RTX 4080 16GB | RTX 4080 12GB | RTX 4070 | RTX 4060 | RTX 4050 |
|---|---|---|---|---|---|---|
| GPU | AD102? | AD103? | AD104? | AD104? | AD106? | AD107? |
| Process Technology | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N |
| Transistors (Billion) | 76 | 40? | 32? | 32? | 20? | 15? |
| Die size (mm^2) | 629? | 380? | 300? | 300? | 225? | 175? |
| SMs | 128 | 76 | 60 | 48? | 32? | 24? |
| GPU Cores (Shaders) | 16384 | 9728 | 7680 | 6144? | 4096? | 3072? |
| Tensor Cores | 512 | 304 | 240 | 192? | 128? | 96? |
| Ray Tracing “Cores” | 128 | 76 | 60 | 48? | 32? | 24? |
| Boost Clock (MHz) | 2520 | 2510 | 2610 | 2600? | 2600? | 2600? |
| VRAM Speed (Gbps) | 21 | 23 | 21 | 18? | 18? | 18? |
| VRAM (GB) | 24 | 16 | 12 | 10? | 8? | 8? |
| VRAM Bus Width (bits) | 384 | 256 | 192 | 160? | 128? | 64? |
| L2 Cache (MB) | 96? | 64? | 48? | 40? | 32? | 16? |
| ROPs | 192? | 112? | 80? | 64? | 48? | 32? |
| TMUs | 512? | 304? | 240? | 192? | 128? | 96? |
| TFLOPS FP32 (Boost) | 82.6 | 48.8 | 40.1 | 31.9? | 21.3? | 16.0? |
| TFLOPS FP16 (FP8) | 661 (1321) | 391 (781) | 321 (641) | 256 (511)? | 170 (341)? | 128 (256)? |
| Bandwidth (GBps) | 1008 | 736? | 504? | 360? | 288? | 144? |
| TDP (watts) | 450 | 320 | 285 | 200? | 160? | 125? |
| Launch Date | Oct 2022 | Nov 2022? | Nov 2022? | Jan 2023? | Apr 2023? | Aug 2023? |
| Launch Price | $1,599 | $1,199 | $899 | $599? | $449? | $349? |

The first three cards are now official, and their specs are reasonably accurate. A few question marks remain, like the exact ROP counts and VRAM clocks, but they shouldn’t be too far off. The last three cards require some generous helpings of salt, as they’re more speculation than anything concrete.
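For readers who want to sanity check the table, the FP32 and bandwidth rows can be reproduced from the core counts, clocks, and memory specs. The sketch below is a rough illustration rather than Nvidia's own method: it assumes each shader core retires one fused multiply-add (two floating-point operations) per clock, and the function and variable names are made up for this example.

```python
# Official specs taken from the table: shader cores, boost clock (MHz),
# VRAM speed (Gbps per pin), and memory bus width (bits).
CARDS = {
    "RTX 4090": (16384, 2520, 21, 384),
    "RTX 4080 16GB": (9728, 2510, 23, 256),
    "RTX 4080 12GB": (7680, 2610, 21, 192),
}


def fp32_tflops(shaders: int, boost_mhz: float) -> float:
    """Peak FP32 throughput, assuming one FMA (2 FLOPs) per core per clock."""
    return shaders * 2 * boost_mhz * 1e6 / 1e12


def bandwidth_gbytes(vram_gbps: float, bus_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate times bus width, bits to bytes."""
    return vram_gbps * bus_bits / 8


for name, (shaders, mhz, speed, bus) in CARDS.items():
    print(f"{name}: {fp32_tflops(shaders, mhz):.1f} TFLOPS FP32, "
          f"{bandwidth_gbytes(speed, bus):.0f} GB/s")
```

The same two formulas produce the speculative figures for the lower-tier cards, which is why those numbers are only as good as the guessed core counts and clocks behind them.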

We do know that Nvidia is hitting clock speeds of 2.5–2.6 GHz on the 4090 and 4080, and we expect similar clocks on the other GPUs in the RTX 40-series. We’ve put in tentative clock speed estimates of 2.6 GHz for now. Nvidia hasn’t specified precisely which GPUs are used on the various cards, or exact die sizes or transistor counts (except for “76 billion” on the RTX 4090).

Nvidia’s AD102 chip in all its glory (Image credit: Nvidia)

Nvidia will most likely use TSMC’s 4N process (“4nm Nvidia”) on all of the Ada GPUs, and definitely on the RTX 4090 and 4080 cards. Hopper H100 also uses TSMC’s 4N node, which mostly appears to be a tweaked variation on TSMC’s N5 node that’s been widely used in other chips and which will also be used in AMD’s Zen 4 and RDNA 3. We don’t think Samsung will have a compelling alternative that wouldn’t require a serious redesign of the core architecture, so the whole family will likely be on the same node.

Nvidia will be “going big” with the AD102 GPU, and it’s closer in size and transistor counts to the H100 than GA102 was to GA100. Based on available information and a few remaining rumors, Ada Lovelace looks to be a monster. It will pack in far more SMs and the associated cores than the current Ampere GPUs, it will have much higher GPU clocks, and it will also contain a number of architectural enhancements to further boost performance. Nvidia claims that the RTX 4090 is 2x–4x faster than the outgoing RTX 3090 Ti, though caveats apply to those benchmarks.

The preview performance from Nvidia is primarily at 4K ultra, which is something to keep in mind. If you’re currently running a more modest processor rather than one of the absolute best CPUs for gaming, meaning the Core i9-12900K or Ryzen 7 5800X3D, you could very well end up CPU limited even at 1440p ultra. A larger system upgrade will likely be necessary to get the most out of the fastest Ada GPUs. 

Ada Will Massively Boost Compute Performance

With the high-level overview out of the way, let’s get into the specifics. The most noticeable change with Ada GPUs will be the number of SMs compared to the current Ampere generation. At the top, AD102 potentially packs 71% more SMs than the GA102. Even if nothing else were to significantly change in the architecture, we would expect that to deliver a huge increase in performance.
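The 71% figure is easy to check. The snippet below is a quick sketch that assumes full-die SM counts of 144 for AD102 and 84 for GA102; those two numbers are not stated in this article, so treat them as assumptions. The RTX 4090's 128 SMs come from the spec table.

```python
# Assumed full-die SM counts (not stated in the article).
AD102_FULL_SMS = 144
GA102_FULL_SMS = 84
# Enabled SMs on the RTX 4090, from the spec table.
RTX_4090_SMS = 128


def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1) * 100


print(f"Full AD102 vs full GA102: +{percent_increase(AD102_FULL_SMS, GA102_FULL_SMS):.0f}% SMs")
print(f"RTX 4090 vs full GA102:   +{percent_increase(RTX_4090_SMS, GA102_FULL_SMS):.0f}% SMs")
```

Under those assumptions, the RTX 4090 as shipped has a smaller SM advantage than the full chip, since not every SM on the die is enabled.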

That will apply not just to graphics but to other elements as well. It doesn’t seem like most of the calculations have changed from Ampere, though the Tensor cores now support FP8 (still with sparsity), which can potentially double throughput relative to FP16. The RTX 4090 has deep learning/AI compute of up to 661 teraflops in FP16 and 1,321 teraflops in FP8, and a fully enabled AD102 chip could hit 1.4 petaflops at similar clocks.
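The Tensor figures in the spec table appear to track the FP32 rate by fixed multipliers: 8x for FP16 with sparsity and 16x for FP8 with sparsity. Those multipliers are inferred from the table's numbers rather than quoted from Nvidia, so the sketch below is only a consistency check, with hypothetical names throughout.

```python
# Shader cores and boost clock (MHz) for the three official cards.
CARDS = {
    "RTX 4090": (16384, 2520),
    "RTX 4080 16GB": (9728, 2510),
    "RTX 4080 12GB": (7680, 2610),
}

# Multipliers inferred from the table (assumption, not an Nvidia statement).
FP16_SPARSE_PER_FP32 = 8
FP8_SPARSE_PER_FP32 = 16


def fp32_tflops(shaders: int, boost_mhz: float) -> float:
    """Peak FP32 throughput, assuming 2 FLOPs per core per clock."""
    return shaders * 2 * boost_mhz * 1e6 / 1e12


def tensor_tflops(shaders: int, boost_mhz: float) -> tuple[int, int]:
    """Return (FP16, FP8) sparse Tensor TFLOPS, rounded to whole numbers."""
    base = fp32_tflops(shaders, boost_mhz)
    return (round(base * FP16_SPARSE_PER_FP32), round(base * FP8_SPARSE_PER_FP32))


for name, (shaders, mhz) in CARDS.items():
    fp16, fp8 = tensor_tflops(shaders, mhz)
    print(f"{name}: {fp16} TFLOPS FP16, {fp8} TFLOPS FP8")
```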

The full GA102 in the RTX 3090 Ti by comparison tops out at around 321 TFLOPS FP16 (again, using Nvidia’s sparsity feature). That means RTX 4090 delivers a theoretical 107% increase, based on core counts and clock speeds. The same theoretical boost in performance should apply to shader and ray tracing hardware as well, except those are also changing.

The GPU shader cores will have a new Shader Execution Reordering (SER) feature that Nvidia claims will improve general performance by 25%, and can improve ray tracing operations by up to 200%.

The RT cores meanwhile have doubled down on ray/triangle intersection hardware, plus they have a couple more new tricks available. The Opacity Micromap (OMM) Engine enables significantly faster ray tracing for transparent surfaces like foliage, particles, and fences. The Displaced Micro-Mesh (DMM) Engine on the other hand optimizes the generation of the Bounding Volume Hierarchy (BVH) structure, and Nvidia claims it can create the BVH up to 10x faster while using one-twentieth (5%) of the memory for BVH storage.

Together, these architectural enhancements should enable Ada Lovelace GPUs to offer a massive generational leap in performance.

Ada Lovelace ROPs


