Nvidia’s RTX 40-Series Laptops Don’t Bode Well for RTX 4060, 4050 Desktop GPUs

Nvidia’s Ada Lovelace architecture ushers in a new level of performance at the top of the stack, with the RTX 4090 besting the previous generation RTX 3090 Ti by 52% on average in our rasterization benchmarks, and 70% in ray tracing benchmarks — both at 4K, naturally. The 4090 now sits comfortably atop our GPU benchmarks hierarchy and ranks as one of the best graphics cards around, at least if you have deep pockets.

Unfortunately, the step down from the 4090 to the RTX 4080 is rather precipitous, dropping performance by 23% for rasterization and 30% for ray tracing. Dropping down another level to the new RTX 4070 Ti knocks an additional 22% off the performance relative to the 4080. If you’re keeping track (and we definitely like to keep score), that means the third-string Ada card sporting the AD104 GPU is slower than the previous generation 3090 Ti, never mind Nvidia’s claims to the contrary, which rely on benchmarks using DLSS 3’s Frame Generation.
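To make that scorekeeping concrete, here is a rough sketch that chains the quoted percentages together. It assumes the deltas compound multiplicatively and that the 22% drop from the 4080 to the 4070 Ti applies to the same rasterization suite as the other two figures; the variable names are purely illustrative.

```python
# Relative rasterization performance, normalized to the RTX 3090 Ti.
# Assumption: the percentages quoted in the article compound multiplicatively.
rtx_3090_ti = 1.00
rtx_4090 = rtx_3090_ti * 1.52        # 52% faster than the 3090 Ti
rtx_4080 = rtx_4090 * (1 - 0.23)     # 23% slower than the 4090
rtx_4070_ti = rtx_4080 * (1 - 0.22)  # a further 22% slower than the 4080

for name, score in [("RTX 4090", rtx_4090),
                    ("RTX 4080", rtx_4080),
                    ("RTX 4070 Ti", rtx_4070_ti)]:
    print(f"{name}: {score:.2f}x the RTX 3090 Ti")
```

Under those assumptions the 4070 Ti lands at roughly 0.91x the 3090 Ti, which is where the "slower than the previous generation" conclusion comes from.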

Perhaps more alarming with the RTX 4070 Ti is that it only has a 192-bit memory interface. It still has 12GB of GDDR6X memory, and the large L2 cache in general means that the narrower bus isn’t a deal killer, but things don’t look so good as we eye future lower-tier RTX 40-series parts like the 4060 and 4050.

Nvidia recently announced the full line of RTX 40-series laptop GPUs, ranging from the RTX 4090 mobile that uses the AD103 GPU (basically a mobile 4080) down to the anemic-sounding RTX 4050. Here’s the full list of specs for the mobile parts.

Nvidia Ada Laptop GPU Specifications

| Graphics Card | RTX 4090 for Laptops | RTX 4080 for Laptops | RTX 4070 for Laptops | RTX 4060 for Laptops | RTX 4050 for Laptops |
| --- | --- | --- | --- | --- | --- |
| Architecture | AD103 | AD104 | AD106? | AD106? | AD107? |
| Process Technology | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N |
| Transistors (Billion) | 45.9 | 35.8 | ? | ? | ? |
| Die size (mm^2) | 378.6 | 294.5 | ? | ? | ? |
| SMs | 76 | 58 | 36 | 24 | 20 |
| GPU Shaders | 9728 | 7424 | 4608 | 3072 | 2560 |
| Tensor Cores | 304 | 232 | 144 | 96 | 80 |
| Ray Tracing “Cores” | 76 | 58 | 36 | 24 | 20 |
| Boost Clock (MHz) | 1455-2040 | 1350-2280 | 1230-2175 | 1470-2370 | 1605-2370 |
| VRAM Speed (Gbps) | 18? | 18? | 18? | 18? | 18? |
| VRAM (GB) | 16 | 12 | 8 | 8 | 6 |
| VRAM Bus Width (bits) | 256 | 192 | 128 | 128 | 96 |
| L2 Cache (MB) | 64 | 48 | 32 | 32 | 24 |
| ROPs | 112 | 80 | 48 | 32 | 32 |
| TMUs | 304 | 232 | 144 | 96 | 80 |
| TFLOPS FP32 (Boost) | 28.3-39.7 | 20.0-33.9 | 11.3-20.0 | 9.0-14.6 | 8.2-12.1 |
| TFLOPS FP16 (FP8) | 226-318 (453-635) | 160-271 (321-542) | 91-160 (181-321) | 72-116 (145-233) | 66-97 (131-194) |
| Bandwidth (GBps) | 576 | 432 | 288 | 288 | 216 |
| TDP (watts) | 80-150 | 60-150 | 35-115 | 35-115 | 35-115 |
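The TFLOPS and bandwidth rows are derived from the other specs, and a short sketch shows how. It assumes the usual convention that each shader retires one fused multiply-add (two floating-point operations) per clock, and it uses the table's tentative 18 Gbps memory speed. The function names and the `LAPTOP_GPUS` dictionary are illustrative, not anything from Nvidia.

```python
def fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 throughput in TFLOPS, assuming 2 FLOPs (one FMA) per shader per clock."""
    return shaders * 2 * clock_mhz * 1e6 / 1e12


def bandwidth_gb_per_s(vram_speed_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin speed times bus width, divided by 8 bits per byte."""
    return vram_speed_gbps * bus_width_bits / 8


# (shaders, min boost MHz, max boost MHz, bus width in bits), copied from the table.
LAPTOP_GPUS = {
    "RTX 4090 for Laptops": (9728, 1455, 2040, 256),
    "RTX 4080 for Laptops": (7424, 1350, 2280, 192),
    "RTX 4070 for Laptops": (4608, 1230, 2175, 128),
    "RTX 4060 for Laptops": (3072, 1470, 2370, 128),
    "RTX 4050 for Laptops": (2560, 1605, 2370, 96),
}

VRAM_SPEED_GBPS = 18  # marked "18?" in the table, so treat this as unconfirmed

for name, (shaders, lo_mhz, hi_mhz, bus) in LAPTOP_GPUS.items():
    print(f"{name}: "
          f"{fp32_tflops(shaders, lo_mhz):.1f}-{fp32_tflops(shaders, hi_mhz):.1f} TFLOPS FP32, "
          f"{bandwidth_gb_per_s(VRAM_SPEED_GBPS, bus):.0f} GB/s")
```

The wide TFLOPS ranges follow directly from the wide boost clock ranges, which in turn depend on the power limit a laptop maker chooses.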

It’s a reasonably safe bet that the desktop RTX 4070 will use the same AD104 as the RTX 4070 Ti, just with fewer SMs and shaders. Desktop RTX 4060 Ti, assuming we get that anytime soon, may or may not use AD104; the only other option would presumably be the AD106 GPU used in the mobile 4070/4060. And that’s a problem.

The previous generation RTX 3060 Ti came with 8GB of GDDR6 on a 256-bit interface. We weren’t particularly pleased with the lack of VRAM, especially when AMD started shipping RX 6700 XT (and later 6750 XT) with 12GB VRAM. Nvidia basically did a course correction with the RTX 3060 and gave it 12GB VRAM, making it a nice step up from the previous RTX 2060 — and even the 2060 eventually saw 12GB models, though prices made them mostly unattractive.

Now we’re talking about RTX 4060 most likely going back to 8GB, and that would suck. There are plenty of games now that can exceed 8GB of VRAM use, and that number will only grow in the next two years. But Nvidia doesn’t have many other options, since GDDR6 and GDDR6X memory capacities top out at 2GB per 32-bit channel.

There’s potential to do “clamshell” mode with two memory chips per channel, one on each side of the PCB, but that’s pretty messy and not something we’d expect to see in a mainstream GPU. That could get the 128-bit interface up to 16GB of VRAM, which again would be odd as the higher-tier parts like the 4070 Ti only have 12GB. Still, that sounds better than an RTX 4060 8GB model to me!

And what about the RTX 4050? Maybe Nvidia will stick with the 128-bit interface on the AD106 GPU and just skip using AD107 on a desktop part. That’s basically what happened with GA107, which was almost exclusively used for the laptop RTX 3050. But if it does try to use AD107 in a desktop, it would only have up to 6GB VRAM, again with clamshell VRAM being a potential out.
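The capacity limits discussed above fall straight out of the bus width. The sketch below assumes one memory chip per 32-bit channel at 2GB each, as stated earlier, with clamshell mode doubling the chip count; `max_vram_gb` is a hypothetical helper name.

```python
GB_PER_CHIP = 2          # largest GDDR6/GDDR6X chip capacity, per the article
CHANNEL_WIDTH_BITS = 32  # one chip per 32-bit channel in the normal layout


def max_vram_gb(bus_width_bits: int, clamshell: bool = False) -> int:
    """Maximum VRAM for a given bus width; clamshell puts two chips on each channel."""
    channels = bus_width_bits // CHANNEL_WIDTH_BITS
    chips_per_channel = 2 if clamshell else 1
    return channels * chips_per_channel * GB_PER_CHIP


for bus in (192, 128, 96):
    print(f"{bus}-bit: {max_vram_gb(bus)}GB standard, "
          f"{max_vram_gb(bus, clamshell=True)}GB clamshell")
```

That gives 12GB on a 192-bit bus, 8GB (or 16GB clamshell) on 128-bit, and 6GB (or 12GB clamshell) on 96-bit.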

The same AD104 GPU sits inside Nvidia’s RTX 4070 Ti desktop card as well as the upcoming RTX 4080 laptop solution. (Image credit: Tom’s Hardware)

It’s not just the memory capacities that raise some concern. We said in the RTX 4070 Ti review that performance wasn’t bad, but it also wasn’t amazing. It’s basically a cheaper take on an RTX 3090, with half the VRAM and lower power use. The 4070 Ti gets by with 60 Streaming Multiprocessors (SMs) and 7680 CUDA cores (GPU shaders), 25% more than the outgoing RTX 3070 Ti’s 48 SMs and 6144 cores. But AD106 could top out at just 40 SMs, maybe even 36 SMs, which would put it in similar territory to the RTX 3060 Ti on core counts, leaving only GPU clocks as a performance boost.
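As a quick check on that core-count comparison, the sketch below converts SM counts to shader counts. The 128 shaders per SM figure comes from the table (9728 shaders across 76 SMs); the RTX 3060 Ti's 38 SMs is an outside figure not stated in this article, so treat it as an assumption.

```python
SHADERS_PER_SM = 128   # from the spec table: 9728 shaders / 76 SMs
RTX_3060_TI_SMS = 38   # assumption: published desktop RTX 3060 Ti spec, not from this article


def shader_count(sm_count: int) -> int:
    """GPU shaders (CUDA cores) for a given number of SMs."""
    return sm_count * SHADERS_PER_SM


baseline = shader_count(RTX_3060_TI_SMS)
for sms in (40, 36):
    ratio = shader_count(sms) / baseline
    print(f"AD106 with {sms} SMs: {shader_count(sms)} shaders, "
          f"{ratio:.2f}x the RTX 3060 Ti's {baseline}")
```

Either configuration sits within about 5% of the 3060 Ti's shader count, so any real uplift would have to come from clock speeds and architectural changes.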

Put those two things together — insufficient VRAM and relatively minor increases in GPU shader counts — and we’re likely looking at modest performance improvements compared to the previous Ampere generation GPUs.

Nvidia will then trot out DLSS 3 performance improvements, which only apply to a subset of games and also don’t offer true performance increases, and things start to sound even worse. Part of the benefit of having a GPU that can run games at 120 fps today is that, as games get more demanding, it will still be able to do 60 fps in most games a few years down the road. But what happens when those aren’t real framerates?

Let’s assume a game running at 120 fps courtesy of DLSS 3’s Frame Generation technology, with a base performance of 70 fps. All is well and good for now, but down the road the base performance will drop below 40 fps as games become more demanding, and eventually it will fall below 30 fps. What we’ve experienced is that Frame Generation with a base framerate of less than 30 fps still feels like sub-30 fps, even if the monitor is getting twice as many frame updates per second.

That same logic applies to higher framerates as well, so DLSS 3 at 120 fps with a 70 fps base will still feel like 70 fps, even if it looks a bit smoother to the eye. Most people won’t be able to tell the difference between input rates at 70 samples per second and inputs at 120 samples per second. But when you start to fall below 40, even non-professional gamers will start to feel the difference.

Or to put it more bluntly: DLSS 3 and Frame Generation are no panacea. They can help smooth out the visuals and maybe improve the feel of games a bit, but the benefit isn’t going to be as noticeable as actual fully rendered frames with new user input factored in, particularly as performance drops below 60 fps.

That’s not to say it’s a bad technology — it’s quite clever actually — and we don’t mind that it exists. But Nvidia needs to stop comparing DLSS 3 scores against non-DLSS 3 results and acting like they’re the same thing. Take the base framerate before Frame Generation and add maybe 10–20 percent and that’s what a game feels like, not the 60–100 percent higher fps that benchmarks will show.
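That rule of thumb is easy to turn into numbers. The following is only a sketch of our subjective estimate, not a measurement: it applies the 10 to 20 percent "feel" uplift and the 60 to 100 percent displayed uplift to a base framerate. Both function names are made up for illustration.

```python
def felt_fps_range(base_fps: float) -> tuple[float, float]:
    """Rough 'feels like' framerate: base framerate plus 10-20 percent (a subjective estimate)."""
    return base_fps * 1.10, base_fps * 1.20


def displayed_fps_range(base_fps: float) -> tuple[float, float]:
    """Framerate a benchmark would report with Frame Generation: base plus 60-100 percent."""
    return base_fps * 1.60, base_fps * 2.00


for base in (70, 40, 30):
    felt_lo, felt_hi = felt_fps_range(base)
    shown_lo, shown_hi = displayed_fps_range(base)
    print(f"base {base} fps: feels like {felt_lo:.0f}-{felt_hi:.0f} fps, "
          f"benchmarks show {shown_lo:.0f}-{shown_hi:.0f} fps")
```

For the 70 fps example used earlier, that works out to a game that feels like 77 to 84 fps while the benchmark chart reports 112 to 140 fps.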

Back to the topic at hand, the future mainstream and budget RTX 40-series GPUs will no doubt beat the existing models in pure performance, and they’ll offer DLSS 3 support as well. Hopefully Nvidia will return to prices closer to the previous generation, though, because if the RTX 4060 costs $499 and the RTX 4050 costs $399, they’re going to end up being minor upgrades compared to the existing cards at those price points.
