Will the next RTX 40 be twice as fast as the RTX 30?
Today's rumor concerns Nvidia's upcoming cards. The new leaks come from kopite7kimi and describe the structure of the green team's next generation. A block diagram of the AD102 “Ada Lovelace” GPU lets us speculate about the performance of the upcoming RTX 40.
RTX 40: A great spec sheet (if true)
For starters, the Ada Lovelace AD102 GPU would house up to 12 GPCs (graphics processing clusters). This is an increase of about 70% over the GA102 (the largest chip in the current range), which contains only 7 GPCs. Each GPC would consist of 6 TPCs with 2 SMs each, which is the same configuration as the current chip. Each SM (streaming multiprocessor) would have four sub-cores, which is also the same as the GA102 GPU. The real change is the FP32 and INT32 core configuration. Each SM would carry 128 FP32 units, but the combined FP32 + INT32 count would rise to 192 units. This is because the FP32 units do not share the same sub-core as the INT32 units: the 128 FP32 cores are separate from the 64 INT32 cores.
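To make the leaked layout easier to follow, here is a minimal sketch that multiplies the figures out. It assumes the 128 FP32 and 64 INT32 units are per-SM counts, and every number is the rumored one from the leak, not a confirmed specification. The variable names are purely illustrative.

```python
# Rumored AD102 layout as described in the leak (not confirmed by NVIDIA).
AD102_GPCS = 12        # graphics processing clusters
GA102_GPCS = 7         # current chip, as quoted in the article
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
FP32_PER_SM = 128      # assumption: per-SM figure
INT32_PER_SM = 64      # assumption: per-SM figure


def sm_count(gpcs, tpcs_per_gpc, sms_per_tpc):
    """Total number of SMs on a fully enabled chip."""
    return gpcs * tpcs_per_gpc * sms_per_tpc


ad102_sms = sm_count(AD102_GPCS, TPCS_PER_GPC, SMS_PER_TPC)
ad102_fp32 = ad102_sms * FP32_PER_SM
ad102_int32 = ad102_sms * INT32_PER_SM
gpc_increase_pct = (AD102_GPCS / GA102_GPCS - 1) * 100

print("SMs:", ad102_sms)                               # 144
print("FP32 cores:", ad102_fp32)                       # 18432
print("INT32 cores:", ad102_int32)                     # 9216
print("GPC increase: %.0f%%" % gpc_increase_pct)       # 71%
```

Under these assumptions a full AD102 would come to 144 SMs and 18,432 FP32 cores, which is the basis for any raw throughput estimate.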
Cache is another area where NVIDIA should move well beyond existing Ampere GPUs. Ada Lovelace GPUs would have 192KB of L1 cache per SM, a 50% increase over Ampere. This amounts to a total of 4.5MB of L1 cache on the top-of-the-line AD102 GPU. The L2 cache would grow to 96MB, a figure mentioned regularly in several leaks. That is 16 times more than the Ampere GPU, which has only 6MB of L2 cache. The L2 cache would be shared across the whole GPU.
If the leaks are correct, the L2 cache grows enormously, to a total of 96 MB for the AD102. As for ROPs, this architecture could have twice as many units per GPC, 32 to be exact, giving a total of 384 ROPs for a potential RTX 4090 versus 112 for an RTX 3090… on paper it’s brutal.
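The cache and ROP ratios quoted above can be checked with a few lines of arithmetic. This is only a sketch built on the rumored figures. The per-GPC ROP count for the GA102 is not stated in the leak and is derived here by dividing the article's 112 ROPs by its 7 GPCs.

```python
# Rumored AD102 figures versus the GA102 figures quoted in the article.
L2_AD102_MB = 96
L2_GA102_MB = 6
L1_PER_SM_AD102_KB = 192
L1_INCREASE = 0.50             # "50% more L1 cache" per SM
ROPS_PER_GPC_AD102 = 32
AD102_GPCS = 12
ROPS_GA102 = 112               # RTX 3090, per the article
GA102_GPCS = 7

l2_ratio = L2_AD102_MB / L2_GA102_MB
# Implied Ampere L1 per SM, worked backwards from the 50% claim.
l1_per_sm_ampere_kb = L1_PER_SM_AD102_KB / (1 + L1_INCREASE)
rops_ad102 = ROPS_PER_GPC_AD102 * AD102_GPCS
# Derived, not stated in the leak: GA102 ROPs per GPC.
rops_per_gpc_ga102 = ROPS_GA102 / GA102_GPCS
rop_total_ratio = rops_ad102 / ROPS_GA102

print("L2 ratio:", l2_ratio)                               # 16.0
print("Implied Ampere L1 per SM (KB):", l1_per_sm_ampere_kb)  # 128.0
print("AD102 ROPs:", rops_ad102)                           # 384
print("GA102 ROPs per GPC:", rops_per_gpc_ga102)           # 16.0
print("Total ROP ratio: %.2f" % rop_total_ratio)           # 3.43
```

Note that the "twice as many" claim applies per GPC (32 versus 16). Because the GPC count also goes up, the total ROP count would rise by roughly 3.4 times.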
But after this orgy of technical data, what gains can we really expect?
Obviously it’s still too early to get an exact idea, but if these figures are confirmed, the spec sheet shows a huge difference compared to Ampere. To summarize:
- Almost 2x the GPCs, 12 versus 7 (compared to Ampere)
- 50% more cores (compared to Ampere)
- 50% more L1 cache (compared to Ampere)
- 16 times the L2 cache (compared to Ampere)
- 2x the ROPs per GPC (compared to Ampere)
- 4th-generation Tensor cores and 3rd-generation RT cores
But what can we expect in terms of actual performance?
It’s very difficult to say, because we’re missing a key piece of data: the clock frequency.
If we speculate a bit, we can imagine FP32 throughput of around 90 TFLOPS, more than twice that of the current GA102. But TFLOPS can also hold surprises. They give an idea of raw performance, but they never let us prejudge results in ‘everyday’ use. The leaks point to gains of 2x to 2.2x over the RTX 30… There will obviously be gains, and they look big. But to draw a firm conclusion, we will have to wait a little longer.
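To show why the clock frequency is the missing piece, here is a rough sketch of the usual peak FP32 estimate: 2 operations (one fused multiply-add) per core per clock. The AD102 core count is the rumored 12 x 6 x 2 x 128 layout. The RTX 3090 figures (10,496 FP32 cores, 1.695 GHz boost clock) are not from the leak and are brought in here only as a point of comparison. The function names are illustrative.

```python
def fp32_tflops(fp32_cores, clock_ghz):
    """Peak FP32 throughput, counting one fused multiply-add (2 ops) per core per clock."""
    return 2 * fp32_cores * clock_ghz / 1000.0


def clock_for_tflops(target_tflops, fp32_cores):
    """Clock in GHz needed to reach a target peak FP32 throughput."""
    return target_tflops * 1000.0 / (2 * fp32_cores)


# Rumored full AD102: 12 GPCs x 6 TPCs x 2 SMs x 128 FP32 units (assumption).
AD102_FP32_CORES = 12 * 6 * 2 * 128

# Comparison point from outside the leak: RTX 3090 at its rated boost clock.
RTX_3090_FP32_CORES = 10496
RTX_3090_BOOST_GHZ = 1.695

required_clock = clock_for_tflops(90, AD102_FP32_CORES)
rtx_3090_tflops = fp32_tflops(RTX_3090_FP32_CORES, RTX_3090_BOOST_GHZ)
paper_ratio = 90 / rtx_3090_tflops

print("Clock needed for 90 TFLOPS: %.2f GHz" % required_clock)   # 2.44 GHz
print("RTX 3090 peak: %.1f TFLOPS" % rtx_3090_tflops)             # 35.6 TFLOPS
print("Paper ratio: %.2fx" % paper_ratio)                         # 2.53x
```

In other words, 90 TFLOPS on a full chip would need a clock of roughly 2.4 GHz, and a lower clock or a cut-down chip would pull the figure down. It is only a peak number on paper, which is why it says little about results in everyday use.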