Image: AMD

3DCenter.org has shared a rumor suggesting that AMD’s Radeon RX 6000 Series successors could feature a multi-chip-module (MCM) design, similar to what NVIDIA is reportedly planning for its “Hopper” family of next-generation graphics cards. The speculation stems from leaker Kepler_L2, who claims that the red team has had working Navi 31 silicon since early 2020. He also claims that the top SKU, which may well turn out to be the Radeon RX 7900 XT, pairs two 80 CU chiplets for a potential total of 10,240 Stream Processors, 5,120 more than the current Radeon RX 6900 XT flagship.

“Navi31 working silicon exists since early 2020,” Kepler_L2 tweeted. “Nothing I can confirm 100% now, but from what I know Navi 31 is a 80 CU chiplet and top SKU has 2 of them.”

Another enthusiast has shared patents that provide additional insight into what AMD might be planning for its next-generation Radeon GPUs. One patent that relates to synchronizing workloads seems to confirm that red team is making headway on developing MCM-based GPUs, while a second suggests that upcoming Radeon cards could feature considerable performance improvements in the ray-tracing department.

“Described herein are techniques for performing ray tracing operations,” an abstract reads, hinting at new ray tracing techniques that may debut as part of AMD’s upcoming RDNA 3 graphics architecture. “A command processor executes custom instructions for orchestrating a ray tracing pipeline. The custom instructions cause the command processor to perform a series of loop iterations, each at a particular recursion depth.”
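The patent’s description of a command processor running “a series of loop iterations, each at a particular recursion depth” can be sketched roughly as follows. This is purely an illustrative reading of the abstract, not AMD’s actual design; all names and the bounce-counting model are made up for the example.

```python
# Hypothetical sketch of the patent's described control flow: a command
# processor driving a ray tracing pipeline as a series of loop iterations,
# one per recursion depth. All names here are illustrative, not AMD's.

MAX_RECURSION_DEPTH = 4

def trace_generation(depth, rays):
    """Stand-in for one pipeline pass: intersect rays, shade hits, and
    return any secondary rays (reflections, refractions) that were spawned."""
    secondary = []
    for ray in rays:
        # A real pipeline would traverse a BVH and shade here; we just
        # model each ray spawning one bounce until its budget runs out.
        if ray["bounces_left"] > 0:
            secondary.append({"bounces_left": ray["bounces_left"] - 1})
    return secondary

def orchestrate(primary_rays):
    """Command-processor loop: one iteration per recursion depth,
    stopping early once no rays remain in flight."""
    rays = primary_rays
    for depth in range(MAX_RECURSION_DEPTH):
        if not rays:
            break
        rays = trace_generation(depth, rays)
    return depth  # depth reached before the pipeline drained

orchestrate([{"bounces_left": 2}])
```

The interesting part, if the patent pans out, is that this orchestration would run on the GPU’s command processor rather than requiring round-trips to the host CPU between bounces.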

17 Comments

  1. Yea they need better market penetration of the current cards before releasing these new chiplet cards. Though really if they can produce those faster… as long as the drivers are not too divergent it shouldn’t be an issue.
  2. I could be interested in this, but only if they are doing it chiplet style and have found a way to make all of those CUs appear to the operating system as if they are on the same die.

    I am done with Crossfire and SLI implementations, even if they are on one board.

  3. I could be interested in this, but only if they are doing it chiplet style and have found a way to make all of those CUs appear to the operating system as if they are on the same die.

    I am done with Crossfire and SLI implementations, even if they are on one board.

    I don’t see why they couldn’t have the OS see it as one die. Their CPUs are seen as one die. The controller is what the OS sees.

  4. I don’t see why they couldn’t have the OS see it as one die. Their CPUs are seen as one die. The controller is what the OS sees.

    The OS sees chiplets and so on; this was one of the issues that Microsoft and the Linux kernel developers had to solve before Zen could really stretch its legs.

    Beyond that, exposing the configuration through the driver can easily have benefits for software tuning. Perhaps it runs great untuned, and can be made to sing if the application is aware of the layout?

  5. The OS sees chiplets and so on; this was one of the issues that Microsoft and the Linux kernel developers had to solve before Zen could really stretch its legs.

    Beyond that, exposing the configuration through the driver can easily have benefits for software tuning. Perhaps it runs great untuned, and can be made to sing if the application is aware of the layout?

    The OS sees sockets and cores. That information is being provided by the BIOS and CPU. What the CPU presents as far as cores has nothing to do with how many chiplets are on a substrate. The controller presents it as a single CPU with X number of cores.

  6. The controller presents it as a single CPU with X number of cores.

    The OS definitely sees separate CCXs. Again, this was a big problem with Zen, particularly with the latency difference between the caches. Zen 3 addresses that a bit by simply upping the size of the outermost cache.

  7. I am done with Crossfire and SLI implementations, even if they are on one board.

    The last one I ever f*cked with was the GTX 690, with two 680 GPUs on one board. That thing was… temperamental. Multi-monitor output was kind of a pain in the @ss to deal with too.

  8. The I/O die will be the Infinity Fabric itself; it will basically be a two-story chip. That would be my guess, anyway.
  9. If they’re smaller and easier to fab, increasing yields?

    That’s the hope!

    Smaller chiplets result in increased yields. At least that’s what I’ve been told.

    While in theory I agree, they talk about two times the number of stream processors, so that would be two of the current high-end chips. Even if fabbing on a smaller node improves yields, that’s no guarantee they can pump those out. Maybe the lesser models could, but why make the effort there if you have to make the more complicated ones anyway?

  10. While in theory I agree, they talk about two times the number of stream processors, so that would be two of the current high-end chips. Even if fabbing on a smaller node improves yields, that’s no guarantee they can pump those out. Maybe the lesser models could, but why make the effort there if you have to make the more complicated ones anyway?

    My guess would be multiple chiplets with smaller numbers of stream processors each. That way the product stack can be scaled accordingly, which would mean higher yields.
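The yield intuition in the comments above can be illustrated with a simple Poisson defect model, in which the probability that a die is defect-free falls off exponentially with its area. The defect density and die areas below are made-up illustrative numbers, not actual TSMC or AMD figures.

```python
import math

# Rough sketch of why smaller chiplets help yields, using a simple Poisson
# defect model: P(die is good) = exp(-defect_density * die_area).
# The defect density and areas below are made-up illustrative numbers.

DEFECT_DENSITY = 0.1  # defects per cm^2 (assumed)

def good_die_fraction(area_cm2):
    """Probability a die of the given area has zero defects."""
    return math.exp(-DEFECT_DENSITY * area_cm2)

big = good_die_fraction(5.0)    # one monolithic 5 cm^2 GPU die
small = good_die_fraction(2.5)  # one 2.5 cm^2 chiplet

# A defect on the big die scraps all 5 cm^2; a defect on a chiplet scraps
# only 2.5 cm^2, so a larger fraction of wafer area ends up usable.
print(f"good monolithic dies: {big:.1%}")   # ~60.7%
print(f"good chiplets:        {small:.1%}") # ~77.9%
```

The catch, of course, is that an MCM package still needs every chiplet on it to be good, plus working interconnect and packaging, which eats back some of that advantage.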

  11. Looking at 6800/6900 power levels, AMD may need a good jump in efficiency before they can get to 10K cores. The 6900 is already at 300 W with just half that number.

    I mean, sure, you could throw in that many cores and kneecap the TDP so it fits in a PCI slot form factor, but the 6900 is already doing that. The 6800 has a higher power budget per core than the 6900 already, and that kinda plays out in the performance delta.

  12. Looking at 6800/6900 power levels, AMD may need a good jump in efficiency before they can get to 10K cores. The 6900 is already at 300 W with just half that number.

    I mean, sure, you could throw in that many cores and kneecap the TDP so it fits in a PCI slot form factor, but the 6900 is already doing that. The 6800 has a higher power budget per core than the 6900 already, and that kinda plays out in the performance delta.

    That is certainly an interesting point. I haven’t really looked at AMD’s power draw this gen, since they’ve also kneecapped RT and haven’t shown much initiative in terms of software support yet, either.

    But even if they keep it from overheating, which is certainly possible in a PCIe form-factor even if drawing beyond say 500W, they run into so very many problems in terms of actually supporting the product. Chief among them being such a product would have a market limited by the extremes they’d need to go to in order to keep it cool!

    My guess would be multiple chiplets with smaller numbers of stream processors each. That way the product stack can be scaled accordingly, which would mean higher yields.

    Even as everyone is more or less expecting this to be the route they take, one important thing to keep in mind is that they’re likely going to need an interposer in there. Possibly more than one, if they decide to do some local HBM per compute chiplet. And interposers are something that AMD has failed at pretty excruciatingly in the past, see Vega and Radeon VII. The potential saving grace is that if they keep the chiplets small and the interconnects well-designed, perhaps they can keep the interposers small too, which would keep their yields up.
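The power-per-core point raised in the thread above checks out with quick arithmetic against AMD’s published specs: the RX 6800 lists 250 W total board power for 3,840 stream processors, and the RX 6900 XT lists 300 W for 5,120.

```python
# Back-of-the-envelope check of the per-core power point in the comments,
# using AMD's published total board power (TBP) and stream processor counts.

cards = {
    "RX 6800":    {"tbp_w": 250, "stream_processors": 3840},
    "RX 6900 XT": {"tbp_w": 300, "stream_processors": 5120},
}

for name, spec in cards.items():
    per_sp = spec["tbp_w"] / spec["stream_processors"]
    print(f"{name}: {per_sp * 1000:.1f} mW per stream processor")

# The 6800 indeed budgets more power per stream processor (~65 mW vs ~59 mW).
# Scaling to 10,240 SPs at even the 6900 XT's tighter per-core budget
# would already imply roughly 600 W without an efficiency jump.
```

Which is exactly why the comments expect RDNA 3 to need a sizable perf-per-watt improvement, not just more silicon.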
