NVIDIA A100 4x GPU HGX Redstone Platform


While the 8x NVIDIA A100 GPU “Delta” platform with NVSwitch got a lot of airtime during the Ampere launch, it was not the only platform NVIDIA launched today. The 4x GPU “Redstone” platform is a smaller NVLink mesh platform designed to be a lower-cost option.

The NVIDIA A100 “Redstone” HGX platform is important since it is a smaller and less complex version of the HGX A100 platform. The Redstone platform incorporates 4x SXM NVIDIA A100 GPUs onto a PCB. As we saw with the Tesla A100 overview, the new GPUs have 12x NVLinks per GPU. Each NVLink provides 50GB/s of GPU-to-GPU bandwidth, for 600GB/s total per GPU.

Redstone takes those 12 NVLinks and splits them into three groups, one for each of the other GPUs on the board. Instead of the NVIDIA NVSwitch solution we see on the HGX A100 platform, we get a mesh topology without switching. NVIDIA has offered both switched and non-switched systems for some time.
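To make the link arithmetic concrete, here is a minimal Python sketch. It assumes the 12 links are divided evenly among the three peer GPUs, which the text implies but does not spell out. The function name `mesh_link_plan` and its return keys are illustrative only and do not correspond to any NVIDIA tool or API.

```python
from itertools import combinations

# Figures quoted in the article.
NUM_GPUS = 4          # GPUs on the Redstone board
LINKS_PER_GPU = 12    # NVLinks per A100
GB_PER_LINK = 50      # GB/s per NVLink


def mesh_link_plan(num_gpus=NUM_GPUS, links_per_gpu=LINKS_PER_GPU,
                   gb_per_link=GB_PER_LINK):
    """Split each GPU's links evenly across its peers in a full mesh.

    Assumption: an even split, i.e. every GPU pair gets the same
    number of links. This is a hypothetical helper for illustration.
    """
    peers = num_gpus - 1
    if links_per_gpu % peers != 0:
        raise ValueError("links cannot be split evenly across peers")
    links_per_peer = links_per_gpu // peers
    # Every unordered GPU pair is directly connected (no switch).
    pairs = {pair: links_per_peer for pair in combinations(range(num_gpus), 2)}
    return {
        "links_per_peer": links_per_peer,
        "gb_per_peer": links_per_peer * gb_per_link,
        "gb_per_gpu": links_per_gpu * gb_per_link,
        "pairs": pairs,
    }


if __name__ == "__main__":
    plan = mesh_link_plan()
    print(f"{plan['links_per_peer']} links per GPU pair, "
          f"{plan['gb_per_peer']} GB/s per pair, "
          f"{plan['gb_per_gpu']} GB/s per GPU")
    for (a, b), links in plan["pairs"].items():
        print(f"GPU{a} <-> GPU{b}: {links} links")
```

Under that even-split assumption, each pair of GPUs gets four links, and a four-GPU mesh has six direct GPU-to-GPU connections.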



NVIDIA has been using this type of topology for years, and it is the basis for many important compute nodes. For example, Summit uses NVLink directly attached between Tesla V100 GPUs. In a four-GPU node such as Redstone, each GPU can talk directly to every other GPU.

The importance of Redstone is that the smaller HGX A100 4 GPU board uses much less power due to having fewer GPUs and omitting NVSwitch. Leaving NVSwitch out also saves on per-node system costs. If your workloads range from seven MIG instances per Tesla A100 up to four Tesla A100s per instance, then this topology can make a lot more sense. Supermicro, along with other vendors, is adding A100 4 GPU systems to its portfolio.
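As a rough illustration of the MIG sizing mentioned above, the following Python sketch counts the maximum number of MIG instances on a node. It uses the seven-instances-per-A100 figure from the text; the helper name `max_mig_instances` is hypothetical, and real deployments may expose fewer instances depending on the MIG profiles chosen.

```python
MAX_MIG_PER_A100 = 7  # upper bound per GPU, as quoted in the article


def max_mig_instances(num_gpus, mig_per_gpu=MAX_MIG_PER_A100):
    """Upper bound on MIG instances for a node (illustrative helper)."""
    if num_gpus < 0 or mig_per_gpu < 0:
        raise ValueError("counts must be non-negative")
    return num_gpus * mig_per_gpu


if __name__ == "__main__":
    print("Redstone (4 GPUs):", max_mig_instances(4))
    print("Delta (8 GPUs):", max_mig_instances(8))
```

A four-GPU Redstone node therefore tops out at 28 MIG instances, versus 56 on an eight-GPU Delta node.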

As with the larger HGX A100 option, these GPUs have PCIe Gen4 connectivity to their hosts. One can use the HGX A100 4 GPU Redstone board with Intel Xeon Scalable; however, to get full PCIe Gen4 performance, one needs to use the AMD EPYC 7002 family or potentially an emerging Arm or POWER option.

Supermicro has shown a Redstone platform based on the AMD EPYC 7002 “Rome” series.

This is a great example of how server OEMs can take the HGX A100 4 GPU platform and innovate to provide their own feature sets around the new Ampere generation.

Final Words
For many organizations, the 4x GPU mesh architecture has made a lot of sense. The new HGX A100 4 GPU Redstone platform makes integration of these solutions much easier but also moves some of the design differentiation away from NVIDIA’s partners. Still, this seems to make sense from an industry perspective. Other companies, such as Dell, have focused on these 4x Tesla GPU compute nodes for their customers instead of pushing larger solutions. For customers who want the smaller, less costly, and less complex form factor, Redstone makes a lot of sense.
