
NVIDIA A100 4x GPU HGX Redstone Platform


While the 8x NVIDIA A100 GPU “Delta” platform with NVSwitch got a lot of airtime during the Ampere launch, it was not the only platform being launched today by NVIDIA. The 4x GPU “Redstone” platform is a smaller NVLink mesh platform that is designed to be a lower-cost option.

The NVIDIA A100 “Redstone” HGX platform is important since it is a smaller and less complex version of the HGX A100 platform. The Redstone platform incorporates 4x SXM NVIDIA A100 GPUs onto a PCB. As we saw with the Tesla A100 overview, the new GPUs have 12x NVLinks per GPU. Each NVLink provides 50GB/s of GPU-to-GPU bandwidth, for 600GB/s total per GPU.

Redstone takes those 12 NVLinks and splits them into three groups. Instead of the NVIDIA NVSwitch solution we see on the HGX A100 platform, we get a mesh topology without switching. NVIDIA has offered both switched and non-switched systems for some time.
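The link arithmetic can be sketched in a few lines of Python. The 12 links per GPU, 50GB/s per link, and four GPUs come from the text above. The even split of four links per peer is an assumption drawn from "three groups", and the function name `mesh_link_budget` is purely illustrative.

```python
from itertools import combinations

NVLINKS_PER_GPU = 12   # from the article
GB_PER_LINK = 50       # GB/s per NVLink, from the article
NUM_GPUS = 4           # Redstone board


def mesh_link_budget(num_gpus=NUM_GPUS,
                     links_per_gpu=NVLINKS_PER_GPU,
                     gb_per_link=GB_PER_LINK):
    """Split each GPU's NVLinks evenly across its peers in a full mesh.

    Assumption: links are divided equally between peers; the article
    only says the 12 links are split into three groups.
    """
    peers = num_gpus - 1
    if links_per_gpu % peers != 0:
        raise ValueError("links do not divide evenly across peers")
    links_per_peer = links_per_gpu // peers
    return {
        "links_per_peer": links_per_peer,
        "gb_per_peer": links_per_peer * gb_per_link,
        "gb_per_gpu": links_per_gpu * gb_per_link,
        # Every unordered GPU pair gets a direct connection in a full mesh.
        "gpu_pairs": list(combinations(range(num_gpus), 2)),
    }


budget = mesh_link_budget()
```

Under that assumption each GPU has four links (200GB/s) to each of its three peers, and the board has six directly connected GPU pairs with no switch in between.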



NVIDIA has been using this type of topology for years, and it is the basis for many important compute nodes. For example, Summit uses NVLink directly attached between Tesla V100 GPUs. With four GPUs on a Redstone board, each GPU can talk directly to every other GPU.

The importance of Redstone is that the smaller HGX A100 4 GPU board uses much less power due to having fewer GPUs and omitting NVSwitch. Leaving NVSwitch out also saves on per-node system costs. If you simply want to run anything from 7 MIG instances per Tesla A100 up to 4 Tesla A100’s per instance, then this topology can make a lot more sense. Supermicro, along with other vendors, is adding A100 4 GPU systems to its portfolio.
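To make the partitioning range concrete, here is a minimal sketch that counts how many independently schedulable units a four-GPU board exposes. It assumes each GPU is either left whole or split into the maximum of seven MIG instances. The helper `schedulable_units` is hypothetical and not part of any NVIDIA tool.

```python
MAX_MIG_PER_GPU = 7   # A100 supports up to seven MIG instances per GPU
GPUS_PER_BOARD = 4    # Redstone board


def schedulable_units(gpus_in_mig_mode,
                      gpus=GPUS_PER_BOARD,
                      mig_per_gpu=MAX_MIG_PER_GPU):
    """Count units when some GPUs are fully MIG-partitioned.

    Simplification: a GPU in MIG mode is split into the maximum
    number of instances; the remaining GPUs are used whole.
    """
    if not 0 <= gpus_in_mig_mode <= gpus:
        raise ValueError("gpus_in_mig_mode must be between 0 and gpus")
    whole_gpus = gpus - gpus_in_mig_mode
    return gpus_in_mig_mode * mig_per_gpu + whole_gpus
```

With all four GPUs in MIG mode the board presents 28 small instances. With none in MIG mode it presents four whole GPUs that a single job can also use together over the NVLink mesh.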

As with the larger HGX A100 option, these GPUs have PCIe Gen4 connectivity to their hosts. One can use the HGX A100 4 GPU Redstone board with Intel Xeon Scalable. However, to get full PCIe Gen4 performance, one needs to use the AMD EPYC 7002 family or potentially an emerging Arm or POWER option.
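A rough calculation shows why the host generation matters. The sketch below computes theoretical per-direction bandwidth for a x16 link from the published PCIe transfer rates (8 GT/s for Gen3, 16 GT/s for Gen4) and their 128b/130b encoding. It assumes the Xeon Scalable host is limited to Gen3, as the paragraph above implies, and it ignores protocol overhead, so real throughput will be lower.

```python
def pcie_gb_per_s(gt_per_s, lanes=16):
    """Theoretical one-direction bandwidth in GB/s for a PCIe link.

    Gen3 and Gen4 both use 128b/130b encoding, so 128 of every
    130 transferred bits carry payload. Packet and protocol
    overhead are not modeled here.
    """
    payload_fraction = 128 / 130
    return gt_per_s * lanes * payload_fraction / 8  # bits -> bytes


gen3_x16 = pcie_gb_per_s(8)    # roughly 15.75 GB/s
gen4_x16 = pcie_gb_per_s(16)   # roughly 31.5 GB/s
```

On these numbers a Gen3 host offers about half the host-to-GPU bandwidth per x16 slot that a Gen4 host does.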

You can see the Supermicro Redstone platform based on the AMD EPYC 7002 Rome series here:

This is a great example of how server OEMs can take the HGX A100 4 GPU platform and innovate to provide their own feature sets around the new Ampere generation.

Final Words
For many organizations, the 4x GPU mesh architecture has made a lot of sense. The new HGX A100 4 GPU Redstone platform makes integration of these solutions much easier but also moves some of the design differentiation away from NVIDIA’s partners. Still, this seems to make sense from an industry perspective. Other companies, such as Dell, have focused on these 4x Tesla GPU compute nodes for their customers instead of pushing larger solutions. For customers who want the smaller, less costly, and less complex form factor, Redstone makes a lot of sense.
