NVIDIA GB200 NVL72
The GB200 NVL72 is a liquid-cooled, rack-scale solution that links 36 Grace CPUs with 72 Blackwell GPUs. Its 72-GPU NVLink domain functions as a single huge GPU and provides 30X faster real-time inference for trillion-parameter LLMs.
A crucial part of the NVIDIA GB200 NVL72 is the GB200 Grace Blackwell Superchip, which uses the NVIDIA NVLink-C2C interconnect to link two NVIDIA Blackwell Tensor Core GPUs to an NVIDIA Grace CPU.
Real-Time LLM Inference
The GB200 NVL72 introduces state-of-the-art capabilities, including a second-generation Transformer Engine that enables FP4 AI and, together with fifth-generation NVIDIA NVLink, delivers 30X faster real-time inference for trillion-parameter language models. This advance is made possible by a new generation of Tensor Cores that introduce new microscaling formats, offering high accuracy and higher throughput. Furthermore, the GB200 NVL72 overcomes communication bottlenecks by combining liquid cooling and NVLink to build a single, enormous 72-GPU rack.
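To make "microscaling formats" concrete, here is a minimal NumPy sketch of block-wise 4-bit quantisation: values are grouped into small blocks, each block stores one shared scale, and its elements snap to a coarse 4-bit grid. The block size and value grid here are illustrative assumptions, not Blackwell's exact hardware format.

```python
import numpy as np

# Illustrative 4-bit (E2M1-style) magnitude grid; the real hardware format
# is defined by the OCP microscaling (MX) specification.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mx4(x, block_size=32):
    """Block-wise microscaling: one shared scale per block of elements.
    Assumes len(x) is divisible by block_size."""
    x = x.reshape(-1, block_size)
    # Per-block scale chosen so the largest magnitude maps onto the grid max.
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0
    scaled = x / scale
    # Snap each element to the nearest representable 4-bit magnitude.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(scaled) * FP4_GRID[idx], scale

def dequantize_mx4(q, scale):
    return (q * scale).reshape(-1)

x = np.random.randn(128).astype(np.float32)
q, s = quantize_mx4(x)
print(f"mean abs quantisation error: {np.abs(x - dequantize_mx4(q, s)).mean():.4f}")
```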
Large-Scale Training
The GB200 NVL72 includes a faster second-generation Transformer Engine with FP8 precision, enabling an impressive 4X faster training for large language models at scale. This innovation is complemented by fifth-generation NVLink, which delivers 1.8 terabytes per second (TB/s) of GPU-to-GPU interconnect, alongside InfiniBand networking and NVIDIA Magnum IO software.
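As a rough illustration of how FP8 training keeps tensors inside a narrow dynamic range, the sketch below scales each operand by its absolute maximum before casting, multiplies, then undoes the scales in higher precision. The E4M3 maximum of 448 is a real property of the format; the simple per-tensor scaling recipe is a simplification of what the Transformer Engine actually does.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3

def to_e4m3(x):
    """Round to an E4M3-style grid: 3 mantissa bits, clamped to +/-448.
    (Subnormal handling omitted for brevity.)"""
    x = np.clip(x, -E4M3_MAX, E4M3_MAX)
    mag = np.abs(x)
    exp = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
    step = 2.0 ** (exp - 3)              # spacing of the 3-mantissa-bit grid
    return np.sign(x) * np.round(mag / step) * step

def fp8_matmul(a, b):
    """Per-tensor scaling: fit each operand into FP8 range, multiply,
    then undo the scales in higher precision."""
    sa = np.abs(a).max() / E4M3_MAX
    sb = np.abs(b).max() / E4M3_MAX
    qa, qb = to_e4m3(a / sa), to_e4m3(b / sb)
    return (qa @ qb) * (sa * sb)         # accumulate/rescale in FP32

a = np.random.randn(64, 64).astype(np.float32)
b = np.random.randn(64, 64).astype(np.float32)
rel_err = np.linalg.norm(a @ b - fp8_matmul(a, b)) / np.linalg.norm(a @ b)
print(f"relative error vs FP32 matmul: {rel_err:.3%}")
```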
Energy-Efficient Infrastructure
Liquid-cooled GB200 NVL72 racks reduce a data centre's energy consumption and carbon footprint. Liquid cooling boosts compute density, minimises floor space consumption, and enables the high-bandwidth, low-latency GPU connectivity that large NVLink domain architectures depend on. Compared with NVIDIA H100 air-cooled infrastructure, the GB200 delivers 25X more performance at the same power while consuming less water.
Data Processing
Databases are essential for businesses to manage, process, and analyse massive amounts of data. The GB200 leverages the NVIDIA Blackwell architecture's high-bandwidth memory performance, NVLink-C2C, and dedicated decompression engines to speed up key database queries by 18X compared with CPU, delivering a 5X better total cost of ownership.
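For a sense of what GPU-accelerated query processing looks like from the software side, here is a hypothetical sketch using the RAPIDS cuDF library (our assumption; the figures above do not name a specific software stack). The file scan, decompression, and aggregation all run on the GPU:

```python
import cudf  # RAPIDS GPU DataFrame library (an assumed dependency)

# Read a compressed columnar file; parsing and decompression run on the GPU.
orders = cudf.read_parquet("orders.parquet")  # hypothetical dataset

# A typical analytical query: filter, group, and aggregate entirely on-GPU.
result = (
    orders[orders["quantity"] > 10]
    .groupby("region")
    .agg({"revenue": "sum", "order_id": "count"})
    .sort_values("revenue", ascending=False)
)
print(result.head())
```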
For all these advantages, training and deploying large models can carry significant computational and resource costs. Widespread adoption will depend on systems that are computationally, financially, and energy efficient, and that are designed to deliver real-time inference. The new NVIDIA GB200 NVL72 is one such system capable of the job.
Consider Mixture of Experts (MoE) models as an example. These models distribute their computational load across many specialised experts and, by utilising pipeline and model parallelism, can be trained across thousands of GPUs, increasing the efficiency of the whole system.
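To ground that description, below is a minimal top-2-gated MoE layer in PyTorch: a router scores each token, the two best-scoring experts process it, and their outputs are combined by the gate weights. This is a generic textbook sketch, not the architecture of any particular trillion-parameter model; in real deployments the experts would be sharded across GPUs with expert and pipeline parallelism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-2 Mixture of Experts layer (single-device sketch)."""
    def __init__(self, d_model=256, d_hidden=512, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # token -> expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        weights, chosen = gates.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalise
        out = torch.zeros_like(x)
        # Dispatch each token only to its top-k experts.
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 256)
print(MoELayer()(tokens).shape)  # torch.Size([16, 256])
```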
GPU clusters can make this technical challenge manageable, thanks to a new level of parallel processing, high-speed memory, and high-performance interconnects. This is exactly what the NVIDIA GB200 NVL72 rack-scale architecture accomplishes, as the rest of this post describes.
NVIDIA GB200 NVL36 and NVL72
The GB200 supports NVLink domains of 36 and 72 GPUs. Each rack houses 18 compute nodes based on the MGX reference design and the NVLink Switch System. In the GB200 NVL36 configuration, a single rack holds 18 single GB200 compute nodes and 36 GPUs. The GB200 NVL72 is arranged either as 72 GPUs in one rack with 18 dual GB200 compute nodes, or as 72 GPUs across two racks, each with 18 single GB200 compute nodes; the arithmetic sketch below spells out these counts.
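A "single" GB200 compute node carries one GB200 Superchip (1 Grace CPU + 2 Blackwell GPUs) and a "dual" node carries two, so the configurations work out as follows (a back-of-the-envelope check using only the ratios quoted above):

```python
# Each GB200 Superchip = 1 Grace CPU + 2 Blackwell GPUs.
CPUS_PER_SUPERCHIP, GPUS_PER_SUPERCHIP = 1, 2

def rack(nodes=18, superchips_per_node=1, racks=1):
    chips = nodes * superchips_per_node * racks
    return chips * CPUS_PER_SUPERCHIP, chips * GPUS_PER_SUPERCHIP

print("NVL36 (18 single nodes, 1 rack):   %d CPUs, %d GPUs" % rack())
print("NVL72 (18 dual nodes, 1 rack):     %d CPUs, %d GPUs" % rack(superchips_per_node=2))
print("NVL72 (18 single nodes x 2 racks): %d CPUs, %d GPUs" % rack(racks=2))
# -> 18/36, 36/72, and 36/72 respectively
```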
For ease of deployment, the GB200 NVL72 tightly packs and links the GPUs using a copper cable cartridge. It also features a liquid-cooled system design, reducing energy consumption and cost by up to 25X.
NVIDIA GB200 NVL72 Features
NVIDIA Blackwell Architecture
The NVIDIA Blackwell architecture ushers in a new era of accelerated computing, with revolutionary advances delivering unmatched speed, efficiency, and scalability.
NVIDIA Grace CPU
The NVIDIA Grace CPU is an innovative processor designed for AI, cloud, and HPC applications in contemporary data centres. It offers exceptional performance and memory bandwidth with 2X the energy efficiency of today's leading server processors.
Fifth-Generation NVIDIA NVLink
Unlocking the full potential of exascale computing and trillion-parameter AI models requires fast, seamless communication between every GPU in a server cluster. The fifth generation of NVLink is a scale-up interconnect that unlocks accelerated performance for trillion- and multi-trillion-parameter AI models.
NVIDIA Networking
The data centre network is essential to AI advancement and performance, serving as the foundation for distributed AI model training and generative AI. NVIDIA Quantum-X800 InfiniBand, NVIDIA Spectrum-X800 Ethernet, and NVIDIA BlueField-3 DPUs deliver efficient scalability across hundreds or thousands of Blackwell GPUs for optimal application performance.
NVIDIA GB200 NVL72 Price
As a high-end system aimed at major enterprises and academic institutions, the NVIDIA DGX GB200 NVL72 is priced accordingly. One estimate places a fully loaded system, with 72 Blackwell GPUs across 36 GB200 Superchips, at approximately $3 million USD.
Here's why it costs so much:
Abundant processing power: with 72 GPUs, it delivers roughly 1.44 exaFLOPS (1,440 petaFLOPS) of FP4 compute.
Advanced hardware: it includes a liquid cooling system, a dedicated NVLink Switch System for high-speed GPU-to-GPU networking, and up to 13.5 TB of HBM3e memory.
A publicly published price for the DGX GB200 NVL72 is hard to find given its specialised market, but the $3 million estimate is a reasonable approximation.
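Taking that estimate at face value, a quick back-of-the-envelope calculation puts the cost in context. The $3 million figure is the estimate quoted above, not an official list price:

```python
price_usd = 3_000_000   # estimated, not an official list price
fp4_pflops = 1_440      # 72 GPUs x 20 PFLOPS FP4 each (see spec table below)
gpus = 72

print(f"FP4 compute: {fp4_pflops / 1000:.2f} exaFLOPS")
print(f"~${price_usd / fp4_pflops:,.0f} per FP4 PFLOPS")
print(f"~${price_usd / gpus:,.0f} per GPU (all-in, incl. CPUs/cooling/network)")
```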
NVIDIA GB200 NVL72 Specs
| Specification | GB200 NVL72 | GB200 Grace Blackwell Superchip |
| --- | --- | --- |
| Configuration | 36 Grace CPUs : 72 Blackwell GPUs | 1 Grace CPU : 2 Blackwell GPUs |
| FP4 Tensor Core | 1,440 PFLOPS | 40 PFLOPS |
| FP8/FP6 Tensor Core | 720 PFLOPS | 20 PFLOPS |
| INT8 Tensor Core | 720 POPS | 20 POPS |
| FP16/BF16 Tensor Core | 360 PFLOPS | 10 PFLOPS |
| TF32 Tensor Core | 180 PFLOPS | 5 PFLOPS |
| FP32 | 6,480 TFLOPS | 180 TFLOPS |
| FP64 | 3,240 TFLOPS | 90 TFLOPS |
| FP64 Tensor Core | 3,240 TFLOPS | 90 TFLOPS |
| GPU Memory / Bandwidth | Up to 13.5 TB HBM3e / 576 TB/s | Up to 384 GB HBM3e / 16 TB/s |
| NVLink Bandwidth | 130 TB/s | 3.6 TB/s |
| CPU Core Count | 2,592 Arm Neoverse V2 cores | 72 Arm Neoverse V2 cores |
| CPU Memory / Bandwidth | Up to 17 TB LPDDR5X / Up to 18.4 TB/s | Up to 480 GB LPDDR5X / Up to 512 GB/s |
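As a sanity check, most of the NVL72 column is simply the Superchip column multiplied by the 36 superchips in a rack (memory totals are quoted with some rounding):

```python
SUPERCHIPS = 36  # per GB200 NVL72 rack (36 Grace CPUs : 72 GPUs)

per_superchip = {          # from the right-hand column above
    "FP4 Tensor Core (PFLOPS)": 40,
    "FP8/FP6 Tensor Core (PFLOPS)": 20,
    "FP64 (TFLOPS)": 90,
    "NVLink bandwidth (TB/s)": 3.6,
    "CPU cores": 72,
}

for name, value in per_superchip.items():
    print(f"{name}: {value} x {SUPERCHIPS} = {value * SUPERCHIPS:g}")
# FP4: 1,440 PFLOPS; FP8/FP6: 720 PFLOPS; FP64: 3,240 TFLOPS;
# NVLink: 129.6 ~ 130 TB/s; CPU cores: 2,592 -- matching the NVL72 column.
```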