New NVIDIA L40S GPU-accelerated OCI Instances

 

Expanding NVIDIA GPU-Accelerated Instances for  AI, Digital Twins, and Other Uses is Oracle  Cloud Infrastructure

In order to boost productivity, cut expenses, and spur creativity, businesses are quickly using generative AI, large language models (LLMs), sophisticated visuals, and digital twins.

But in order for businesses to use these technologies effectively, they must have access to cutting edge full-stack accelerated computing systems. Oracle Cloud Infrastructure (OCI) today announced the imminent release of a new virtual machine powered by a single NVIDIA H100 Tensor Core GPU and the availability of NVIDIA L40S GPU bare-metal instances that are available for order to match this demand. With the addition of this new virtual machine, OCI’s H100 offering now includes an NVIDIA HGX H100 8-GPU bare-metal instance.

These platforms offer strong performance and efficiency when combined with NVIDIA networking and the NVIDIA software stack, allowing businesses to enhance generative AI.

You can now order the NVIDIA L40S GPU on OCI

Designed to provide innovative multi-workload acceleration for generative AI, graphics, and video applications, the NVIDIA L40S GPU is universal data centre GPU. With its fourth-generation Tensor Cores and FP8 data format support, the L40S GPU is an excellent choice for inference in a variety of generative AI use cases, as well as for training and optimising small- to mid-size LLMs.

For Llama 3 8B with NVIDIA TensorRT-LLM at an input and output sequence length of 128, for instance, a single L40S GPU (FP8) may produce up to 1.4 times as many tokens per second as a single NVIDIA A100 Tensor Core GPU (FP16).

Additionally, the NVIDIA L40S GPU offers media acceleration and best-in-class graphics. It is perfect for digital twin and complex visualisation applications because of its numerous encode/decode engines and third-generation NVIDIA Ray Tracing Cores (RT Cores).

With support for NVIDIA DLSS 3, the L40S GPU offers up to 3.8 times the real-time ray-tracing capabilities of its predecessor, resulting in quicker rendering and smoother frame rates. Because of this, the GPU is perfect for creating apps on the NVIDIA Omniverse platform, which enables AI-enabled digital twins and real-time, lifelike 3D simulations. Businesses may create sophisticated 3D apps and workflows for industrial digitalization using Omnivores on the L40S GPU. These will enable them to design, simulate, and optimise facilities, processes, and products in real time before they go into production.

NVIDIA L40S 48gb

OCI’s BM.GPU.L40S will include the L40S GPU. Featuring four NVIDIA L40S GPUs, each with 48GB of GDDR6 memory, this computational form is bare metal. This form factor comprises 1TB of system memory, 7.38TB local NVMe SSDs, and 112-core 4th generation Intel Xeon CPUs.

With OCI’s bare-metal compute architecture, these forms do away with the overhead of any virtualisation for high-throughput and latency-sensitive  AI or machine learning workloads. By removing data centre responsibilities off CPUs, the NVIDIA BlueField-3 DPU in the accelerated compute form improves server efficiency and speeds up workloads related to networking, storage, and security. By utilising BlueField-3 DPUs, OCI is advancing its off-box virtualisation approach for its whole fleet.

OCI Supercluster with NVIDIA L40S allows for ultra-high performance for up to 3,840 GPUs with minimal latency and 800Gbps internode bandwidth. NVIDIA ConnectX-7 NICs over RoCE v2 are used by OCI’s cluster network to handle workloads that are latency-sensitive and high throughput, such as AI training.

“For 30% more efficient video encoding, we chose OCI AI infrastructure with bare-metal instances and NVIDIA L40S GPUs,” stated Beamr  Cloud CEO Sharon Carmel.50% or less on the network and storage traffic will be used for videos processed with Beamr Cloud on OCI, resulting in two times faster file transfers and higher end user productivity. Beamr will offer video AI workflows to OCI clients, getting them ready for the future of video.

OCI to Feature Single-GPU H100 VMs Soon

Soon to be available at OCI, the VM.GPU.H100.1 compute virtual machine shape is powered by a single NVIDIA H100 Tensor Core GPU. For businesses wishing to use the power of NVIDIA H100 GPUs for their generative  AI and HPC workloads, this will offer affordable, on-demand access.

A decent platform for LLM inference and lesser workloads is an H100 alone. For instance, with NVIDIA TensorRT-LLM at an input and output sequence length of 128 and FP8 precision, a single H100 GPU can produce more than 27,000 tokens per second for Llama 3 8B (up to 4x greater throughput than a single A100 GPU at FP16 precision).

VM.GPU.H100 is the one. form is well-suited for a variety of AI workloads because it has 13 cores of 4th Gen Intel Xeon processors, 246GB of system memory, and a capacity for 2×3.4TB NVMe drives.

“Oracle Cloud’s bare-metal compute with NVIDIA H100 and A100 GPUs, low-latency Supercluster, and high-performance storage delivers up to” claimed Yeshwant Mummaneni, head engineer of data management analytics at Altair. 20% better price-performance for Altair’s computational fluid dynamics and structural mechanics solvers.” “We are eager to use these GPUs in conjunction with virtual machines to power the Altair Unlimited virtual appliance.”

Validation Samples for GH200 Bare-Metal Instances Are Available

The BM.GPU.GH200 compute form is also available for customer testing from OCI. It has the NVIDIA Grace Hopper Superchip and NVLink-C2C, which connects the NVIDIA Grace CPU and NVIDIA Hopper GPU at 900GB/s with high bandwidth and cache coherence. With more than 600GB of RAM that is available, apps handling terabytes of data can operate up to 10 times faster than they would on an NVIDIA A100 GPU.

Software That’s Optimised for Enterprise AI

Businesses can speed up their  AI, HPC, and data analytics workloads on OCI with a range of NVIDIA GPUs. But an optimised software layer is necessary to fully realise the potential of these GPU-accelerated compute instances.

World-class generative AI applications may be deployed securely and reliably with the help of NVIDIA NIM, a set of user-friendly microservices that are part of the NVIDIA AI Enterprise software platform that is available on the OCI Marketplace. NVIDIA NIM is designed for high-performance AI model inference.

NIM pre-built containers, which are optimised for NVIDIA GPUs, give developers better security, a quicker time to market, and a lower total cost of ownership. NVIDIA API Catalogue offers NIM microservices for common community models, which can be simply deployed on Open Cross Infrastructure (OCI).

With the arrival of future GPU-accelerated instances, such as NVIDIA Blackwell and H200 Tensor Core GPUs, performance will only get better with time.

Contact OCI to test the GH200 Superchip and order the L40S GPU. Join Oracle and NVIDIA SIGGRAPH, the world’s preeminent graphics conference, which is taking place until August 1st, to find out more.

L40S NVIDIA price

Priced at approximately $10,000 USD, the NVIDIA L40S GPU is intended for use in data centres and AI tasks. It is an improved L40 that was created especially for AI applications rather than visualisation jobs. This GPU can be used for a variety of high-performance applications, including media acceleration, large language model (LLM) training, inference, and 3D graphics rendering. It is driven by NVIDIA’s Ada Lovelace architecture.


Post a Comment

0 Comments