GPU Clustering | Multi-Node, Multi-GPU
In the world of AI, models are getting heavier than ever, and the growth only accelerates as we move toward generative AI. Large LLMs no longer fit on a single GPU for inference: even a $25,000 H100 with 80 GB of VRAM is not enough, and Llama 3.1 405B in 4-bit precision still needs roughly three of them working together.
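A quick back-of-the-envelope check of that number, counting weights only (KV cache and activations add more on top):

```python
import math

params = 405e9          # Llama 3.1 405B parameter count
bytes_per_param = 0.5   # 4-bit precision = 0.5 bytes per parameter
weights_gb = params * bytes_per_param / 1e9   # ~202.5 GB just for weights
h100_vram_gb = 80

print(math.ceil(weights_gb / h100_vram_gb))   # -> 3 H100s, weights alone
```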
Here comes the savior: GPU clustering. By clustering I mean using the combined compute of multiple GPUs that communicate with each other, either intra-node (within one machine) or inter-node (across machines).
Requirements:
Let’s break it down from scratch. When two GPUs work together, there must be some medium through which they share their progress with each other, right? That is our very first requirement: high-bandwidth communication between GPUs, both within a node and across nodes.
What about the software we write to run the cluster? We use NCCL (NVIDIA Collective Communications Library) as the communication backend.
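A minimal sketch of how this looks in code, assuming PyTorch as the framework; the script name and head-node address in the comment are placeholders:

```python
import os

import torch
import torch.distributed as dist

# Each process (one per GPU) joins a process group that uses NCCL for
# GPU-to-GPU communication. RANK, WORLD_SIZE, MASTER_ADDR, etc. are set
# by the launcher, e.g. something like (hypothetical addresses):
#   torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0|1> \
#            --rdzv_backend=c10d --rdzv_endpoint=<head-node-ip>:29500 script.py
dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Sanity check: an all-reduce that touches every GPU in the cluster.
x = torch.ones(1, device="cuda")
dist.all_reduce(x)  # x now equals world_size on every rank
print(f"rank {dist.get_rank()} / {dist.get_world_size()}: {x.item()}")

dist.destroy_process_group()
```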
And hardware, obviously.
For our demo, we have two nodes, each with an NVIDIA GPU, plus a head node.
Cluster Implementation
There are various options available for cluster implementation, including Data Parallelism, Model Parallelism, Tensor Parallelism, DDP (Distributed Data Parallel), and FSDP (Fully Sharded Data Parallel). The state of the art here is FSDP, which shards parameters, gradients, and optimizer states across GPUs, so no single GPU has to hold the full model while every GPU stays busy throughout training.
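A minimal FSDP sketch under the same assumptions as above (PyTorch, the NCCL process group from earlier, and a toy nn.Sequential standing in for a real LLM; an actual model would usually also pass an auto-wrap policy per transformer block):

```python
import os

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Toy model as a stand-in for a real LLM; FSDP shards its parameters,
# gradients, and optimizer states across every GPU in the cluster.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)).cuda()
model = FSDP(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 1024, device="cuda")
loss = model(x).pow(2).mean()  # dummy loss, just to drive one training step
loss.backward()
optimizer.step()

dist.destroy_process_group()
```

Each rank materializes only its shard of the weights and gathers the rest on the fly during forward and backward passes, which is exactly what lets a model larger than any single GPU's VRAM be trained or fine-tuned across the cluster.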