2×8 H100 Example
Node 0 (master): 8× H100, NVLink within node
Node 1 (worker): 8× H100, NVLink within node
Inter-node link: EFA, 3200 Gbps
NCCL AllReduce Bandwidth
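As a rough aid for reading the bandwidth figures, the sketch below converts the 3200 Gbps EFA figure into GB/s and applies the AllReduce bus-bandwidth correction used by the nccl-tests suite, busbw = algbw × 2(n−1)/n. The function names are hypothetical, and treating 3200 Gbps as the aggregate per-node link rate is an assumption, not something the diagram states.

```python
def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert a link rate in gigabits per second to gigabytes per second."""
    return gbps / 8.0


def allreduce_bus_bandwidth(algbw_gb_s: float, n_ranks: int) -> float:
    """Bus bandwidth for AllReduce as defined by nccl-tests.

    algbw is message size divided by elapsed time; the 2(n-1)/n factor
    makes the number comparable to hardware link speed regardless of
    how many ranks take part.
    """
    if n_ranks < 2:
        raise ValueError("AllReduce bus bandwidth needs at least 2 ranks")
    return algbw_gb_s * 2 * (n_ranks - 1) / n_ranks


# Assumption: 3200 Gbps is the aggregate EFA rate between the two nodes.
efa_gb_s = gbps_to_gigabytes_per_s(3200)

# Correction factor for the 16-rank (2 nodes x 8 GPUs) job.
factor_16 = allreduce_bus_bandwidth(1.0, 16)

if __name__ == "__main__":
    print(f"EFA link rate: {efa_gb_s:.0f} GB/s")
    print(f"busbw/algbw factor for 16 ranks: {factor_16:.3f}")
```

Measured AllReduce bandwidth across nodes will normally sit below the raw link rate, so the converted figure is best read as an upper bound rather than an expected result.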
One Command: Automatic NCCL + SSH + Env Setup
$ gpu-dev reserve --gpu-type h100 --gpus 16 --distributed --hours 8
Reservation mn-abc123: 2 nodes × 8 GPUs
EFA networking configured · peer SSH ready
MASTER_ADDR, WORLD_SIZE, RANK, NCCL_* pre-configured
# Just run torchrun: everything is set up
dev@node-0 $ torchrun --nproc_per_node 8 --nnodes 2 --node_rank 0 train.py
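To make the 2 × 8 layout concrete, here is a minimal sketch of how the 16 processes are commonly numbered and how a training script might read the variables that torchrun exports to each worker. The helper names `rank_layout` and `read_dist_env` are hypothetical, and the mapping `rank = node_rank * nproc_per_node + local_rank` is the usual static-launch convention rather than something the reservation output confirms.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple, Union


def rank_layout(nnodes: int, nproc_per_node: int) -> List[Tuple[int, int, int]]:
    """Return (global_rank, node_rank, local_rank) for every process.

    Assumes the conventional static numbering:
    global_rank = node_rank * nproc_per_node + local_rank.
    """
    return [
        (node_rank * nproc_per_node + local_rank, node_rank, local_rank)
        for node_rank in range(nnodes)
        for local_rank in range(nproc_per_node)
    ]


def read_dist_env(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Union[Optional[str], int]]:
    """Read the standard distributed variables, with single-process defaults."""
    return {
        "master_addr": env.get("MASTER_ADDR"),
        "world_size": int(env.get("WORLD_SIZE", "1")),
        "rank": int(env.get("RANK", "0")),
        "local_rank": int(env.get("LOCAL_RANK", "0")),
    }


if __name__ == "__main__":
    # 2 nodes x 8 GPUs, matching --nnodes 2 --nproc_per_node 8.
    for global_rank, node_rank, local_rank in rank_layout(2, 8):
        print(f"rank {global_rank:2d} -> node {node_rank}, GPU {local_rank}")
    print(read_dist_env())
```

Under this convention the command above starts ranks 0 to 7 on node 0; the second node would presumably run the same command with `--node_rank 1` to start ranks 8 to 15.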