K8s Under the Hood

GPU scheduling, persistent storage, and observability on EKS

GPU Scheduling

🎮 NVIDIA GPU Operator

# Pod spec: whole-GPU request (4 GPUs, with CPU and memory at the per-GPU ratios)
resources:
  limits:
    nvidia.com/gpu: 4
  requests:
    cpu: "48"
    memory: "384Gi"

📐 Fractional Allocation

Min per pod: 1 GPU (MIG: 0.125)
Max per pod: 8 GPUs (full node)
Multi-node max: 64 GPUs (8 nodes)
CPU/GPU ratio: 12 cores / GPU
RAM/GPU ratio: 96 GiB / GPU
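
The per-GPU ratios above determine the CPU and memory requests for any whole-GPU pod. A minimal sketch of that arithmetic follows; the function and constant names are illustrative and not part of the platform, and MIG slices are deliberately not handled.

```python
from typing import Dict

CPU_PER_GPU = 12        # cores per GPU, from the ratios listed above
MEM_GIB_PER_GPU = 96    # GiB per GPU, from the ratios listed above
MAX_GPUS_PER_POD = 8    # one full node


def pod_resources(gpus: int) -> Dict[str, Dict[str, object]]:
    """Build a `resources` stanza for a whole-GPU pod (hypothetical helper)."""
    if not 1 <= gpus <= MAX_GPUS_PER_POD:
        raise ValueError(f"gpus must be between 1 and {MAX_GPUS_PER_POD}, got {gpus}")
    return {
        "limits": {"nvidia.com/gpu": gpus},
        "requests": {
            "cpu": str(gpus * CPU_PER_GPU),
            "memory": f"{gpus * MEM_GIB_PER_GPU}Gi",
        },
    }
```

Calling `pod_resources(4)` reproduces the 4-GPU pod spec shown earlier: 48 cores and 384Gi.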

Orchestration migrating to K8s-native

Current: Lambda + SQS
Target: K8s Operator (CRD)
Cold start: ~15s end-to-end

Storage Architecture

💾 EBS — User Home Volume

# Per-user persistent disk at /home/dev

reserve → find snapshot by name
       → create EBS from snapshot
       → PV + PVC → mount

cancel  → snapshot volume
       → delete vol + PV/PVC

clone   → snapshot → new snapshot
       → independent copy
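
The reserve and cancel flows above could be sketched with boto3 EC2 calls roughly as follows. This is an illustration under assumptions, not the platform's implementation: it assumes "name" means the snapshot's `Name` tag, assumes a `gp3` volume type, and assumes static provisioning through the EBS CSI driver. The `ec2` argument is expected to be a `boto3.client("ec2")`; all function names are hypothetical.

```python
from typing import Any, Dict


def reserve_volume(ec2: Any, snapshot_name: str, az: str) -> str:
    """reserve: find the newest snapshot by Name tag, create a volume from it."""
    resp = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[
            {"Name": "tag:Name", "Values": [snapshot_name]},
            {"Name": "status", "Values": ["completed"]},
        ],
    )
    snapshots = sorted(resp["Snapshots"], key=lambda s: s["StartTime"], reverse=True)
    if not snapshots:
        # How a first reservation is handled is not described in the source.
        raise LookupError(f"no completed snapshot named {snapshot_name!r}")
    volume = ec2.create_volume(
        SnapshotId=snapshots[0]["SnapshotId"],
        AvailabilityZone=az,
        VolumeType="gp3",  # assumed volume type
    )
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
    return volume["VolumeId"]


def home_pv_manifest(name: str, volume_id: str, size_gib: int) -> Dict[str, Any]:
    """Static PersistentVolume for an existing EBS volume (EBS CSI driver assumed)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": f"{size_gib}Gi"},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "csi": {
                "driver": "ebs.csi.aws.com",
                "volumeHandle": volume_id,
                "fsType": "ext4",
            },
        },
    }


def cancel_volume(ec2: Any, volume_id: str, snapshot_name: str) -> str:
    """cancel: snapshot the volume, then delete it (PV/PVC cleanup not shown)."""
    snap = ec2.create_snapshot(
        VolumeId=volume_id,
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "Name", "Value": snapshot_name}],
        }],
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])
    # The volume must already be detached (pod gone) before it can be deleted.
    ec2.delete_volume(VolumeId=volume_id)
    return snap["SnapshotId"]
```

Passing the client in as an argument keeps the functions easy to exercise against a stub, which matters for a flow that creates and deletes real volumes.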

📁 EFS — Personal Shared

Mount: /shared/<user>
Persists: across reservations
Capacity: 20 TB shared
Cross-AZ: all pods, all nodes

🚀 EFS — Shared Caches

Mount: /cache
PyTorch: pre-cached wheels
GCC / build: shared ccache
Benefit: instant pip install
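
One way a pod could pick up these caches is through environment variables that pip and ccache already honor (`PIP_FIND_LINKS` and `CCACHE_DIR`). The sketch below is hedged: the source only gives the `/cache` mount point, so the `wheels` and `ccache` subdirectories and the helper name are assumptions.

```python
import os
from typing import Dict, Optional

CACHE_ROOT = "/cache"  # mount point listed above


def cache_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment that points pip and ccache at /cache."""
    env = dict(os.environ if base is None else base)
    # Subdirectory names are assumptions; only the /cache mount is documented.
    env["PIP_FIND_LINKS"] = f"{CACHE_ROOT}/wheels"
    env["CCACHE_DIR"] = f"{CACHE_ROOT}/ccache"
    return env
```

The result can be passed as `env=` to `subprocess.run` when launching `pip install` or a build, so pip checks the pre-cached wheels and compilers reuse the shared ccache.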

Observability

📊 GPU Monitoring Stack

# DCGM Exporter → Prometheus → Grafana

DCGM_FI_DEV_GPU_UTIL
DCGM_FI_DEV_FB_USED
DCGM_FI_DEV_GPU_TEMP
DCGM_FI_DEV_POWER_USAGE
DCGM_FI_DEV_PCIE_TX_THROUGHPUT

# Per-pod, per-GPU, real-time
# Grafana dashboards on NodePort
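
The per-pod, per-GPU view can also be read directly from Prometheus rather than through Grafana. A minimal sketch using the standard Prometheus HTTP API (`/api/v1/query`) follows. It assumes the DCGM Exporter attaches `namespace`, `pod`, and `gpu` labels to its series (label names depend on the exporter and scrape configuration), and the Prometheus URL is a placeholder.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def gpu_metric_query(metric: str, namespace: str, pod: str) -> str:
    """PromQL for one DCGM metric, averaged per GPU for a single pod."""
    # Label names (namespace, pod, gpu) are assumptions about the scrape setup.
    return f'avg by (gpu) ({metric}{{namespace="{namespace}", pod="{pod}"}})'


def instant_query_url(prometheus_url: str, promql: str) -> str:
    """URL for a Prometheus instant query."""
    base = prometheus_url.rstrip("/")
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def run_query(prometheus_url: str, promql: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Execute the instant query and return the decoded JSON response."""
    url = instant_query_url(prometheus_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `run_query("http://prometheus:9090", gpu_metric_query("DCGM_FI_DEV_GPU_UTIL", "ml", "train-0"))` would return one series per GPU, given a reachable Prometheus at that (hypothetical) address.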

🔍 NVIDIA Profiling

Nsight Compute: ncu
Nsight Systems: nsys
Pod capability: CAP_SYS_ADMIN
Dedicated nodes: H100 + B200
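
In a Kubernetes container spec, the capability listed above is written without the `CAP_` prefix, under `securityContext.capabilities.add`. The sketch below shows one possible shape for a profiling container plus the basic `nsys` and `ncu` invocations. Helper names, the image, and the target command are placeholders, and scheduling onto the dedicated H100/B200 nodes is not shown.

```python
from typing import Any, Dict, List


def profiling_container(name: str, image: str, gpus: int = 1) -> Dict[str, Any]:
    """Container spec fragment granting SYS_ADMIN for Nsight profiling."""
    return {
        "name": name,
        "image": image,
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
        # Kubernetes capability names omit the CAP_ prefix.
        "securityContext": {"capabilities": {"add": ["SYS_ADMIN"]}},
    }


def nsys_command(output: str, target: List[str]) -> List[str]:
    """Nsight Systems: timeline profile of `target`, written to `output`."""
    return ["nsys", "profile", "-o", output, *target]


def ncu_command(output: str, target: List[str]) -> List[str]:
    """Nsight Compute: kernel-level profile of `target`, written to `output`."""
    return ["ncu", "-o", output, *target]
```

Inside the pod, `nsys_command("report", ["python", "train.py"])` yields the argument list for `subprocess.run`.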

🔎 Reservation Logs

CLI: gpu-dev show --trace
SDK: sandbox.timing()
Pod logs: sandbox.pod_logs()