K8s Under the Hood
GPU scheduling, persistent storage, and observability on EKS
GPU Scheduling
🎮
NVIDIA GPU Operator
# Pod spec: fractional GPU request (4 of a node's 8 GPUs)
resources:
  limits:
    nvidia.com/gpu: 4
  requests:
    cpu: "48"
    memory: "384Gi"
📐
Fractional Allocation
Min per pod: 1 GPU (MIG: 0.125)
Max per pod: 8 GPUs (full node)
Multi-node max: 64 GPUs (8 nodes)
CPU/GPU ratio: 12 cores / GPU
RAM/GPU ratio: 96 GiB / GPU
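The ratios above determine the CPU and memory requests for any whole-GPU allocation. A minimal Python sketch of that arithmetic follows; the function names `pod_resources` and `nodes_needed` are hypothetical, and MIG slices are deliberately left out because the slide does not say how they are requested.

```python
import math

# Limits and ratios taken from the slide.
GPUS_PER_NODE = 8        # max per pod: 8 GPUs (full node)
MAX_NODES = 8            # multi-node max: 64 GPUs (8 nodes)
CPU_CORES_PER_GPU = 12   # CPU/GPU ratio
RAM_GIB_PER_GPU = 96     # RAM/GPU ratio


def pod_resources(gpus: int) -> dict:
    """Build a single-pod `resources` stanza for a whole number of GPUs."""
    if not 1 <= gpus <= GPUS_PER_NODE:
        raise ValueError(f"a single pod holds 1..{GPUS_PER_NODE} GPUs, got {gpus}")
    return {
        "limits": {"nvidia.com/gpu": gpus},
        "requests": {
            "cpu": str(gpus * CPU_CORES_PER_GPU),
            "memory": f"{gpus * RAM_GIB_PER_GPU}Gi",
        },
    }


def nodes_needed(total_gpus: int) -> int:
    """Number of 8-GPU nodes a reservation of `total_gpus` would span."""
    max_gpus = GPUS_PER_NODE * MAX_NODES
    if not 1 <= total_gpus <= max_gpus:
        raise ValueError(f"reservations hold 1..{max_gpus} GPUs, got {total_gpus}")
    return math.ceil(total_gpus / GPUS_PER_NODE)
```

With 4 GPUs this reproduces the pod spec shown earlier: 48 cores and 384Gi of memory.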
⚡
Orchestration
migrating to K8s-native
Current: Lambda + SQS
Target: K8s Operator (CRD)
Cold start: ~15s end-to-end
Storage Architecture
💾
EBS — User Home Volume
# Per-user persistent disk at /home/dev
reserve → find snapshot by name → create EBS from snapshot → PV + PVC → mount
cancel → snapshot volume → delete vol + PV/PVC
clone → snapshot → new snapshot → independent copy
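The "PV + PVC" step of the reserve flow can be sketched as a pair of statically provisioned manifests that point at the restored EBS volume. This is only an illustration: the helper name `ebs_pv_and_pvc`, the `home-<user>` naming, the `Retain` reclaim policy, the `ext4` filesystem, and the zone topology key are assumptions, not details from the slide. The driver name `ebs.csi.aws.com` is that of the AWS EBS CSI driver.

```python
def ebs_pv_and_pvc(user: str, volume_id: str, zone: str, size_gib: int,
                   namespace: str = "default") -> tuple[dict, dict]:
    """Return (PersistentVolume, PersistentVolumeClaim) manifests as dicts."""
    name = f"home-{user}"          # hypothetical naming scheme
    storage = f"{size_gib}Gi"
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": storage},
            "accessModes": ["ReadWriteOnce"],   # EBS attaches to one node at a time
            # Assumption: the cancel flow snapshots and deletes the volume itself,
            # so Kubernetes should not delete it when the claim goes away.
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "",             # static provisioning, no StorageClass
            "csi": {
                "driver": "ebs.csi.aws.com",
                "volumeHandle": volume_id,
                "fsType": "ext4",               # assumed filesystem
            },
            # EBS volumes are zonal, so pin the PV to the volume's AZ.
            "nodeAffinity": {"required": {"nodeSelectorTerms": [{
                "matchExpressions": [{
                    "key": "topology.ebs.csi.aws.com/zone",
                    "operator": "In",
                    "values": [zone],
                }],
            }]}},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "",
            "volumeName": name,                 # bind to the PV above by name
            "resources": {"requests": {"storage": storage}},
        },
    }
    return pv, pvc
```

The pod would then mount the claim at /home/dev. Because the PV is pinned to one availability zone, the pod has to be scheduled in the zone where the volume was created.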
📁
EFS — Personal Shared
Mount: /shared/<user>
Persists: across reservations
Capacity: 20 TB shared
Cross-AZ: all pods, all nodes
🚀
EFS — Shared Caches
Mount: /cache
PyTorch: pre-cached wheels
GCC / build: shared ccache
Benefit: instant pip install
Observability
📊
GPU Monitoring Stack
# DCGM Exporter → Prometheus → Grafana
DCGM_FI_DEV_GPU_UTIL
DCGM_FI_DEV_FB_USED
DCGM_FI_DEV_GPU_TEMP
DCGM_FI_DEV_POWER_USAGE
DCGM_FI_DEV_PCIE_TX_THROUGHPUT
# Per-pod, per-GPU, real-time
# Grafana dashboards on NodePort
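A per-pod, per-GPU view of these metrics comes down to a PromQL query over the DCGM series. The sketch below builds such a query and the matching URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). It assumes the exporter attaches `namespace`, `pod`, and `gpu` labels, which depends on its Kubernetes pod-mapping configuration; the helper names and the example host are hypothetical.

```python
from urllib.parse import urlencode


def gpu_util_query(namespace: str, pod: str) -> str:
    """PromQL for average GPU utilization of one pod, split by GPU index."""
    selector = f'namespace="{namespace}", pod="{pod}"'
    return f"avg by (pod, gpu) (DCGM_FI_DEV_GPU_UTIL{{{selector}}})"


def instant_query_url(base_url: str, promql: str) -> str:
    """URL for a Prometheus instant query evaluating `promql`."""
    params = urlencode({"query": promql})   # percent-encodes braces, quotes, spaces
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


if __name__ == "__main__":
    query = gpu_util_query("dev", "sandbox-1")
    print(query)
    print(instant_query_url("http://prometheus:9090", query))
```

Swapping `DCGM_FI_DEV_GPU_UTIL` for any other metric in the list above gives the same per-pod breakdown for memory, temperature, power, or PCIe throughput.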
🔍
NVIDIA Profiling
Nsight Compute: ncu
Nsight Systems: nsys
Pod capability: CAP_SYS_ADMIN
Dedicated nodes: H100 + B200
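In a Kubernetes container spec, the capability is written without the `CAP_` prefix, so `CAP_SYS_ADMIN` becomes `SYS_ADMIN` under `securityContext.capabilities.add`. The sketch below builds such a container entry as a plain dict; the function name `profiling_container` and the image name are hypothetical, and how the cluster actually steers these pods onto the dedicated profiling nodes is not shown.

```python
def profiling_container(name: str, image: str, gpus: int = 1) -> dict:
    """Container spec entry that lets ncu / nsys read GPU performance counters."""
    return {
        "name": name,
        "image": image,
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
        "securityContext": {
            # Kubernetes capability names omit the CAP_ prefix.
            "capabilities": {"add": ["SYS_ADMIN"]},
        },
    }


if __name__ == "__main__":
    # "example.com/cuda-dev:latest" is a placeholder image name.
    print(profiling_container("profiler", "example.com/cuda-dev:latest"))
```

Adding a single capability is narrower than running the container fully privileged, which is presumably why the capability is called out on its own.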
🔎
Reservation Logs
CLI: gpu-dev show --trace
SDK: sandbox.timing()
Pod logs: sandbox.pod_logs()