K8s Under the Hood

GPU Scheduling

🎮 NVIDIA GPU Operator

            # Pod spec: whole-GPU request (4 GPUs, 12 cores and 96 GiB per GPU)
            resources:
              limits:
                nvidia.com/gpu: 4
              requests:
                cpu: "48"
                memory: "384Gi"

📐 Fractional Allocation

Min per pod: 1 GPU (MIG: 0.125)
Max per pod: 8 GPUs (full node)
Multi-node max: 64 GPUs (8 nodes)
CPU/GPU ratio: 12 cores / GPU
RAM/GPU ratio: 96 GiB / GPU
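The ratios listed above determine the CPU and memory requests for any whole-GPU pod. A minimal sketch of that arithmetic follows; the function name `pod_resources` and the constant names are illustrative assumptions, not part of any tool described here, and MIG slices are deliberately left out.

```python
# Illustrative helper (hypothetical name): derives a pod's resource stanza
# from the per-GPU ratios listed above. Whole GPUs only; MIG is not modeled.
CORES_PER_GPU = 12      # CPU/GPU ratio
RAM_GIB_PER_GPU = 96    # RAM/GPU ratio
MAX_GPUS_PER_POD = 8    # one full node


def pod_resources(gpus: int) -> dict:
    """Return a Kubernetes `resources` mapping for a pod with `gpus` whole GPUs."""
    if not 1 <= gpus <= MAX_GPUS_PER_POD:
        raise ValueError(
            f"gpus must be between 1 and {MAX_GPUS_PER_POD}, got {gpus}"
        )
    return {
        "limits": {"nvidia.com/gpu": gpus},
        "requests": {
            "cpu": str(gpus * CORES_PER_GPU),
            "memory": f"{gpus * RAM_GIB_PER_GPU}Gi",
        },
    }
```

For 4 GPUs this gives 48 cores and 384Gi, matching the pod spec shown under the GPU Operator heading.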

⚡ Orchestration (migrating to K8s-native)

Current: Lambda + SQS
Target: K8s Operator (CRD)
Cold start: ~15s end-to-end

Storage Architecture

💾 EBS — User Home Volume

            # Per-user persistent disk at /home/dev
            reserve → find snapshot by name
                    → create EBS from snapshot
                    → PV + PVC → mount
            cancel  → snapshot volume
                    → delete vol + PV/PVC
            clone   → snapshot → new snapshot
                    → independent copy
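The reserve and cancel paths above can be sketched against the EC2 API. This is a sketch under stated assumptions, not the actual implementation: it assumes snapshots are found by their `Name` tag, that the naming scheme is `home-<user>` (the hypothetical `snapshot_name` helper), that volumes are `gp3`, and that the PV is statically provisioned through the EBS CSI driver. The `ec2` argument is expected to behave like `boto3.client("ec2")`. The clone path is omitted.

```python
def snapshot_name(user: str) -> str:
    """Hypothetical naming scheme for a user's home snapshot."""
    return f"home-{user}"


def reserve(ec2, user: str, az: str) -> str:
    """Find the newest snapshot by name, restore a volume from it, return its ID."""
    snaps = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:Name", "Values": [snapshot_name(user)]}],
    )["Snapshots"]
    if not snaps:
        raise LookupError(f"no snapshot named {snapshot_name(user)!r}")
    latest = max(snaps, key=lambda s: s["StartTime"])
    vol = ec2.create_volume(
        SnapshotId=latest["SnapshotId"],
        AvailabilityZone=az,      # must match the AZ of the node that mounts it
        VolumeType="gp3",         # assumption: volume type is not stated above
    )
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
    return vol["VolumeId"]


def cancel(ec2, user: str, volume_id: str) -> str:
    """Snapshot the volume, wait for completion, delete the volume."""
    snap = ec2.create_snapshot(
        VolumeId=volume_id,
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "Name", "Value": snapshot_name(user)}],
        }],
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])
    ec2.delete_volume(VolumeId=volume_id)
    return snap["SnapshotId"]


def pv_manifest(user: str, volume_id: str, size_gib: int) -> dict:
    """Static PersistentVolume for an existing EBS volume (assumes the EBS CSI driver)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": snapshot_name(user)},
        "spec": {
            "capacity": {"storage": f"{size_gib}Gi"},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "csi": {
                "driver": "ebs.csi.aws.com",
                "volumeHandle": volume_id,
                "fsType": "ext4",
            },
        },
    }
```

Passing the client in as an argument keeps the functions testable with a stub. The PV and PVC must be deleted before `cancel` removes the volume, which this sketch leaves to the caller.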

📁 EFS — Personal Shared

Mount: /shared/<user>
Persists: across reservations
Capacity: 20 TB shared
Cross-AZ: all pods, all nodes

🚀 EFS — Shared Caches

Mount: /cache
PyTorch: pre-cached wheels
GCC / build: shared ccache
Benefit: instant pip install

Observability

📊 GPU Monitoring Stack

            # DCGM Exporter → Prometheus → Grafana
            DCGM_FI_DEV_GPU_UTIL
            DCGM_FI_DEV_FB_USED
            DCGM_FI_DEV_GPU_TEMP
            DCGM_FI_DEV_POWER_USAGE
            DCGM_FI_DEV_PCIE_TX_THROUGHPUT
            # Per-pod, per-GPU, real-time
            # Grafana dashboards on NodePort
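To make "per-pod, per-GPU" concrete, the sketch below builds PromQL selectors for the metrics listed above. The helper name `per_gpu_query` and the short metric keys are illustrative assumptions. It also assumes the DCGM exporter attaches `namespace`, `pod`, and `gpu` labels; depending on the Prometheus scrape configuration these may appear as `exported_namespace` and `exported_pod` instead.

```python
# Short keys are hypothetical; the metric names are the ones listed above.
DCGM_METRICS = {
    "util": "DCGM_FI_DEV_GPU_UTIL",        # GPU utilization, percent
    "fb_used": "DCGM_FI_DEV_FB_USED",      # framebuffer memory used, MiB
    "temp": "DCGM_FI_DEV_GPU_TEMP",        # GPU temperature, degrees C
    "power": "DCGM_FI_DEV_POWER_USAGE",    # power draw, watts
    "pcie_tx": "DCGM_FI_DEV_PCIE_TX_THROUGHPUT",
}


def per_gpu_query(metric: str, pod: str, namespace: str = "default") -> str:
    """Return a PromQL expression giving one series per GPU for a single pod."""
    name = DCGM_METRICS[metric]
    selector = f'{{namespace="{namespace}", pod="{pod}"}}'
    return f"avg by (gpu) ({name}{selector})"


if __name__ == "__main__":
    # "train-0" is a placeholder pod name.
    print(per_gpu_query("util", "train-0"))
```

The resulting string can be pasted into a Grafana panel or sent to the Prometheus HTTP API as the `query` parameter.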

🔍 NVIDIA Profiling

Nsight Compute: ncu
Nsight Systems: nsys
Pod capability: CAP_SYS_ADMIN
Dedicated nodes: H100 + B200
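The capability requirement above maps to the container's `securityContext`. A minimal sketch follows, with a hypothetical helper name (`profiling_container`); note that Kubernetes spells capabilities without the `CAP_` prefix, so `CAP_SYS_ADMIN` becomes `SYS_ADMIN`. Scheduling onto the dedicated profiling nodes is not shown because the node labels are not given here.

```python
def profiling_container(name: str, image: str, gpus: int = 1) -> dict:
    """Container spec fragment that lets ncu/nsys read GPU performance counters."""
    return {
        "name": name,
        "image": image,
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
        "securityContext": {
            # Kubernetes capability names omit the "CAP_" prefix.
            "capabilities": {"add": ["SYS_ADMIN"]},
        },
    }
```

Inside such a container, a profiling run might look like `nsys profile -o report python train.py` or `ncu --set full python train.py`, where `train.py` stands in for the user's workload.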

🔎 Reservation Logs

CLI: gpu-dev show --trace
SDK: sandbox.timing()
Pod logs: sandbox.pod_logs()