SDK Demo — Notebook

OSDC Python SDK

Reserve GPUs, run commands, manage persistent disks — all from Python.

[1]:

from gpu_dev import GpuDev

client = GpuDev()

# Reserve 4 H100 GPUs with persistent disk
sandbox = client.reserve(
    gpu_type="h100", gpu_count=4,
    hours=8, disk_name="my-project"
)

[18.2s] ✓ Ready — 4× H100 · ssh dev@a1b2c3d4.osdc.dev

[2]:

# Upload code and run training
sandbox.upload("./src/", "/home/dev/src/")
result = sandbox.exec("python /home/dev/src/train.py")
print(result.stdout)

Epoch 1/10: loss=2.341 lr=0.001
Epoch 2/10: loss=1.892 lr=0.001
Epoch 3/10: loss=1.204 lr=0.0005
...
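
Since `result.stdout` is plain text, the per-epoch metrics can be pulled out for plotting or early-stopping checks. The following is a minimal sketch that assumes the log format shown in the output above; `EpochLog` and `parse_epochs` are hypothetical helper names, not part of the SDK.

```python
import re
from typing import NamedTuple


class EpochLog(NamedTuple):
    epoch: int
    total: int
    loss: float
    lr: float


# Matches lines such as "Epoch 3/10: loss=1.204 lr=0.0005".
_EPOCH_RE = re.compile(r"Epoch (\d+)/(\d+): loss=([\d.]+) lr=([\d.eE+-]+)")


def parse_epochs(stdout: str) -> list[EpochLog]:
    """Extract per-epoch metrics; lines that do not match are skipped."""
    logs = []
    for line in stdout.splitlines():
        m = _EPOCH_RE.search(line)
        if m:
            logs.append(EpochLog(int(m[1]), int(m[2]), float(m[3]), float(m[4])))
    return logs


# Sample text copied from the cell output above (stands in for result.stdout).
sample = """Epoch 1/10: loss=2.341 lr=0.001
Epoch 2/10: loss=1.892 lr=0.001
Epoch 3/10: loss=1.204 lr=0.0005
..."""

logs = parse_epochs(sample)
for entry in logs:
    print(entry.epoch, entry.loss, entry.lr)
```

In a live session, `parse_epochs(result.stdout)` would replace the `sample` string.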

Disk Cloning: Parallel Experiments

Clone your environment while it's running. Each clone is an independent copy — perfect for parameter sweeps.

[3]:

# Clone the disk while base is still running
client.clone_disk("my-project", "exp-high-lr")
client.clone_disk("my-project", "exp-low-lr")
sandbox.cancel() # done with base

✓ Cloned my-project → exp-high-lr (3.2s)
✓ Cloned my-project → exp-low-lr (3.1s)
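
For sweeps with more than two settings, the clone names can be generated rather than typed by hand. This is a rough sketch under assumptions: `plan_sweep` is a hypothetical helper, the `<base>-lr-<value>` naming scheme is invented for illustration, and the source does not say which characters the service accepts in disk names.

```python
def plan_sweep(base_disk: str, learning_rates: list[float]) -> dict[str, float]:
    """Map one (hypothetical) clone name to each learning rate."""
    return {f"{base_disk}-lr-{lr:g}": lr for lr in learning_rates}


plan = plan_sweep("my-project", [0.01, 0.001, 0.0001])
for clone_name, lr in plan.items():
    print(clone_name, lr)
    # With the SDK, each clone would then be created as in the cell above:
    # client.clone_disk("my-project", clone_name)
```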

[4]:

from concurrent.futures import ThreadPoolExecutor

def run_experiment(disk, lr):
    with client.reserve(
        gpu_type="h100", disk_name=disk
    ) as sb:
        return sb.exec(f"LR={lr} python /home/dev/src/train.py")

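with ThreadPoolExecutor(2) as pool:
    f1 = pool.submit(run_experiment, "exp-high-lr", 0.01)
    f2 = pool.submit(run_experiment, "exp-low-lr", 0.0001)

# .result() returns each exec result and re-raises any error from the worker thread
for name, fut in (("exp-high-lr", f1), ("exp-low-lr", f2)):
    print(f"[{name}]", fut.result().stdout)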

[exp-high-lr] Epoch 10/10: loss=0.042 ✓
[exp-low-lr] Epoch 10/10: loss=0.187 ✓
Both experiments completed — reservations auto-cancelled
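
When a sweep grows, one failing run should not hide the results of the others. The sketch below shows one way to collect results and errors separately with `concurrent.futures.as_completed`. It uses `fake_experiment`, a hypothetical stand-in that reserves nothing, so the pattern can be tried without GPUs; the `exp-bad` job is likewise invented to trigger a failure.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_experiment(disk: str, lr: float) -> str:
    # Hypothetical stand-in for run_experiment(); it does not reserve GPUs.
    if lr <= 0:
        raise ValueError(f"invalid learning rate: {lr}")
    return f"[{disk}] finished with lr={lr}"


def run_all(jobs: dict[str, float]) -> tuple[dict[str, str], dict[str, Exception]]:
    """Run every job in a thread pool; return (results, errors) keyed by disk name."""
    results: dict[str, str] = {}
    errors: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = {pool.submit(fake_experiment, d, lr): d for d, lr in jobs.items()}
        for fut in as_completed(futures):
            disk = futures[fut]
            try:
                results[disk] = fut.result()
            except Exception as exc:  # keep going so other runs still report
                errors[disk] = exc
    return results, errors


results, errors = run_all({"exp-high-lr": 0.01, "exp-low-lr": 0.0001, "exp-bad": -1.0})
for disk in sorted(results):
    print(results[disk])
for disk, exc in errors.items():
    print(f"[{disk}] failed: {exc}")
```

Swapping `fake_experiment` for the `run_experiment` function defined in the cell above would apply the same pattern to real reservations.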

End-to-End: Fine-tune a Model

Use spot instances to cut costs. The persistent disk keeps your checkpoints safe even if a spot instance is reclaimed.

[5]:

with client.reserve(
    gpu_type="h100", gpu_count=8,
    hours=4, spot=True,
    disk_name="llama-finetune"
) as sb:

    sb.exec("pip install -q transformers peft bitsandbytes")
    sb.upload("./finetune.py", "/home/dev/finetune.py")

    result = sb.exec("torchrun --nproc_per_node 8 finetune.py")
    print(result.stdout)

    sb.download("/home/dev/output/", "./finetuned-model/")

[spot] 8× H100 · 640 GB VRAM · ~70% cheaper
Loading meta-llama/Llama-3.1-8B...
LoRA rank=16, target: q_proj, v_proj
Epoch 1/3: loss=1.842 · 12.4k tokens/s
Epoch 2/3: loss=0.921 · 12.6k tokens/s
Epoch 3/3: loss=0.447 · 12.5k tokens/s
✓ Model saved to /home/dev/output/
Downloaded finetuned-model/ (2.1 GB) · reservation auto-cancelled
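
The demo does not show how `finetune.py` picks up again after a spot reclaim. One common pattern is to look for the newest checkpoint on the persistent disk at startup and resume from it. The sketch below assumes checkpoints are saved as `checkpoint-<step>` directories (the layout Hugging Face `Trainer` uses); that layout and the `latest_checkpoint` helper are assumptions, not something the source confirms.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir: Path) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the highest step, or None."""
    if not output_dir.is_dir():
        return None
    best_step, best_path = -1, None
    for child in output_dir.iterdir():
        m = re.fullmatch(r"checkpoint-(\d+)", child.name)
        if m and child.is_dir() and int(m[1]) > best_step:
            best_step, best_path = int(m[1]), child
    return best_path


# Demonstration on a temporary directory instead of /home/dev/output/.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for step in (500, 1000, 1500):
        (out / f"checkpoint-{step}").mkdir()
    resume_name = latest_checkpoint(out).name
    missing_result = latest_checkpoint(out / "missing")

print(resume_name)
```

Steps are compared as integers, so `checkpoint-1500` ranks above `checkpoint-500`, which a plain string sort would get wrong.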