osdc_quickstart.ipynb
Python 3 · gpu-dev v0.6.6

OSDC Python SDK

Reserve GPUs, run commands, and manage persistent disks, all from Python.

[1]:
from gpu_dev import GpuDev

client = GpuDev()

# Reserve 4 H100 GPUs for 8 hours with a persistent disk attached
sandbox = client.reserve(
    gpu_type="h100", gpu_count=4,
    hours=8, disk_name="my-project"
)
[18.2s] ✓ Ready - 4× H100 · ssh dev@a1b2c3d4.osdc.dev
[2]:
# Upload code and run training
sandbox.upload("./src/", "/home/dev/src/")
result = sandbox.exec("python /home/dev/src/train.py")
print(result.stdout)
Epoch 1/10: loss=2.341 lr=0.001
Epoch 2/10: loss=1.892 lr=0.001
Epoch 3/10: loss=1.204 lr=0.0005
...
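
Because `result.stdout` comes back as plain text, it can be handy to turn the epoch lines into structured records before comparing runs. The following is a minimal sketch that assumes the log format is exactly the one printed above; `parse_epochs` and `EPOCH_RE` are hypothetical helper names, not part of the SDK.

```python
import re

# Matches lines shaped like the sample output: "Epoch 1/10: loss=2.341 lr=0.001".
# The lr field is optional so lines without it still parse.
EPOCH_RE = re.compile(
    r"Epoch (?P<epoch>\d+)/(?P<total>\d+): loss=(?P<loss>[0-9.eE+-]+)"
    r"(?: lr=(?P<lr>[0-9.eE+-]+))?"
)

def parse_epochs(stdout):
    """Return one dict per epoch line found in the captured stdout."""
    records = []
    for line in stdout.splitlines():
        m = EPOCH_RE.search(line)
        if m is None:
            continue  # skip "..." and any other non-epoch lines
        records.append({
            "epoch": int(m["epoch"]),
            "total": int(m["total"]),
            "loss": float(m["loss"]),
            "lr": float(m["lr"]) if m["lr"] else None,
        })
    return records

# Stand-in for result.stdout, copied from the output above.
sample = """Epoch 1/10: loss=2.341 lr=0.001
Epoch 2/10: loss=1.892 lr=0.001
Epoch 3/10: loss=1.204 lr=0.0005
..."""
records = parse_epochs(sample)
print(records[-1]["loss"])
```

In the notebook, `parse_epochs(result.stdout)` would replace the `sample` string.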

Disk Cloning: Parallel Experiments

Clone your environment while it's running. Each clone is an independent copy, which makes it a good fit for parameter sweeps.

[3]:
# Clone the disk while base is still running
client.clone_disk("my-project", "exp-high-lr")
client.clone_disk("my-project", "exp-low-lr")
sandbox.cancel()  # done with base
✓ Cloned my-project → exp-high-lr (3.2s)
✓ Cloned my-project → exp-low-lr (3.1s)
[4]:
from concurrent.futures import ThreadPoolExecutor

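def run_experiment(disk, lr):
    # Each call reserves its own sandbox backed by one cloned disk;
    # leaving the with-block cancels the reservation.
    with client.reserve(
        gpu_type="h100", disk_name=disk
    ) as sb:
        # Same script path as in cell [2]; the clone carries the uploaded code.
        return sb.exec(f"LR={lr} python /home/dev/src/train.py")

with ThreadPoolExecutor(2) as pool:
    f1 = pool.submit(run_experiment, "exp-high-lr", 0.01)
    f2 = pool.submit(run_experiment, "exp-low-lr", 0.0001)

# Both runs have finished once the pool exits. result() returns each exec
# result and re-raises any exception from the worker thread.
high_lr, low_lr = f1.result(), f2.result()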
[exp-high-lr] Epoch 10/10: loss=0.042 ✓
[exp-low-lr]  Epoch 10/10: loss=0.187 ✓
Both experiments completed - reservations auto-cancelled
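
The two-clone pattern above extends to larger sweeps. Here is one possible sketch that fans out over a dict of clone names and learning rates and collects failures per run instead of stopping at the first one. `run_sweep` is a hypothetical helper, and `fake_run` is a stand-in for the notebook's `run_experiment` so the example runs without the SDK or a GPU reservation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_sweep(run_experiment, sweep, max_workers=None):
    """Call run_experiment(disk, lr) for every (disk, lr) pair in `sweep`.

    Returns (results, errors), both keyed by disk name.
    """
    results, errors = {}, {}
    workers = max_workers or max(1, len(sweep))
    with ThreadPoolExecutor(workers) as pool:
        # Map each future back to the disk it was submitted for.
        futures = {
            pool.submit(run_experiment, disk, lr): disk
            for disk, lr in sweep.items()
        }
        for fut in as_completed(futures):
            disk = futures[fut]
            try:
                results[disk] = fut.result()
            except Exception as exc:
                # One failed run should not hide the others.
                errors[disk] = exc
    return results, errors

# Stand-in for run_experiment (assumption: the real one returns an exec result).
def fake_run(disk, lr):
    if lr <= 0:
        raise ValueError(f"invalid learning rate for {disk}: {lr}")
    return f"{disk} finished with lr={lr}"

results, errors = run_sweep(
    fake_run,
    {"exp-high-lr": 0.01, "exp-low-lr": 0.0001, "exp-bad": 0.0},
)
print(sorted(results), sorted(errors))
```

Passing the notebook's `run_experiment` in place of `fake_run` would start one reservation per clone, so `max_workers` also acts as a cap on how many reservations are open at once.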

End-to-End: Fine-tune a Model

Use spot instances to cut cost. The persistent disk keeps your checkpoints safe even if the spot instance is reclaimed.

[5]:
with client.reserve(
    gpu_type="h100", gpu_count=8,
    hours=4, spot=True,
    disk_name="llama-finetune"
) as sb:

    sb.exec("pip install -q transformers peft bitsandbytes")
    sb.upload("./finetune.py", "/home/dev/finetune.py")

    # Use the absolute path the script was uploaded to
    result = sb.exec("torchrun --nproc_per_node 8 /home/dev/finetune.py")
    print(result.stdout)

    sb.download("/home/dev/output/", "./finetuned-model/")
[spot] 8× H100 · 640 GB VRAM · ~70% cheaper
Loading meta-llama/Llama-3.1-8B...
LoRA rank=16, target: q_proj, v_proj
Epoch 1/3: loss=1.842 · 12.4k tokens/s
Epoch 2/3: loss=0.921 · 12.6k tokens/s
Epoch 3/3: loss=0.447 · 12.5k tokens/s
✓ Model saved to /home/dev/output/
Downloaded finetuned-model/ (2.1 GB) · reservation auto-cancelled
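
Checkpoints that survive a spot reclaim only help if the training script picks them up on the next run. The notebook does not show `finetune.py`, so the following is only a sketch of one way it might find where to resume. It assumes checkpoints are written as `checkpoint-<step>` subdirectories (the naming used by the Hugging Face `Trainer`) inside an output directory on the persistent disk; `latest_checkpoint` is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path

CKPT_RE = re.compile(r"^checkpoint-(\d+)$")

def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        m = CKPT_RE.match(child.name)
        # Compare steps numerically so checkpoint-1500 beats checkpoint-500.
        if m and child.is_dir() and int(m.group(1)) > best_step:
            best_step, best_path = int(m.group(1)), child
    return best_path

# Demo on a temporary directory instead of /home/dev/output/.
with tempfile.TemporaryDirectory() as tmp:
    for step in (500, 1000, 1500):
        (Path(tmp) / f"checkpoint-{step}").mkdir()
    latest_name = latest_checkpoint(tmp).name

print(latest_name)  # checkpoint-1500
```

A script could call `latest_checkpoint("/home/dev/output/")` at startup and train from scratch when it returns `None`.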