# GPU Setup

Connect GPU compute providers, configure pod defaults, set budget limits, and optimize for cost.

This guide walks you through connecting GPU compute to your lab. You need GPU access to run experiments that require heavy computation (MCMC chains, model training, large-scale data processing).
## Connect RunPod

### Get your RunPod API key

- Go to runpod.io and sign in
- Navigate to Settings > API Keys
- Create a new API key with full access
- Copy the key
### Add the key to Hubify

- Go to Lab Settings > Compute
- Click Connect RunPod
- Paste your API key
- Click Verify & Save

Or from the CLI:

```bash
hubify pod config --provider runpod --api-key "your-runpod-api-key"
```
### Verify the connection

```bash
hubify pod config --test
```

```
RunPod connection: OK
Available GPUs: H200, H100, A100, A40
Account balance: $245.00
```
## Set Default GPU

Configure which GPU type is used when experiments do not specify one:

```bash
# Set default GPU
hubify pod config --default-gpu h100

# Set default timeout
hubify pod config --default-timeout 4h

# View current config
hubify pod config --show
```
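The fallback behavior amounts to a simple merge: per-experiment settings win, and anything left unset falls back to the lab defaults. A minimal sketch (names and structure are illustrative, not Hubify's internal API):

```python
# Lab-level defaults, as set via `hubify pod config` above.
DEFAULTS = {"gpu": "h100", "timeout": "4h"}

def resolve_pod_config(overrides):
    """Per-experiment settings win; unset fields fall back to defaults."""
    config = dict(DEFAULTS)
    config.update({k: v for k, v in overrides.items() if v is not None})
    return config
```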
## Budget Controls

Set spending limits to avoid surprises:

```bash
# Monthly budget cap (pods queue when reached)
hubify pod budget --monthly 500

# Per-experiment cap
hubify pod budget --per-experiment 50

# Alert threshold (notify at 80% of budget)
hubify pod budget --alert-threshold 0.8
```

When the monthly budget is reached:
- New experiments queue instead of launching
- You receive a notification
- The orchestrator suggests cost-saving alternatives
- Running experiments continue until completion
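The enforcement described above boils down to a threshold check on current spend. A sketch of that decision (the function name and return values are hypothetical, not the orchestrator's actual code):

```python
def budget_action(spent, monthly_cap, alert_threshold=0.8):
    """Decide how a new experiment is handled at the current spend level."""
    if spent >= monthly_cap:
        return "queue"              # new experiments queue; running ones finish
    if spent >= alert_threshold * monthly_cap:
        return "launch-with-alert"  # notification fires at the threshold
    return "launch"
```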
## Pod Templates

Create reusable pod configurations for common experiment types:

```bash
# Create a template
hubify pod template create \
  --name "mcmc-standard" \
  --gpu h100 \
  --timeout 4h \
  --docker-image "hubify/cosmo:latest" \
  --env "OMP_NUM_THREADS=16" \
  --storage 50GB

# Use a template
hubify experiment run --name "my-chain" --pod-template mcmc-standard

# List templates
hubify pod template list
```
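Conceptually, a template is a bundle of pod settings that per-run flags can override. A sketch of that merge, using the `mcmc-standard` template defined above (the in-memory representation is an assumption; Hubify stores templates server-side):

```python
# Hypothetical local view of the template created above.
TEMPLATES = {
    "mcmc-standard": {
        "gpu": "h100",
        "timeout": "4h",
        "docker_image": "hubify/cosmo:latest",
        "env": {"OMP_NUM_THREADS": "16"},
        "storage": "50GB",
    }
}

def pod_spec(template_name, **overrides):
    """Start from the template, then apply any per-run overrides."""
    spec = dict(TEMPLATES[template_name])
    spec.update(overrides)
    return spec
```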
## GPU Selection Guide

| Experiment Type | Recommended GPU | Why |
|---|---|---|
| MCMC chains (< 100K samples) | H100 | Good balance of cost and speed |
| MCMC chains (> 100K samples) | H200 | Large memory prevents OOM |
| Neural network training | H100 or H200 | Depends on model size |
| Anomaly detection (large catalog) | H200 | 141 GB VRAM for full dataset |
| Data preprocessing | CPU | No GPU needed, save money |
| Figure generation | CPU or A40 | Lightweight, save money |
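For scripted pod selection, the table above can be encoded as a small lookup. A sketch, with hypothetical experiment-type labels:

```python
def recommend_gpu(experiment_type, samples=None):
    """Encode the GPU selection table above as a function."""
    if experiment_type == "mcmc":
        # Chains over 100K samples need H200's larger memory to avoid OOM.
        return "H200" if samples is not None and samples > 100_000 else "H100"
    if experiment_type in ("preprocessing", "figures"):
        return "CPU"   # no GPU needed, save money
    if experiment_type == "anomaly-detection":
        return "H200"  # 141 GB VRAM for the full catalog
    return "H100"      # reasonable default for training workloads
```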
## Persistent Storage

Configure persistent storage for datasets and results:

```bash
# View storage usage
hubify pod storage list

# Upload a dataset (available to all pods)
hubify pod storage upload ./planck_likelihood.tar.gz

# Set retention policy
hubify pod storage config --retain-days 90
```

Persistent storage survives pod teardowns. Pre-stage large datasets here so experiments start instantly instead of waiting for downloads.
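A retention policy like `--retain-days 90` reduces to a cutoff-date comparison. A sketch of that check (the object fields `name` and `uploaded_at` are assumptions for illustration):

```python
from datetime import datetime, timedelta

def expired_objects(objects, retain_days=90, now=None):
    """Return names of stored objects older than the retention window."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=retain_days)
    return [o["name"] for o in objects if o["uploaded_at"] < cutoff]
```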
## SSH Keys

Add SSH keys for direct pod access:

```bash
# Add your SSH key
hubify pod ssh-key add --file ~/.ssh/id_ed25519.pub

# List configured keys
hubify pod ssh-key list
```
## Monitoring

Monitor active pods from Captain View or the CLI:

```bash
# Real-time pod status
hubify pod list --watch

# GPU utilization
hubify pod metrics pod-abc123

# Cost tracking
hubify pod cost --month current --breakdown
```

```
MONTH    TOTAL  H200  H100  A100  CPU
2026-04  $312   $180  $112  $20   $0
```
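To consume the breakdown programmatically, the two-line table can be parsed into a dictionary. A sketch that assumes the whitespace-separated format shown in the sample output above:

```python
def parse_cost_table(text):
    """Parse a two-line cost breakdown table into {column: dollars}."""
    header, row = [line.split() for line in text.strip().splitlines()]
    # Skip the MONTH column; strip the "$" prefix from dollar amounts.
    return {col: int(val.lstrip("$")) for col, val in zip(header[1:], row[1:])}
```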
## Coming Soon: Modal

Modal integration will add serverless GPU functions. Instead of managing pods, you deploy functions that run on demand and are billed per second. Ideal for:

- Short-lived tasks (< 10 minutes)
- Bursty workloads
- Figure generation
- Small inferences
Modal support is currently in development.