GPU Setup

Connect GPU compute providers, configure pod defaults, set budget limits, and optimize for cost.

This guide walks you through connecting GPU compute to your lab. You need GPU access to run experiments that require heavy computation (MCMC chains, model training, large-scale data processing).

Connect RunPod

Get your RunPod API key

  1. Go to runpod.io and sign in
  2. Navigate to Settings > API Keys
  3. Create a new API key with full access
  4. Copy the key

Add the key to Hubify

  1. Go to Lab Settings > Compute
  2. Click Connect RunPod
  3. Paste your API key
  4. Click Verify & Save
Alternatively, configure the provider from the CLI:

hubify pod config --provider runpod --api-key "your-runpod-api-key"

Verify the connection

hubify pod config --test
RunPod connection: OK
Available GPUs: H200, H100, A100, A40
Account balance: $245.00

Set Default GPU

Configure which GPU type is used when experiments do not specify one:

# Set default GPU
hubify pod config --default-gpu h100

# Set default timeout
hubify pod config --default-timeout 4h

# View current config
hubify pod config --show
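The fallback behavior can be sketched as: an experiment's explicit GPU choice wins, otherwise the lab default applies. The function and names below are illustrative, not Hubify internals:

```python
# Sketch of default-GPU resolution: an experiment's explicit choice
# overrides the lab default. Names are illustrative, not Hubify internals.
def resolve_gpu(experiment_gpu, lab_default="h100"):
    """Return the GPU type a pod should use."""
    return experiment_gpu if experiment_gpu else lab_default

print(resolve_gpu(None))    # → h100 (falls back to the lab default)
print(resolve_gpu("h200"))  # → h200 (explicit choice wins)
```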

Budget Controls

Set spending limits to avoid surprises:

# Monthly budget cap (pods queue when reached)
hubify pod budget --monthly 500

# Per-experiment cap
hubify pod budget --per-experiment 50

# Alert threshold (notify at 80% of budget)
hubify pod budget --alert-threshold 0.8

When the monthly budget is reached:

  • New experiments queue instead of launching
  • You receive a notification
  • The orchestrator suggests cost-saving alternatives
  • Running experiments continue until completion
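The behavior above amounts to a simple gate on new launches; running pods are unaffected. A minimal sketch, with hypothetical names (Hubify's internal logic is not shown in this guide):

```python
# Sketch of the monthly-budget gate described above. All names are
# hypothetical; this is not Hubify's internal implementation.
ALERT_THRESHOLD = 0.8  # notify at 80% of budget

def launch_decision(spent, monthly_cap):
    """Decide what happens when a new experiment is submitted."""
    if spent >= monthly_cap:
        return "queue"          # new experiments queue instead of launching
    if spent >= ALERT_THRESHOLD * monthly_cap:
        return "launch+alert"   # launch, but notify that budget is near
    return "launch"

print(launch_decision(250, 500))  # → launch
print(launch_decision(420, 500))  # → launch+alert
print(launch_decision(500, 500))  # → queue
```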

Pod Templates

Create reusable pod configurations for common experiment types:

# Create a template
hubify pod template create \
  --name "mcmc-standard" \
  --gpu h100 \
  --timeout 4h \
  --docker-image "hubify/cosmo:latest" \
  --env "OMP_NUM_THREADS=16" \
  --storage 50GB

# Use a template
hubify experiment run --name "my-chain" --pod-template mcmc-standard

# List templates
hubify pod template list
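Conceptually, a template is a named bundle of defaults that an individual experiment can override field by field. A minimal sketch, assuming a simple last-wins merge (illustrative, not Hubify's internal representation):

```python
# Sketch: a pod template as a dict of defaults, merged with per-experiment
# overrides. Field names mirror the CLI flags above; the merge logic is
# illustrative, not Hubify internals.
MCMC_STANDARD = {
    "gpu": "h100",
    "timeout": "4h",
    "docker_image": "hubify/cosmo:latest",
    "env": {"OMP_NUM_THREADS": "16"},
    "storage": "50GB",
}

def build_pod_config(template, **overrides):
    """Start from the template, let the experiment override fields."""
    return {**template, **overrides}

config = build_pod_config(MCMC_STANDARD, gpu="h200", timeout="8h")
print(config["gpu"])      # → h200 (overridden by the experiment)
print(config["storage"])  # → 50GB (inherited from the template)
```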

GPU Selection Guide

Experiment Type                     Recommended GPU   Why
MCMC chains (< 100K samples)        H100              Good balance of cost and speed
MCMC chains (> 100K samples)        H200              Large memory prevents OOM
Neural network training             H100 or H200      Depends on model size
Anomaly detection (large catalog)   H200              141 GB VRAM for full dataset
Data preprocessing                  CPU               No GPU needed; saves money
Figure generation                   CPU or A40        Lightweight; saves money
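The selection rules above can be codified as a small helper. The thresholds and GPU names come straight from the table; the function itself is illustrative and not part of the Hubify CLI:

```python
# Codify the GPU selection table above. The rules mirror the table rows;
# the helper itself is illustrative, not part of the Hubify CLI.
def recommend_gpu(task, samples=0):
    """Map an experiment type (and MCMC sample count) to a GPU choice."""
    if task == "mcmc":
        return "H200" if samples > 100_000 else "H100"
    if task == "nn-training":
        return "H100 or H200"  # depends on model size
    if task == "anomaly-detection":
        return "H200"          # 141 GB VRAM for the full catalog
    if task == "preprocessing":
        return "CPU"           # no GPU needed; saves money
    if task == "figures":
        return "CPU or A40"    # lightweight; saves money
    raise ValueError(f"unknown task: {task}")

print(recommend_gpu("mcmc", samples=250_000))  # → H200
print(recommend_gpu("preprocessing"))          # → CPU
```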

Persistent Storage

Configure persistent storage for datasets and results:

# View storage usage
hubify pod storage list

# Upload a dataset (available to all pods)
hubify pod storage upload ./planck_likelihood.tar.gz

# Set retention policy
hubify pod storage config --retain-days 90

Persistent storage survives pod teardowns. Pre-stage large datasets here so experiments start instantly instead of waiting for downloads.
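Assuming the retention window is counted from the upload date (an assumption; the CLI output above does not say), the deletion cutoff is simple date arithmetic:

```python
# Sketch of the retention cutoff, assuming --retain-days is counted from
# the upload date. Dates are illustrative.
from datetime import date, timedelta

retain_days = 90
uploaded = date(2026, 4, 1)
expires = uploaded + timedelta(days=retain_days)
print(expires)  # → 2026-06-30
```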

SSH Keys

Add SSH keys for direct pod access:

# Add your SSH key
hubify pod ssh-key add --file ~/.ssh/id_ed25519.pub

# List configured keys
hubify pod ssh-key list

Monitoring

Monitor active pods from Captain View or CLI:

# Real-time pod status
hubify pod list --watch

# GPU utilization
hubify pod metrics pod-abc123

# Cost tracking
hubify pod cost --month current --breakdown
MONTH      TOTAL    H200    H100    A100    CPU
2026-04    $312     $180    $112    $20     $0
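The per-GPU columns should sum to the TOTAL column; a quick sanity check on the sample row above (the parsing is illustrative):

```python
# Sanity-check the sample cost breakdown above: per-GPU columns should
# sum to the TOTAL column. Values come from the sample output row.
row = {"H200": 180, "H100": 112, "A100": 20, "CPU": 0}
total = 312

assert sum(row.values()) == total
print(f"2026-04 total: ${sum(row.values())}")  # → 2026-04 total: $312
```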

Coming Soon: Modal

Modal integration will add serverless GPU functions. Instead of managing pods, you deploy functions that run on demand and are billed per second. Ideal for:

  • Short-lived tasks (< 10 minutes)
  • Bursty workloads
  • Figure generation
  • Small inferences

Modal support is currently in development.
