GPU Compute
Manage GPU pods for experiments — provision H100/H200, monitor utilization, optimize costs, and SSH into running pods.
Hubify Labs integrates directly with GPU cloud providers to give you on-demand access to high-end compute. Currently powered by RunPod, with Modal serverless functions coming soon.
Pod Management
Provision
Specify GPU type and duration. The system finds the cheapest available pod matching your requirements.
Initialize
Your lab's environment is set up automatically: Python packages, data mounts, SSH keys, and monitoring agents.
Execute
Run experiments. Logs stream in real time. Intermediate results checkpoint to persistent storage.
Monitor
Track GPU utilization, memory, and cost in real time from Captain View or CLI.
Teardown
Pods shut down automatically when experiments complete. Results are synced before teardown.
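The five stages above form a simple forward progression. As an illustrative sketch (not Hubify's actual implementation), the lifecycle can be modeled as a small state machine:

```python
from enum import Enum, auto

class PodState(Enum):
    """Illustrative pod lifecycle states mirroring the stages above."""
    PROVISION = auto()
    INITIALIZE = auto()
    EXECUTE = auto()
    MONITOR = auto()
    TEARDOWN = auto()

# Normal forward progression through the lifecycle
TRANSITIONS = {
    PodState.PROVISION: PodState.INITIALIZE,
    PodState.INITIALIZE: PodState.EXECUTE,
    PodState.EXECUTE: PodState.MONITOR,
    PodState.MONITOR: PodState.TEARDOWN,
}

def run_lifecycle(start=PodState.PROVISION):
    """Walk the states in order, returning the path taken."""
    path = [start]
    while path[-1] in TRANSITIONS:
        path.append(TRANSITIONS[path[-1]])
    return path
```

In the real system, Execute and Monitor run concurrently; they are linearized here only to keep the sketch simple.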
GPU Options
| GPU | VRAM | Use Case | Approx. Cost |
|---|---|---|---|
| NVIDIA H200 | 141 GB | Large MCMC, multi-survey sweeps, foundation models | $4-6/hr |
| NVIDIA H100 | 80 GB | Training, medium MCMC, anomaly detection | $2-4/hr |
| NVIDIA A100 | 80 GB | General GPU compute, inference | $1-2/hr |
| NVIDIA A40 | 48 GB | Light GPU tasks, development | $0.50-1/hr |
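As a rough guide to reading the table, pod selection reduces to "cheapest GPU with enough VRAM." The specs below are copied from the table (using the low end of each price range); the `cheapest_fit` helper is a hypothetical sketch, not part of the hubify CLI:

```python
# VRAM (GB) and low-end hourly rate ($) from the table above
GPUS = {
    "H200": {"vram_gb": 141, "rate_low": 4.0},
    "H100": {"vram_gb": 80,  "rate_low": 2.0},
    "A100": {"vram_gb": 80,  "rate_low": 1.0},
    "A40":  {"vram_gb": 48,  "rate_low": 0.5},
}

def cheapest_fit(vram_needed_gb: float) -> str:
    """Return the cheapest listed GPU whose VRAM meets the requirement."""
    candidates = [
        (spec["rate_low"], name)
        for name, spec in GPUS.items()
        if spec["vram_gb"] >= vram_needed_gb
    ]
    if not candidates:
        raise ValueError("No listed GPU has enough VRAM")
    return min(candidates)[1]
```

For example, a 60 GB model fits on an A100 rather than an H100, because both have 80 GB but the A100 is cheaper per hour.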
Cost Controls
Set a monthly budget cap per lab:
# Set budget cap
hubify pod budget --monthly 500
# View current spend
hubify pod cost --month current
# Get cost forecast
hubify pod cost --forecast
When you approach the budget limit:
- New experiments queue instead of launching
- You receive a notification
- The orchestrator suggests cost-saving alternatives (smaller GPU, CPU-only preprocessing)
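The budget behavior amounts to a gate on projected spend. A minimal sketch, assuming a 90% warning threshold and illustrative action names (the product's actual threshold and internals may differ):

```python
def budget_decision(spent: float, est_cost: float, cap: float,
                    warn_ratio: float = 0.9) -> str:
    """Decide whether a new experiment launches or queues.

    warn_ratio is an assumed threshold for the approaching-limit
    notification, not a documented Hubify constant.
    """
    projected = spent + est_cost
    if projected > cap:
        return "queue"          # would exceed the cap: hold the job
    if projected >= warn_ratio * cap:
        return "launch+notify"  # near the cap: run, but alert the user
    return "launch"
```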
Auto-Optimization
The system picks the cheapest option for each experiment:
Experiment needs ~2 hours on H100 ($2/hr) = $4
Same experiment runs ~45 min on H200 ($5/hr) = $3.75
→ System picks H200 (cheaper overall despite higher hourly rate)
Override with explicit pod selection when needed.
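The comparison above reduces to minimizing rate × runtime. A minimal sketch of that selection, using the example's numbers (the helper names are illustrative):

```python
def total_cost(rate_per_hr: float, hours: float) -> float:
    """Total pod cost is hourly rate times estimated runtime."""
    return rate_per_hr * hours

def pick_cheapest(options: dict) -> str:
    """options maps GPU name -> (hourly rate, estimated hours)."""
    return min(options, key=lambda gpu: total_cost(*options[gpu]))

# Worked example from above: H200 at $5/hr for 45 min beats
# H100 at $2/hr for 2 hours ($3.75 vs $4.00)
options = {"H100": (2.0, 2.0), "H200": (5.0, 0.75)}
```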
Persistent Storage
Each lab gets persistent storage:
- Survives pod teardowns
- Lets you pre-stage large datasets for instant access
- Syncs experiment outputs automatically
- Supports configurable retention policies
# List persistent storage
hubify pod storage list
# Upload data to persistent storage
hubify pod storage upload ./large_dataset.fits
# Download results
hubify pod storage download /results/chain_samples.txt
SSH Access
Every running pod is accessible via SSH:
# Auto-connect to a pod
hubify pod ssh pod-abc123
# Get connection details
hubify pod info pod-abc123
# → SSH: root@205.196.19.52 -p 11452
Idle Detection
Warning: An idle GPU is wasted money. Hubify monitors utilization and takes action when pods sit idle.
When a pod finishes its experiment and nothing is queued:
- Alert sent to you and the orchestrator
- System suggests next experiments that could use the pod
- If auto-schedule is enabled, the next experiment deploys automatically
- If nothing is queued for 15 minutes, the pod is torn down automatically
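The idle policy is essentially a timer check against the 15-minute window stated above. A sketch with illustrative inputs and action names (`auto_schedule` and the returned strings are assumptions, not Hubify API values):

```python
def idle_action(idle_minutes: float, queued: bool, auto_schedule: bool,
                teardown_after: float = 15.0) -> str:
    """Decide what happens to an idle pod (illustrative logic)."""
    if queued and auto_schedule:
        return "deploy_next"   # next queued experiment starts automatically
    if queued:
        return "alert"         # work is waiting but needs manual approval
    if idle_minutes >= teardown_after:
        return "teardown"      # nothing queued for 15 min: reclaim the pod
    return "alert"             # alert + suggest experiments for the pod
```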
DataLoader Best Practices
For production GPU inference, always use optimized DataLoaders:
from torch.utils.data import DataLoader

loader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=16,           # Parallel data loading
    pin_memory=True,          # Fast GPU transfer
    prefetch_factor=4,        # Batches prefetched per worker
    persistent_workers=True,  # Keep workers alive between epochs
)
Parallel workers, pinned memory, and prefetching together can speed up data loading dramatically compared with serial, single-worker processing (on the order of 32x in favorable cases); tune num_workers to your CPU count and storage bandwidth.
CLI Reference
hubify pod list # List all pods
hubify pod create --gpu h100 # Launch a pod
hubify pod status <id> # Check pod status
hubify pod ssh <id> # SSH into a pod
hubify pod stop <id> # Terminate a pod
hubify pod cost # View cost summary
hubify pod budget # Manage budget