Compute
GPU compute in Hubify Labs — RunPod integration, pod management, cost optimization, and the GPU inference playbook.
Hubify Labs gives you on-demand access to high-end GPU compute for running experiments. Currently powered by RunPod, with Modal serverless functions coming soon.
Supported Hardware
| GPU | VRAM | Best For | Cost Range |
|---|---|---|---|
| H200 | 141 GB | Large-scale MCMC, foundation model inference, multi-survey sweeps | $$$ |
| H100 | 80 GB | Training runs, medium MCMC chains, anomaly detection | $$ |
| A100 | 80 GB | General GPU compute, smaller models | $ |
| CPU | N/A | Data preprocessing, analysis, lightweight tasks | Free tier |
Pod Lifecycle
Provision
When an experiment needs a GPU, Hubify provisions a pod on RunPod. The system selects the optimal GPU type based on the experiment's memory and compute requirements.
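The selection step can be sketched as "cheapest GPU whose VRAM fits the job". This is an illustrative sketch only, using the VRAM figures from the hardware table above; the `pick_gpu` helper and its ordering are assumptions, not Hubify's actual scheduler.

```python
# Hypothetical sketch of VRAM-based GPU selection. The real scheduler
# also weighs compute requirements and current availability.
GPUS = [
    ("A100", 80),    # (name, VRAM in GB), ordered cheapest first
    ("H100", 80),
    ("H200", 141),
]

def pick_gpu(required_vram_gb: float) -> str:
    """Return the cheapest GPU whose VRAM fits the experiment."""
    for name, vram in GPUS:
        if vram >= required_vram_gb:
            return name
    raise ValueError(f"no supported GPU has {required_vram_gb} GB of VRAM")

print(pick_gpu(60))   # fits on an A100
print(pick_gpu(100))  # needs an H200
```

A job needing 60 GB lands on the A100; one needing 100 GB skips both 80 GB cards and gets the H200.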
Initialize
The pod boots with your lab's environment: dependencies installed, data mounted, SSH keys configured.
Execute
Your experiment runs on the pod. Logs stream in real time. Intermediate results checkpoint to persistent storage.
Teardown
When the experiment completes (or fails), the pod is torn down automatically. Results are saved to your lab before teardown.
Cost Optimization
Hubify automatically optimizes for cost:
```
total_cost = runtime_hours * cost_per_hour

H200 finishes in 1 hour  at $4/hr → $4
H100 finishes in 3 hours at $2/hr → $6
→ System picks H200 (cheaper overall)
```
You can set a monthly budget cap per lab. When you approach the limit, experiments queue instead of launching, and you get a notification.
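The comparison above reduces to "minimize total cost, not hourly rate". A minimal sketch, using the numbers from the example (the `candidates` dict and helper are illustrative, not Hubify's implementation):

```python
# Hypothetical sketch: the faster, pricier GPU can be cheaper overall.
candidates = {
    "H200": {"runtime_hours": 1, "cost_per_hour": 4.0},
    "H100": {"runtime_hours": 3, "cost_per_hour": 2.0},
}

def total_cost(spec):
    return spec["runtime_hours"] * spec["cost_per_hour"]

best = min(candidates, key=lambda gpu: total_cost(candidates[gpu]))
print(best, total_cost(candidates[best]))  # H200 wins at $4 despite the higher rate
```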
GPU Inference Playbook
Warning: Always use `torch.utils.data.DataLoader` with `num_workers=16`, `pin_memory=True`, and `prefetch_factor=4` for image/data inference. This gives a 32x speedup over serial processing.
Key rules from the playbook:
- Never use serial PIL decoding for batch image processing
- Never use `ProcessPoolExecutor` for GPU-bound work
- Never use HuggingFace streaming for production inference
- Always pin memory and prefetch for GPU DataLoaders
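A playbook-compliant loader looks roughly like this. The dataset and batch size are placeholders for your own data; only the three loader flags come from the playbook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for your real decoded images.
dataset = TensorDataset(torch.randn(1024, 3, 224, 224))

loader = DataLoader(
    dataset,
    batch_size=64,        # placeholder; tune to your model and VRAM
    num_workers=16,       # parallel decoding instead of serial PIL
    pin_memory=True,      # pinned host memory for faster host-to-GPU copies
    prefetch_factor=4,    # keep batches staged ahead of the GPU
)
```

With workers decoding in parallel and batches prefetched, the GPU never waits on the input pipeline.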
Persistent Storage
Each lab gets persistent storage that survives pod teardowns:
- `/workspace/` on pods maps to your lab's persistent volume
- Experiment outputs are automatically synced back to the lab
- Datasets can be pre-staged in persistent storage for fast access
SSH Access
Every running pod is accessible via SSH for debugging:
```bash
# Get SSH command for a running pod
hubify pod ssh EXP-054

# Direct SSH (shown in pod details)
ssh root@205.196.19.52 -p 11452
```
Idle Pod Detection
Note: An idle GPU is treated as a violation. Hubify monitors pod utilization, alerts you when a pod is sitting idle, and suggests the next experiment to deploy on it.
If a pod finishes its assigned experiment and no follow-up is queued, the system:
- Alerts you that the pod is idle
- Suggests experiments from the queue that could use this pod
- Auto-deploys the next experiment if you have auto-schedule enabled
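The idle-handling policy above can be sketched as a small decision function. The utilization threshold, queue shape, and return values are all assumptions for illustration, not Hubify's real implementation.

```python
# Hypothetical sketch of idle-pod handling.
IDLE_UTIL_THRESHOLD = 0.05   # assume <5% GPU utilization counts as idle

def handle_idle_pod(utilization, queue, auto_schedule=False):
    """Return the action taken for a pod at the given utilization."""
    if utilization >= IDLE_UTIL_THRESHOLD:
        return "busy"                    # pod is working; do nothing
    if not queue:
        return "alert"                   # idle and nothing queued: alert only
    if auto_schedule:
        return f"deploy:{queue[0]}"      # auto-deploy the next queued experiment
    return f"suggest:{queue[0]}"         # alert plus a suggested experiment

print(handle_idle_pod(0.90, ["EXP-055"]))        # busy
print(handle_idle_pod(0.01, ["EXP-055"]))        # suggest:EXP-055
print(handle_idle_pod(0.01, ["EXP-055"], True))  # deploy:EXP-055
```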
CLI
```bash
# List active pods
hubify pod list

# Launch a pod manually
hubify pod create --gpu h100 --hours 4

# Check pod status
hubify pod status pod-abc123

# SSH into a pod
hubify pod ssh pod-abc123

# Terminate a pod
hubify pod stop pod-abc123

# View cost summary
hubify pod cost --month current
```
Coming Soon: Modal
Modal integration will add serverless GPU functions — pay per second, no pod management. Ideal for short-lived tasks like figure generation, small inferences, and data transformations.