Experiment Runner
Execute research experiments on GPU pods with automatic logging, checkpointing, QC gates, and reproducibility tracking.
The Experiment Runner is the execution engine of Hubify Labs. It takes experiment definitions, provisions compute, executes code, and captures every detail for reproducibility.
Running an Experiment
1. Open the Captain View
2. Click **New Experiment** (or press `Cmd+E`)
3. Describe the experiment in natural language or fill in the structured form
4. Select compute requirements (GPU type, estimated duration)
5. Click **Run**
The orchestrator will handle agent assignment and pod allocation.
```bash
# Natural language
hubify experiment run "MCMC chain with Planck 2018 + BAO, 200K samples"

# Structured
hubify experiment run \
  --name "planck-bao-mcmc" \
  --script run_cobaya.py \
  --config planck_bao.yaml \
  --pod h100 \
  --timeout 4h
```
```bash
curl -X POST https://api.hubify.com/v1/labs/bigbounce/experiments \
  -H "Authorization: Bearer $HUBIFY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "planck-bao-mcmc",
    "script": "run_cobaya.py",
    "config": "planck_bao.yaml",
    "pod_type": "h100",
    "timeout": "4h"
  }'
```
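The same request can be built from any HTTP client. A minimal Python sketch of assembling the payload and headers shown above (the actual POST is left commented so the sketch stays self-contained):

```python
import json
import os

# Same experiment payload the curl example sends.
payload = {
    "name": "planck-bao-mcmc",
    "script": "run_cobaya.py",
    "config": "planck_bao.yaml",
    "pod_type": "h100",
    "timeout": "4h",
}

headers = {
    # Read the key from the environment, as in the curl example.
    "Authorization": f"Bearer {os.environ.get('HUBIFY_API_KEY', '')}",
    "Content-Type": "application/json",
}

body = json.dumps(payload)

# To submit, POST the body with your preferred client, e.g.:
# requests.post("https://api.hubify.com/v1/labs/bigbounce/experiments",
#               headers=headers, data=body)
```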
Experiment Dashboard
Each running experiment has a detail view showing:
- Live Logs — Streaming stdout/stderr from the pod
- Metrics — Custom metrics emitted by your script (loss, convergence, sample count)
- Figures — Plots generated during execution, updated in real time
- Resource Usage — GPU utilization, memory, disk I/O
- Checkpoints — Saved intermediate states you can resume from
- Cost — Running cost in USD
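The exact mechanism by which scripts emit custom metrics isn't specified here; one common pattern is appending JSON lines to a metrics file that the pod agent tails. A minimal sketch under that assumption (the file path and field names are illustrative, not a documented interface):

```python
import json
import time
from pathlib import Path

# Assumed location — not a documented Hubify path.
METRICS_FILE = Path("/tmp/metrics.jsonl")

def log_metric(name: str, value: float, step: int) -> None:
    """Append one metric record as a JSON line."""
    record = {"name": name, "value": value, "step": step, "ts": time.time()}
    with METRICS_FILE.open("a") as f:
        f.write(json.dumps(record) + "\n")

# Example: emit a loss value per iteration of a hypothetical training loop.
for step, loss in enumerate([0.9, 0.5, 0.3]):
    log_metric("loss", loss, step)
```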
Checkpointing
Experiments automatically checkpoint at configurable intervals:
```yaml
# In your experiment config
checkpoint:
  interval: 30m   # Save state every 30 minutes
  keep_last: 5    # Keep the 5 most recent checkpoints
  path: /workspace/checkpoints/
```
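The `keep_last` pruning behavior can be sketched in a few lines of Python. This is an illustration of the policy, not the runner's actual checkpoint format (file names and layout are assumptions):

```python
import pickle
from pathlib import Path

CKPT_DIR = Path("checkpoints")  # stands in for /workspace/checkpoints/
KEEP_LAST = 5

def save_checkpoint(state: dict, step: int) -> None:
    """Write a checkpoint, then prune to the KEEP_LAST most recent."""
    CKPT_DIR.mkdir(exist_ok=True)
    with (CKPT_DIR / f"ckpt-{step:06d}.pkl").open("wb") as f:
        pickle.dump(state, f)
    # Zero-padded step numbers sort lexicographically, oldest first.
    ckpts = sorted(CKPT_DIR.glob("ckpt-*.pkl"))
    for old in ckpts[:-KEEP_LAST]:
        old.unlink()

def load_latest() -> dict:
    """Load the most recent checkpoint (what `--from-checkpoint latest` implies)."""
    latest = sorted(CKPT_DIR.glob("ckpt-*.pkl"))[-1]
    with latest.open("rb") as f:
        return pickle.load(f)

# After eight saves, only the five newest checkpoints remain.
for step in range(8):
    save_checkpoint({"step": step}, step)
```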
If a pod crashes or an experiment is interrupted, you can resume from the last checkpoint:
```bash
hubify experiment resume EXP-054 --from-checkpoint latest
```
QC Gates
Every experiment passes through a quality control gate before results are accepted:
| Check | Description | Threshold |
|---|---|---|
| Completeness | All expected output files exist | 100% |
| Convergence | R-hat statistic for MCMC chains | < 1.05 |
| Error Bounds | Statistical uncertainties are reasonable | Domain-specific |
| Reproducibility | Config + data + code are frozen | All locked |
| Review | Cross-model verification of results | Pass |
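The Convergence check uses the standard Gelman-Rubin R-hat diagnostic. A self-contained sketch of the computation (not the runner's implementation) against the < 1.05 threshold:

```python
import random
import statistics

def r_hat(chains: list[list[float]]) -> float:
    """Gelman-Rubin statistic for m chains of equal length n."""
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = statistics.fmean(statistics.variance(c) for c in chains)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5

# Four well-mixed chains drawn from the same distribution should pass the gate.
random.seed(0)
chains = [[random.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
```

Chains sampling different distributions (i.e. not yet mixed) push R-hat well above 1.05 and fail the gate.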
If a QC gate fails, the experiment is flagged and the orchestrator decides whether to:
- Rerun with more samples
- Adjust parameters and retry
- Escalate to you for a decision
Chaining
Experiments can be chained so outputs flow into inputs:
```bash
hubify experiment run --chain chain.yaml
```

```yaml
# chain.yaml
steps:
  - name: preprocess
    script: preprocess.py
    pod: cpu
  - name: mcmc
    script: run_mcmc.py
    pod: h200
    depends_on: preprocess
  - name: analysis
    script: analyze.py
    pod: cpu
    depends_on: mcmc
```
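Resolving `depends_on` into an execution order is a topological sort. A sketch with the standard library, where the dict mirrors the chain.yaml above (this illustrates the ordering logic, not the orchestrator's internals):

```python
from graphlib import TopologicalSorter

# Each step mapped to its depends_on predecessors, as declared in chain.yaml.
steps = {
    "preprocess": [],
    "mcmc": ["preprocess"],
    "analysis": ["mcmc"],
}

# static_order() yields each step only after all its dependencies.
order = list(TopologicalSorter(steps).static_order())
```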
Batch Experiments
Run parameter sweeps or multi-configuration experiments:
```bash
hubify experiment batch \
  --script train.py \
  --sweep '{"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64]}' \
  --pod h100
```
This creates six experiments (3 × 2 parameter combinations) and runs them in parallel when pods are available.
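Sweep expansion is the Cartesian product of the parameter lists. A minimal sketch of how the `--sweep` JSON above expands into per-experiment configs:

```python
import itertools

sweep = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64]}

# One experiment config per combination of parameter values.
keys = list(sweep)
combos = [dict(zip(keys, values)) for values in itertools.product(*sweep.values())]
```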
Reproducibility Record
Every experiment captures:
- Git SHA of the codebase
- Full dependency list (`pip freeze`)
- Config files (YAML/JSON, checksummed)
- Input data SHA-256 hashes
- Random seeds
- Pod hardware specs
- Start/end timestamps
This record is immutable and attached to the experiment forever.
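A sketch of what capturing such a record involves, using only the standard library. The file contents, seed, and field names are illustrative; the git SHA capture is left as a comment since it only works inside a repository:

```python
import hashlib
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Streaming SHA-256 of an input data file."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Illustrative input file; in practice these are the experiment's data files.
data = Path("planck_bao.yaml")
data.write_text("likelihoods: [planck_2018, bao]\n")

record = {
    # "git_sha": captured with `git rev-parse HEAD` inside the repo
    "python": sys.version.split()[0],
    "inputs": {data.name: sha256_file(data)},
    "seed": 42,  # illustrative fixed seed
    "started_at": datetime.now(timezone.utc).isoformat(),
}

# Deterministic serialization makes the record easy to checksum and freeze.
frozen = json.dumps(record, sort_keys=True)
```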