Experiment Runner

Execute research experiments on GPU pods with automatic logging, checkpointing, QC gates, and reproducibility tracking.

The Experiment Runner is the execution engine of Hubify Labs. It takes experiment definitions, provisions compute, executes code, and captures every detail for reproducibility.

Running an Experiment

1. Open the Captain View
2. Click **New Experiment** (or press `Cmd+E`)
3. Describe the experiment in natural language or fill in the structured form
4. Select compute requirements (GPU type, estimated duration)
5. Click **Run**

The orchestrator will handle agent assignment and pod allocation.


Via the CLI:
```bash
# Natural language
hubify experiment run "MCMC chain with Planck 2018 + BAO, 200K samples"

# Structured
hubify experiment run \
  --name "planck-bao-mcmc" \
  --script run_cobaya.py \
  --config planck_bao.yaml \
  --pod h100 \
  --timeout 4h
```


Via the HTTP API:
```bash
curl -X POST https://api.hubify.com/v1/labs/bigbounce/experiments \
  -H "Authorization: Bearer $HUBIFY_API_KEY" \
  -d '{
    "name": "planck-bao-mcmc",
    "script": "run_cobaya.py",
    "config": "planck_bao.yaml",
    "pod_type": "h100",
    "timeout": "4h"
  }'
```

Experiment Dashboard

Each running experiment has a detail view showing:

  • Live Logs — Streaming stdout/stderr from the pod
  • Metrics — Custom metrics emitted by your script (loss, convergence, sample count)
  • Figures — Plots generated during execution, updated in real time
  • Resource Usage — GPU utilization, memory, disk I/O
  • Checkpoints — Saved intermediate states you can resume from
  • Cost — Running cost in USD
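
How your script emits the custom metrics shown in the dashboard is not specified here; one common convention is JSON lines on stdout, which a runner can parse alongside the regular log stream. The sketch below assumes that convention — check the actual metric protocol in your runner's documentation:

```python
import json
import sys

def emit_metric(name, value, step=None):
    """Write one metric as a JSON line to stdout.

    NOTE: the JSON-lines convention here is an illustrative assumption,
    not a documented Hubify protocol.
    """
    record = {"metric": name, "value": value}
    if step is not None:
        record["step"] = step
    sys.stdout.write(json.dumps(record) + "\n")
    return record

# Example: report MCMC progress periodically
emit_metric("samples_drawn", 1000, step=1)
emit_metric("r_hat", 1.12, step=1)
```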

Checkpointing

Experiments automatically checkpoint at configurable intervals:

```yaml
# In your experiment config
checkpoint:
  interval: 30m    # Save state every 30 minutes
  keep_last: 5     # Keep the 5 most recent checkpoints
  path: /workspace/checkpoints/
```

If a pod crashes or an experiment is interrupted, you can resume from the last checkpoint:

```bash
hubify experiment resume EXP-054 --from-checkpoint latest
```
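The runner's own checkpoint hooks are internal, but your script can follow the same pattern with plain files. A minimal sketch, assuming the `path` from the config above and pickle-serializable state:

```python
import os
import pickle

CHECKPOINT_DIR = "/workspace/checkpoints"  # matches `path` in the config above

def save_checkpoint(state, step, directory=CHECKPOINT_DIR):
    """Atomically write the experiment state for a given step."""
    os.makedirs(directory, exist_ok=True)
    tmp = os.path.join(directory, f"step_{step}.pkl.tmp")
    final = os.path.join(directory, f"step_{step}.pkl")
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, final)  # atomic rename: no half-written checkpoints
    return final

def load_latest(directory=CHECKPOINT_DIR):
    """Return the state from the highest-numbered checkpoint, or None."""
    try:
        files = [f for f in os.listdir(directory) if f.endswith(".pkl")]
    except FileNotFoundError:
        return None
    if not files:
        return None
    latest = max(files, key=lambda f: int(f.split("_")[1].split(".")[0]))
    with open(os.path.join(directory, latest), "rb") as f:
        return pickle.load(f)
```

Writing to a temporary file and renaming means a crash mid-write never corrupts the most recent checkpoint.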

QC Gates

Every experiment passes through a quality control gate before results are accepted:

| Check | Description | Threshold |
| --- | --- | --- |
| Completeness | All expected output files exist | 100% |
| Convergence | R-hat statistic for MCMC chains | < 1.05 |
| Error Bounds | Statistical uncertainties are reasonable | Domain-specific |
| Reproducibility | Config + data + code are frozen | All locked |
| Review | Cross-model verification of results | Pass |
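The convergence gate's R-hat (Gelman-Rubin) statistic compares within-chain and between-chain variance; values near 1 mean the chains agree. A minimal pure-Python sketch for chains of equal length:

```python
import math
from statistics import mean, variance

def r_hat(chains):
    """Gelman-Rubin R-hat for m chains of equal length n.

    The QC gate above accepts results with R-hat < 1.05.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    W = mean(variance(c) for c in chains)   # mean within-chain variance
    B_over_n = variance(chain_means)        # between-chain variance / n
    var_hat = (n - 1) / n * W + B_over_n    # pooled variance estimate
    return math.sqrt(var_hat / W)
```

Production MCMC frameworks typically use the split-chain variant (each chain halved before the comparison), which also catches chains that drift within themselves.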

If a QC gate fails, the experiment is flagged and the orchestrator decides whether to:

  • Rerun with more samples
  • Adjust parameters and retry
  • Escalate to you for a decision

Chaining

Experiments can be chained so outputs flow into inputs:

```bash
hubify experiment run --chain chain.yaml
```

```yaml
# chain.yaml
steps:
  - name: preprocess
    script: preprocess.py
    pod: cpu
  - name: mcmc
    script: run_mcmc.py
    pod: h200
    depends_on: preprocess
  - name: analysis
    script: analyze.py
    pod: cpu
    depends_on: mcmc
```
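Conceptually, `depends_on` defines a directed acyclic graph, and steps run in a topological order of that graph. A sketch of the resolution (illustrative only; the orchestrator's actual scheduler is internal), using step dicts that mirror `chain.yaml`:

```python
from graphlib import TopologicalSorter

steps = [
    {"name": "preprocess", "script": "preprocess.py"},
    {"name": "mcmc", "script": "run_mcmc.py", "depends_on": "preprocess"},
    {"name": "analysis", "script": "analyze.py", "depends_on": "mcmc"},
]

def execution_order(steps):
    """Return step names in an order that satisfies every depends_on."""
    graph = {s["name"]: {s["depends_on"]} if "depends_on" in s else set()
             for s in steps}
    return list(TopologicalSorter(graph).static_order())
```

Steps with no path between them (e.g. two independent preprocessing steps) can run in parallel once their dependencies finish.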

Batch Experiments

Run parameter sweeps or multi-configuration experiments:

```bash
hubify experiment batch \
  --script train.py \
  --sweep '{"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64]}' \
  --pod h100
```

This creates 6 experiments (3 x 2) and runs them in parallel if pods are available.
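The expansion is a Cartesian product over the sweep values. A sketch of how the 6 configs arise (not the CLI's actual implementation):

```python
import itertools
import json

def expand_sweep(sweep):
    """Expand a sweep spec into one config dict per value combination."""
    keys = list(sweep)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(sweep[k] for k in keys))]

sweep = json.loads('{"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64]}')
configs = expand_sweep(sweep)  # 3 learning rates x 2 batch sizes = 6 configs
```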

Reproducibility Record

Every experiment captures:

  • Git SHA of the codebase
  • Full dependency list (pip freeze)
  • Config files (YAML/JSON, checksummed)
  • Input data SHA-256 hashes
  • Random seeds
  • Pod hardware specs
  • Start/end timestamps

This record is immutable and attached to the experiment forever.
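The runner captures this record itself, but every item maps onto standard tooling. A hedged sketch of assembling the same fields in a script (helper names are hypothetical):

```python
import hashlib
import subprocess
import sys
from datetime import datetime, timezone

def sha256_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, streamed so large datasets fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_record(data_paths, seed):
    """Assemble a reproducibility record from standard tools."""
    return {
        "git_sha": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip(),
        "dependencies": subprocess.check_output(
            [sys.executable, "-m", "pip", "freeze"], text=True).splitlines(),
        "data_hashes": {p: sha256_file(p) for p in data_paths},
        "seed": seed,
        "started_at": datetime.now(timezone.utc).isoformat(),
    }
```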
