hubify experiment

Manage the full experiment lifecycle: creation, execution, monitoring, and result retrieval.

Commands

`hubify experiment run`

Run a new experiment. Accepts a natural-language description, a config file, or explicit options:

# Natural language
hubify experiment run "MCMC chain, 10K samples, Planck+BAO, H100 pod"

# From config file
hubify experiment run --file experiment.yaml

# With explicit options
hubify experiment run \
  --name "planck-bao-chain" \
  --script run_cobaya.py \
  --config planck_bao.yaml \
  --pod h100 \
  --timeout 4h

`hubify experiment list`

List experiments in the active lab:

hubify experiment list

ID        NAME                STATUS     POD     DURATION  QC
EXP-054   planck-bao-chain    complete   h100    2h 14m    PASS
EXP-053   act-anomaly-sweep   running    h200    1h 03m    -
EXP-052   sdss-cross-match    complete   h100    45m       PASS
EXP-051   test-convergence    failed     h100    12m       FAIL

Option	Description
`--status <s>`	Filter by status: `queued`, `running`, `complete`, `failed`
`--limit <n>`	Number of results (default: 20)
`--all`	Show all experiments, ignoring the result limit
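
When only the plain table output is available, the IDs for a given status can be pulled out with a small filter. This is a minimal sketch, assuming the whitespace-separated column order shown above (ID, NAME, STATUS, ...) and experiment names without spaces; `ids_with_status` is a hypothetical helper, not part of the CLI.

```shell
# Print the IDs of experiments whose STATUS column equals $1.
# Reads `hubify experiment list` table output on stdin.
# Assumption: columns are whitespace-separated in the order
# ID, NAME, STATUS, ... and NAME contains no spaces.
ids_with_status() {
  awk -v want="$1" '$1 ~ /^EXP-[0-9]+$/ && $3 == want { print $1 }'
}
```

For example, `hubify experiment list --all | ids_with_status failed` would print one failed experiment ID per line.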

`hubify experiment status`

Get detailed status of a specific experiment:

hubify experiment status EXP-054

ID:          EXP-054
Name:        planck-bao-chain
Status:      complete
Pod:         h100-abc123
Started:     2026-04-14 10:42:01 UTC
Completed:   2026-04-14 12:56:15 UTC
Duration:    2h 14m
QC:          PASS (R-hat: 1.03, samples: 10,241)
Outputs:     chain_samples.txt, posterior_plot.png, qc_report.json
Cost:        $4.28
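
A script that needs to block until an experiment finishes could poll this command and read the `Status:` line. The sketch below assumes the field layout shown above and the four status values listed for `--status`; `status_field`, `wait_for_experiment`, and the 60-second default interval are hypothetical choices.

```shell
# Extract the value of the "Status:" line from
# `hubify experiment status` output on stdin.
status_field() {
  awk '$1 == "Status:" { print $2; exit }'
}

# Poll until the experiment leaves the queued/running states.
# Returns 0 if it ends as "complete", 1 for anything else
# (including "failed" or unreadable output).
wait_for_experiment() {
  exp_id=$1
  interval=${2:-60}   # seconds between polls (hypothetical default)
  while :; do
    status=$(hubify experiment status "$exp_id" | status_field)
    case $status in
      queued|running) sleep "$interval" ;;
      complete) return 0 ;;
      *) return 1 ;;
    esac
  done
}
```

A typical use would be `wait_for_experiment EXP-053 && hubify experiment outputs EXP-053 --download ./results/`.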

`hubify experiment outputs`

List or download experiment outputs:

# Download all outputs
hubify experiment outputs EXP-054 --download ./results/

# List outputs without downloading
hubify experiment outputs EXP-054 --list

# Download a specific file
hubify experiment outputs EXP-054 --file posterior_plot.png --download ./

`hubify experiment rerun`

Rerun a completed or failed experiment:

# Rerun with same config
hubify experiment rerun EXP-051

# Rerun with modified parameters
hubify experiment rerun EXP-051 --override "pod=h200,timeout=8h"
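
When overrides are assembled in a script, it can help to build the comma-separated string from individual `key=value` pairs. The following is a sketch only: `join_overrides` is a hypothetical helper, and it assumes that values never contain commas.

```shell
# Join key=value arguments into the comma-separated form
# used by --override, e.g. "pod=h200,timeout=8h".
# Rejects arguments that contain a comma or are not key=value.
join_overrides() {
  for pair in "$@"; do
    case $pair in
      *,*)   echo "override must not contain a comma: $pair" >&2; return 1 ;;
      ?*=?*) ;;  # looks like key=value
      *)     echo "invalid override: $pair" >&2; return 1 ;;
    esac
  done
  # "$*" joins the arguments with the first character of IFS.
  ( IFS=,; printf '%s\n' "$*" )
}
```

It could then be used as `hubify experiment rerun EXP-051 --override "$(join_overrides pod=h200 timeout=8h)"`.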

`hubify experiment resume`

Resume an experiment from the last checkpoint:

hubify experiment resume EXP-051 --from-checkpoint latest

`hubify experiment stop`

Stop a running experiment:

hubify experiment stop EXP-053

`hubify experiment qc`

View QC gate results:

hubify experiment qc EXP-054

QC Gate: PASS
  Convergence (R-hat):   1.03 (threshold: 1.10) ✓
  Minimum samples:       10,241 (threshold: 5,000) ✓
  Chain completeness:    100% ✓
  NaN/Inf check:         Clean ✓
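
A script can use this report to skip downloads for experiments that did not pass the gate. The sketch below assumes the first line of the report has the form `QC Gate: PASS` as shown above; `qc_passed` and `download_if_qc_passed` are hypothetical helpers.

```shell
# Succeed only if the "QC Gate:" line on stdin reports PASS.
qc_passed() {
  awk '$1 == "QC" && $2 == "Gate:" { ok = ($3 == "PASS"); exit }
       END { exit !ok }'
}

# Download outputs for an experiment only when its QC gate passed.
download_if_qc_passed() {
  exp_id=$1
  dest=$2
  if hubify experiment qc "$exp_id" | qc_passed; then
    hubify experiment outputs "$exp_id" --download "$dest"
  else
    echo "QC did not pass for $exp_id; skipping download" >&2
    return 1
  fi
}
```

For example, `download_if_qc_passed EXP-054 ./results/` would download only after the gate check succeeds.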

Experiment Config Format

# experiment.yaml
name: "planck-bao-chain"
description: "Full MCMC chain on Planck+BAO likelihood"
script: run_cobaya.py
config: planck_bao.yaml
pod:
  gpu: h100
  timeout: 4h
  storage: 20GB
outputs:
  - chain_samples.txt
  - posterior_plot.png
qc:
  convergence_threshold: 1.10
  min_samples: 5000
depends_on:
  - EXP-050  # Must complete first
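
For parameter sweeps, config files like the one above can be generated from a template. This is a minimal sketch: `make_config` is a hypothetical helper, it uses only keys that appear in the example, and it assumes the omitted keys (`description`, `outputs`, `qc`, `depends_on`) are optional, which the example alone does not confirm.

```shell
# Write a minimal experiment config to stdout.
# Arguments: experiment name, GPU type, timeout.
# Assumption: keys omitted here are optional.
make_config() {
  name=$1
  gpu=$2
  timeout=$3
  cat <<EOF
name: "$name"
script: run_cobaya.py
config: planck_bao.yaml
pod:
  gpu: $gpu
  timeout: $timeout
EOF
}
```

A generated file could then be passed to `hubify experiment run --file`, for example after `make_config planck-bao-h200 h200 8h > chain-h200.yaml`.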

Examples

# Run, then follow logs in real time (EXP-055 stands in for the ID assigned to the new run)
hubify experiment run --file chain.yaml && hubify logs EXP-055 --follow

# List experiments that failed in the last 7 days
hubify experiment list --status failed --since 7d

# Batch rerun all failed experiments
hubify experiment list --status failed --json | jq -r '.[].id' | xargs -I{} hubify experiment rerun {}
