Knowledge Base

Every lab has a Karpathy-style structured wiki that grows automatically as agents research, experiment, and discover.

The knowledge base serves as the lab's institutional memory. Unlike unstructured notes, it uses a typed schema, so agents (and you) can query, cross-reference, and build on accumulated knowledge.

Why a Knowledge Base?

Research generates a vast amount of context: parameter definitions, dataset properties, method descriptions, comparison results, and theoretical constraints. Without structure, this context gets lost in chat logs and notebooks.

The knowledge base ensures that:

  • Nothing is forgotten — every finding is recorded
  • Context compounds — agents reference prior work, not just the current task
  • Onboarding is instant — new agents (or collaborators) can read the wiki to get up to speed
  • Papers draw from a single source of truth — claims link to wiki entities

Entity Types

The wiki uses four core types of entry, inspired by Andrej Karpathy's knowledge management approach:

  • Entities — Concrete objects: surveys, instruments, datasets, software packages. Each has properties, relationships, and provenance.

  • Concepts — Theories, methods, parameters, and equations. Includes definitions, derivations, and links to relevant experiments.

  • Sources — Papers, datasets, catalogs, and external references. Full citation info, DOIs, and notes on relevance.

  • Comparisons — Structured model-vs-model or method-vs-method evaluations. Each comparison has criteria, evidence, and a verdict.

Schema

Each entry follows a typed schema. The two examples here show an entity and a concept:

# Example: Entity
type: entity
name: "DESI DR1"
category: survey
properties:
  spectra_count: 22_500_000
  anomaly_rate: 0.87%
  release_date: "2024-06-14"
relationships:
  - type: analyzed_by
    target: "DESI Anomaly Pipeline"
  - type: cross_referenced_with
    target: "SDSS DR18"
tags: ["spectroscopy", "dark-energy", "anomaly-detection"]

# Example: Concept
type: concept
name: "Non-Gaussianity (f_NL)"
definition: "Amplitude of the local-type primordial bispectrum"
equation: "f_NL = -35/8 = -4.375 (matter bounce)"
related_experiments: ["EXP-031", "EXP-048"]
related_papers: ["paper-1", "paper-2"]
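
One way to mirror this schema in client code is with typed records. The sketch below is only an illustration under stated assumptions: the class names `Relationship`, `Entity`, and `Concept` and the helper `from_record` are hypothetical and not part of any Hubify library, and the field lists are taken solely from the two examples above. Sources and comparisons are left out because their fields are not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Relationship:
    type: str    # e.g. "analyzed_by"
    target: str  # name of the related entry


@dataclass
class Entity:
    name: str
    category: str
    properties: dict[str, Any] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    type: str = "entity"


@dataclass
class Concept:
    name: str
    definition: str
    equation: str = ""
    related_experiments: list[str] = field(default_factory=list)
    related_papers: list[str] = field(default_factory=list)
    type: str = "concept"


def from_record(record: dict[str, Any]) -> Entity | Concept:
    """Build a typed entry from a parsed record, dispatching on its `type` field."""
    data = dict(record)  # copy so the caller's record is not mutated
    kind = data.pop("type", None)
    if kind == "entity":
        rels = [Relationship(**r) for r in data.pop("relationships", [])]
        return Entity(relationships=rels, **data)
    if kind == "concept":
        return Concept(**data)
    raise ValueError(f"unsupported entry type: {kind!r}")
```

Passing a record with an unexpected field raises a `TypeError` from the dataclass constructor, which is one simple way a typed schema can surface malformed entries early.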

Automatic Growth

Agents update the knowledge base as they work:

  • After an experiment completes: New findings, parameters, and figures are added
  • After a paper review: Corrections and clarifications update existing entries
  • After a literature search: New sources are cataloged with relevance notes
  • After a cross-survey analysis: Comparisons are created or updated

You do not need to manually maintain the wiki. It grows organically as research progresses.
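
To make the idea of incremental growth concrete, here is a minimal sketch of a merge step, assuming the wiki is held as an in-memory dictionary keyed by entry name. The function `upsert_entry` and this storage layout are hypothetical; how agents actually write to the knowledge base is not described here.

```python
from __future__ import annotations

from typing import Any


def upsert_entry(wiki: dict[str, dict[str, Any]], update: dict[str, Any]) -> dict[str, Any]:
    """Create the entry named in `update`, or merge `update` into the existing one."""
    name = update["name"]
    entry = wiki.setdefault(name, {"name": name, "properties": {}, "tags": []})
    for key, value in update.items():
        if key == "properties":
            # New property values overwrite old ones; untouched keys are kept.
            entry.setdefault("properties", {}).update(value)
        elif key == "tags":
            # Append only tags that are not already present, preserving order.
            tags = entry.setdefault("tags", [])
            for tag in value:
                if tag not in tags:
                    tags.append(tag)
        else:
            entry[key] = value
    return entry


wiki: dict[str, dict[str, Any]] = {}
upsert_entry(wiki, {"name": "DESI DR1", "type": "entity",
                    "properties": {"spectra_count": 22_500_000},
                    "tags": ["spectroscopy"]})
# A later experiment adds a property and a tag without losing earlier data.
upsert_entry(wiki, {"name": "DESI DR1",
                    "properties": {"release_date": "2024-06-14"},
                    "tags": ["spectroscopy", "dark-energy"]})
```

After both calls the single "DESI DR1" entry carries both properties and both tags, which is the sense in which context compounds rather than being replaced.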

Querying

Search the knowledge base by type, tag, or free text:

# Search by keyword
hubify knowledge search "f_NL"

# List all entries of a given type
hubify knowledge list --type concept

# Get a specific entry
hubify knowledge get "DESI DR1"

# List recently updated entries
hubify knowledge recent --limit 10
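
As a rough illustration of what filtering by type, tag, or free text means, the sketch below applies the three filters to a list of entries held in memory. It is not how `hubify knowledge search` is implemented; the function `search` and its parameters are hypothetical.

```python
from __future__ import annotations

from typing import Any, Optional


def search(entries: list[dict[str, Any]],
           text: Optional[str] = None,
           entry_type: Optional[str] = None,
           tag: Optional[str] = None) -> list[dict[str, Any]]:
    """Return entries matching every filter that is given (filters are ANDed)."""
    results = []
    for entry in entries:
        if entry_type is not None and entry.get("type") != entry_type:
            continue
        if tag is not None and tag not in entry.get("tags", []):
            continue
        if text is not None:
            # Case-insensitive substring match over all field values.
            haystack = " ".join(str(value) for value in entry.values()).lower()
            if text.lower() not in haystack:
                continue
        results.append(entry)
    return results


entries = [
    {"type": "entity", "name": "DESI DR1", "tags": ["spectroscopy", "dark-energy"]},
    {"type": "concept", "name": "Non-Gaussianity (f_NL)",
     "definition": "Amplitude of the local-type primordial bispectrum"},
]
concepts = search(entries, entry_type="concept")
fnl_hits = search(entries, text="f_NL")
```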

API

curl "https://api.hubify.com/v1/labs/bigbounce/knowledge?q=f_NL" \
  -H "Authorization: Bearer $HUBIFY_API_KEY"

Returns matching entries with full metadata, relationships, and linked experiments.
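
The same request can be prepared from Python using only the standard library. This is a sketch: the endpoint path and the `q` parameter come from the curl example above, while the helper name `build_knowledge_request` is hypothetical. The response format is not specified here, so the sketch stops at building the request rather than parsing a reply.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

API_BASE = "https://api.hubify.com/v1"


def build_knowledge_request(lab: str, query: str, api_key: str) -> Request:
    """Prepare a GET request for a lab's knowledge search endpoint."""
    # urlencode escapes characters such as spaces or "&" in the search text.
    url = f"{API_BASE}/labs/{quote(lab, safe='')}/knowledge?{urlencode({'q': query})}"
    return Request(url, headers={"Authorization": f"Bearer {api_key}"})


# In real use the key would come from the environment, e.g. os.environ["HUBIFY_API_KEY"].
request = build_knowledge_request("bigbounce", "f_NL", "example-key")
# To send it: urllib.request.urlopen(request)
```

Encoding the query in code avoids the shell quoting pitfalls that an unquoted `?` in a URL can cause.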

Knowledge in Papers

When an agent writes a paper section, it queries the knowledge base for relevant entities, concepts, and sources. This ensures:

  • Correct parameter values (pulled from the wiki, not hallucinated)
  • Proper citations (linked to source entries)
  • Consistent terminology (defined in concept entries)

The claims table in every paper links back to knowledge base entries as evidence.
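
A simple consistency check over such a claims table could look like the sketch below. The claim layout (a `claim` string plus an `evidence` list of entry names) and the function `unlinked_claims` are assumptions made for illustration; the actual claims table format is not shown here.

```python
from __future__ import annotations

from typing import Any


def unlinked_claims(claims: list[dict[str, Any]], wiki: dict[str, Any]) -> list[str]:
    """Return the text of each claim whose evidence is missing or does not resolve."""
    missing = []
    for claim in claims:
        evidence = claim.get("evidence", [])
        # A claim passes only if it cites at least one entry and all cited entries exist.
        if not evidence or any(name not in wiki for name in evidence):
            missing.append(claim["claim"])
    return missing


wiki = {"DESI DR1": {}, "Non-Gaussianity (f_NL)": {}}
claims = [
    {"claim": "The survey release is dated 2024-06-14.", "evidence": ["DESI DR1"]},
    {"claim": "A claim citing an entry that does not exist.", "evidence": ["SDSS DR18"]},
    {"claim": "A claim with no evidence at all."},
]
problems = unlinked_claims(claims, wiki)
```

Here `problems` lists the second and third claims, since neither resolves to an existing entry.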
