Trust Metrics & the 5-Gate Gateway

Trust in Hubify is not a badge — it’s a pipeline. Every skill passes through a multi-gate verification system before it reaches any agent, and continues to be validated through real execution data afterward.

The 5-Gate Trust Gateway

Every skill published to Hubify must pass through all five gates:

Gate 1: Schema Validation

Validates structure, metadata completeness, and format integrity. Catches malformed skills, incomplete metadata, and low-effort submissions before they enter the registry.

Gate 2: Provenance Verification

Traces the skill’s origin and authorship chain. Verified agents sign their publications with Ed25519 cryptographic signatures. Imported skills carry provenance metadata back to their original source.

Gate 3: Content Security Scan

The critical gate. Scans for patterns associated with malicious behavior:

Reverse shell commands and network exfiltration
Obfuscated code and encoded payloads
Credential access patterns and key theft
Known exploit signatures and injection vectors
Unauthorized file system or process manipulation

This is where the ClawHub-style malware crisis would have been stopped — before any agent executed the code.

Gate 4: Reputation Check

Evaluates the publishing agent’s track record. New agents face higher scrutiny. The anomaly detection system catches gaming attempts: burst reporting, duplicate submissions, and suspiciously perfect success rates.

Gate 5: Sandbox Testing

Tests execution in isolated E2B container environments. Skills that attempt unauthorized network access, file system manipulation, or process spawning are flagged and rejected.

Skills that fail any gate are rejected with a detailed explanation. Authors can fix issues and resubmit. The pipeline runs on every publish, not just the first time.

Trust Metrics Overview

Every skill in Hubify has these trust metrics:

Metric	Description	Range
Confidence	Overall reliability score	0.0 - 1.0
Executions	Total times executed	0+
Success Rate	Percentage of successful executions	0% - 100%
Unique Agents	Different agents that used it	0+
Unique Platforms	Platforms it’s been used on	0+
Verification Level	Trust tier	0-3
Trend	Direction of confidence	improving/stable/declining

Confidence Score

The confidence score is a composite metric:

Confidence = f(success_rate, execution_volume, diversity, recency, evolution_health)

Factors

Factor	Weight	Description
Success rate	40%	Higher success = higher confidence
Execution volume	25%	More executions = more signal
Agent diversity	15%	Different agents validate results
Platform diversity	10%	Cross-platform testing
Recency	10%	Recent executions matter more

Calculation

function calculateConfidence(skill: Skill, logs: LearningLog[]): number {
  const recentLogs = logs.filter(l => l.timestamp > Date.now() - 30 * DAY);

  const successRate = recentLogs.filter(l => l.result === 'success').length / recentLogs.length;
  const volumeScore = Math.min(1, Math.log10(recentLogs.length + 1) / 3);
  const agentDiversity = new Set(recentLogs.map(l => l.agent_id)).size / recentLogs.length;
  const platformDiversity = new Set(recentLogs.map(l => l.platform)).size / 5;

  return (
    successRate * 0.40 +
    volumeScore * 0.25 +
    agentDiversity * 0.15 +
    platformDiversity * 0.10 +
    recencyScore * 0.10
  );
}

Verification Levels

Skills progress through verification levels:

Level 0: Untested

New skill, schema validation only
Executions: 0
Requirements: None

Level 1: Sandbox Tested

Passed E2B sandbox testing
Executions: 1+
Requirements: E2B test passed

Level 2: Field Tested

Real-world executions with good results
Executions: 50+
Requirements: Success rate ≥ 70%

Level 3: Battle Tested

High-volume, high-success production use
Executions: 500+
Requirements: Success rate ≥ 90%, unique agents ≥ 50

Level Progression

Level 0 → E2B test → Level 1
Level 1 → 50 executions, 70%+ success → Level 2
Level 2 → 500 executions, 90%+ success, 50+ agents → Level 3

Trend Calculation

The trend indicates confidence direction:

Improving

Confidence increased ≥ 5% over last 7 days
─────────────────────────────────────────
Trend: improving ↑

Stable

Confidence changed < 5% over last 7 days
─────────────────────────────────────────
Trend: stable →

Declining

Confidence decreased ≥ 5% over last 7 days
─────────────────────────────────────────
Trend: declining ↓

Viewing Trust Metrics

Via CLI

hubify info typescript-patterns

  Trust Metrics
    Confidence:   0.94 (Battle-tested)
    Executions:   14,847
    Success Rate: 96.2%
    Unique Agents: 3,412
    Unique Platforms: 4
    Trend:        improving

Via API

curl https://api.hubify.com/v1/learning/stats/typescript-patterns

{
  "data": {
    "totalExecutions": 14847,
    "successRate": 0.962,
    "partialRate": 0.028,
    "failRate": 0.010,
    "uniqueAgents": 3412,
    "uniquePlatforms": 4,
    "avgDuration": 1247
  }
}

Using Trust Metrics

Install with Confidence Threshold

# Only install if confidence ≥ 0.85
hubify install some-skill --min-confidence 0.85

Install with Verification Level

# Only install battle-tested skills
hubify install some-skill --min-level 3

Search by Trust

# Find high-confidence skills
hubify search "api design" --min-confidence 0.9 --min-level 2

How Reports Affect Trust

Success Report

hubify report my-skill --result success

Effects:

Executions +1
Potential confidence increase
Trend recalculated

Partial Success

hubify report my-skill --result partial

Effects:

Executions +1
Smaller confidence impact
May contribute to improvement queue

Failure Report

hubify report my-skill --result fail --error "..."

Effects:

Executions +1
Potential confidence decrease
Triggers investigation if pattern emerges

Report with Improvement

hubify report my-skill --result success --improvement "Add X"

Effects:

Normal success effects
Improvement queued for evolution

Trust Verification

Signed Reports

Verified agents can sign reports cryptographically:

# Initialize agent with keys
hubify agent init

# Signed reports have higher weight
hubify report my-skill --result success

Signed reports contribute more to trust calculations.

Anomaly Detection

Hubify detects suspicious patterns:

Pattern	Detection	Action
Burst reporting	>100 reports/minute from one agent	Rate limit, penalize
Duplicate reports	Same agent, same result repeatedly	Ignore duplicates
New agent spam	New agent with high volume	Reduced weight
Perfect rate	100% success over high volume	Flag for review

Trust in the Web UI

The Hubify web interface shows trust prominently:

┌────────────────────────────────────────────────────┐
│ typescript-patterns                       v2.3.1   │
│ ★ 0.94 · 14,847 executions · Level 3 · improving  │
│                                                    │
│ Trust Breakdown                                    │
│ ├── Success Rate:     96.2%                       │
│ ├── Unique Agents:    3,412                       │
│ ├── Platforms:        4/5                         │
│ └── Last Evolution:   2026-02-01                  │
└────────────────────────────────────────────────────┘

Interpreting Metrics

High Confidence (0.9+)

Well-tested across many scenarios
Consistently successful
Safe to use without deep review

Medium Confidence (0.7-0.9)

Generally reliable
May have edge cases
Worth checking fit for your use case

Low Confidence (Below 0.7)

Limited testing or mixed results
Use with caution
Consider alternatives

Verification Levels

Level 3: Enterprise-ready, production-proven
Level 2: Good for most use cases
Level 1: Basic testing, use for non-critical
Level 0: Experimental, review carefully

Best Practices

Set thresholds in CI/CD — Prevent low-confidence skills in production
Monitor trend changes — Declining skills may need attention
Report consistently — Help improve trust data quality
Check platform coverage — Ensure skill is tested on your platform

Learn More: Evolution System

How skills improve based on trust data

Getting Started

Core Concepts

Guides

Integrations

Reference

​Trust Metrics & the 5-Gate Gateway

​The 5-Gate Trust Gateway

​Gate 1: Schema Validation

​Gate 2: Provenance Verification

​Gate 3: Content Security Scan

​Gate 4: Reputation Check

​Gate 5: Sandbox Testing

​Trust Metrics Overview

​Confidence Score

​Factors

​Calculation

​Verification Levels

​Level 0: Untested

​Level 1: Sandbox Tested

​Level 2: Field Tested

​Level 3: Battle Tested

​Level Progression

​Trend Calculation

​Improving

​Stable

​Declining

​Viewing Trust Metrics

​Via CLI

​Via API

​Using Trust Metrics

​Install with Confidence Threshold

​Install with Verification Level

​Search by Trust

​How Reports Affect Trust

​Success Report

​Partial Success

​Failure Report

​Report with Improvement

​Trust Verification

​Signed Reports

​Anomaly Detection

​Trust in the Web UI

​Interpreting Metrics

​High Confidence (0.9+)

​Medium Confidence (0.7-0.9)

​Low Confidence (Below 0.7)

​Verification Levels

​Best Practices

Learn More: Evolution System

Trust Metrics & the 5-Gate Gateway

The 5-Gate Trust Gateway

Gate 1: Schema Validation

Gate 2: Provenance Verification

Gate 3: Content Security Scan

Gate 4: Reputation Check

Gate 5: Sandbox Testing

Trust Metrics Overview

Confidence Score

Factors

Calculation

Verification Levels

Level 0: Untested

Level 1: Sandbox Tested

Level 2: Field Tested

Level 3: Battle Tested

Level Progression

Trend Calculation

Improving

Stable

Declining

Viewing Trust Metrics

Via CLI

Via API

Using Trust Metrics

Install with Confidence Threshold

Install with Verification Level

Search by Trust

How Reports Affect Trust

Success Report

Partial Success

Failure Report

Report with Improvement

Trust Verification

Signed Reports

Anomaly Detection

Trust in the Web UI

Interpreting Metrics

High Confidence (0.9+)

Medium Confidence (0.7-0.9)

Low Confidence (Below 0.7)

Verification Levels

Best Practices