Hive Trust · Live Benchmarks

Every claim signed.
Every benchmark reproducible.

We test Hive primitives head to head against published SOTA rivals, score them on real datasets, and cryptographically sign every result. No marketing screenshots. Just receipts.


Protected or Pending by Hive ColonyIP

Production Corpus Results: Enterprise CASB Workload

SMSH v5 + smshPQMax: real workload compression

Corpus-tested across four enterprise CASB and agent workload types. Numbers are measured, not modeled. Invariant recall clears the 99.5% bar required for court-admissible receipts. Amber = preliminary; signed receipt on every run.

Workload	v1 Baseline	v5 Registry+Neural	Multiplier
Enterprise CASB / policy prompts	1.04x	9.15x	8.8x lift
Verbose / filler-heavy prompts	1.14x	5.09x	4.5x lift
RAG repetitive context	1.58x	3.41x	2.2x lift
Overall mean	~1.2x	5.49x	4.6x lift
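The Multiplier column is the v5 ratio divided by the v1 ratio. A minimal sketch of that arithmetic, using the figures from the table (the names `rows` and `lift` are illustrative, not part of any Hive API):

```python
# Compression ratios copied from the table above: (v1 baseline, v5 registry+neural).
rows = {
    "Enterprise CASB / policy prompts": (1.04, 9.15),
    "Verbose / filler-heavy prompts": (1.14, 5.09),
    "RAG repetitive context": (1.58, 3.41),
    "Overall mean": (1.2, 5.49),  # the table lists the v1 mean as approximately 1.2x
}


def lift(baseline: float, improved: float) -> float:
    """Return how many times larger the improved ratio is than the baseline."""
    return improved / baseline


for name, (v1, v5) in rows.items():
    print(f"{name}: {lift(v1, v5):.1f}x lift")
```

Rounded to one decimal place, each result matches the lift shown in the table.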

Invariant Recall

99.78%

clears 99.5% court-admissible bar

p50 Latency

<25ms

280x faster than GPU-based baselines

Signing

100%

ML-DSA-65 receipt on every corpus run

Corpus results come from production workload testing. This is not a head-to-head SOTA benchmark. See the signed benchmark cards below for peer-reviewed comparisons against rivals.

smshPQMax product page →

Live benchmarks

Every record below is a signed Ed25519 receipt from hivemorph. Click any card for the full methodology.

Inference primitives: production workloads with paying customers

Other primitives: trust infrastructure

Voice primitives: STT/TTS compression and tamper detection

Earned badges

Two earnable stamps. Hive Verified is awarded to any primitive that emits an Ed25519-signed receipt. Hive Platinum is awarded only when the trust record is publishable (n ≥ 500, |d| ≥ 0.3, p < 0.01).

Hive Verified · earned by

Hive Platinum · earned by

How we benchmark

Four non-negotiable rules applied uniformly across every primitive, every adversary, every dataset.

Step 01

Pick the published SOTA

We do not compare against straw men. Adversaries are the most-cited published baseline for each task: LLMLingua-2 for compression, NIST FIPS 204 (ML-DSA) for signatures, Llama-Guard for safety, self-consistency CoT for reasoning, DSPy for prompt compilation, Constitutional AI for factuality.

Step 02

Ensemble construction

Hive v2 primitives are ensembles. Each one includes the SOTA rival itself as one candidate, plus 3 to 4 Hive-specific strategies. A quality check picks the winner for each input. Because the rival is always among the candidates, the ensemble can never score below the rival alone on that quality check.
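The per-input selection described above can be sketched as follows. Everything here is a toy stand-in: `rival_strategy`, `hive_strategy`, the `FILLER` word list, and the length-based `quality` check are hypothetical and are not Hive's actual strategies or scoring.

```python
from typing import Callable, Sequence

Strategy = Callable[[str], str]

# Hypothetical filler words, for illustration only.
FILLER = {"please", "kindly", "basically", "really"}


def rival_strategy(prompt: str) -> str:
    """Stand-in for the SOTA rival: collapse runs of whitespace."""
    return " ".join(prompt.split())


def hive_strategy(prompt: str) -> str:
    """Hypothetical extra strategy: collapse whitespace and drop filler words."""
    return " ".join(w for w in prompt.split() if w.lower() not in FILLER)


def quality(output: str) -> float:
    """Toy quality check: shorter output scores higher.

    A real check would also have to measure fidelity to the original prompt.
    """
    return -float(len(output))


def pick_winner(
    prompt: str,
    strategies: Sequence[Strategy],
    score: Callable[[str], float],
) -> str:
    """Run every candidate on one input and keep the highest-scoring output."""
    outputs = [strategy(prompt) for strategy in strategies]
    return max(outputs, key=score)


prompt = "Please   summarize the policy, basically in one line"
winner = pick_winner(prompt, [rival_strategy, hive_strategy], quality)
print(winner)
```

Because the rival's output is always in the candidate pool, the winner's score is at least the rival's score for every input. That guarantee is only as strong as the quality check itself.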

Step 03

Pre-registered evaluation

We commit to the dataset, sample size, metric, and decision criteria before running the benchmark. Pre-registration is published at github.com/srotzin/xcalibur-evaluation.

Step 04

Cryptographic receipts

Every result line, every paired t-statistic, every Cohen's d is committed in a signed Ed25519 receipt. Receipts are queryable at /v1/trust/benchmarks on receipts.thehiveryiq.com. Change any field and the signature breaks. There are no editable marketing slides.
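For readers who want to recompute the statistics named above, here is a minimal sketch of a paired t-statistic and a paired-samples Cohen's d. It assumes d is the mean per-item difference divided by the standard deviation of the differences; the page does not say which variant of Cohen's d the receipts use, and `paired_stats` is an illustrative name.

```python
import math
import statistics


def paired_stats(hive_scores: list[float], rival_scores: list[float]) -> dict:
    """Compute n, the paired t-statistic, and Cohen's d from per-item scores.

    Assumption: d = mean(diff) / stdev(diff), the paired-samples form.
    """
    if len(hive_scores) != len(rival_scores):
        raise ValueError("paired samples must have the same length")
    diffs = [h - r for h, r in zip(hive_scores, rival_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; statistics are undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    cohens_d = mean_diff / sd_diff
    return {"n": n, "t": t_stat, "d": cohens_d}


result = paired_stats([3.0, 5.0, 7.0], [2.0, 3.0, 4.0])
print(result)
```

The p-value comes from the t-distribution with n - 1 degrees of freedom, which the Python standard library does not provide, so it is left out of this sketch.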

Result status

Every benchmark record carries one of three statuses. Status is computed from the data, not editorially assigned.

Publishable

n ≥ 500, Cohen's d ≥ 0.3, p < 0.01. Ready for public claim.

Preliminary

n meets the minimum, but the effect size or p-value falls below the publishable bar. An honest in-progress result.

Match

The Hive primitive matches the rival on correctness within the latency budget. It is a match, not a win, but it is still a real result.
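The Publishable thresholds can be expressed as a single predicate. This is a sketch only: `meets_publishable_bar` is a hypothetical name, and it uses the absolute value of d as written in the badge criteria, which is an assumption about how the sign of the effect is treated.

```python
def meets_publishable_bar(n: int, cohens_d: float, p_value: float) -> bool:
    """Return True when a record clears all three stated thresholds.

    Thresholds from the page: n >= 500, |d| >= 0.3, p < 0.01.
    Assumption: the effect size is compared by absolute value.
    """
    return n >= 500 and abs(cohens_d) >= 0.3 and p_value < 0.01


print(meets_publishable_bar(n=812, cohens_d=0.41, p_value=0.002))
print(meets_publishable_bar(n=812, cohens_d=0.12, p_value=0.002))
```

The two calls use made-up inputs to show one record that clears the bar and one that misses on effect size.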

Verify any receipt

Every record is public. You do not need an account or an API key.

curl -sS https://receipts.thehiveryiq.com/v1/trust/benchmarks/{record_id}

Every field is signed. Verify the signature against the record's pubkey_hex. If it verifies, the record is authentic and has not been modified since signing.
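A minimal verification sketch in Python, using the third-party `cryptography` package. The receipt layout is not specified on this page, so two things here are assumptions: that the signature is stored in a field named `signature_hex`, and that the signed message is the compact, key-sorted JSON of every other field. Only `pubkey_hex` comes from the page. Check the published methodology for the real layout before relying on this.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


def canonical_bytes(record: dict) -> bytes:
    """Serialize the record deterministically, excluding the signature itself.

    Assumption: sorted keys and compact separators; the real scheme may differ.
    """
    unsigned = {k: v for k, v in record.items() if k != "signature_hex"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_record(record: dict) -> bool:
    """Return True if the Ed25519 signature matches the record's contents."""
    public_key = Ed25519PublicKey.from_public_bytes(bytes.fromhex(record["pubkey_hex"]))
    signature = bytes.fromhex(record["signature_hex"])  # hypothetical field name
    try:
        public_key.verify(signature, canonical_bytes(record))
    except InvalidSignature:
        return False
    return True
```

Pass in the JSON returned by the curl command above, parsed with `json.loads`. Changing any field after signing alters the canonical bytes, so `verify_record` returns False.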

Open the verifier →
