6.1x faster on simple queries. 98.5% cheaper at scale. Every sub-step cryptographically signed before the output moves. Patent pending.
Same cryptographic receipt either way — ML-DSA-65 (NIST FIPS 204) + Ed25519, verifiable offline or by URL. The only question is whether AFiR runs the inference, or simply attests inference you ran yourself.
Send a prompt to /v1/afir/run. AFiR fragments, routes, and signs every sub-step before the output moves. Fastest path to faster, cheaper, signed inference.
Already running your own model on your own stack? Hand us {input, output} at /v1/afir/sign. Zero routing, zero model change — the output becomes signed inference.
Type any question below. Watch AFIR fragment, route, and attest in parallel — side by side with standard inference.
Every AFIR response bundles a tamper-evident receipt containing signed attestations for each fragment and a Merkle root over the full response tree.
Every AFIR receipt carries an smsh field — a cryptographic seal over the exact reasoning state (system prompt, context window, policy snapshot, identity) that was active when each fragment was signed. Without it, you can prove the output was signed. With it, you can prove the reasoning that authorized it. AFIR ships three SMSH tiers.
Drop AFIR into any OpenAI-compatible workflow with a single base URL swap. Native endpoint also available.
Works with Anthropic, Google Gemini, Groq, Mistral, or any provider with an OpenAI-compatible /chat/completions endpoint. AFIR owns decomposition. You supply execution.
Pick the product that matches your compliance posture. All three use the same API — swap by changing one header.
Pick your current model and your AFIR tier models. The math uses real published rates. See exactly what you save — or don't.
Illustrative model. Rates from published pricing as of June 2026. Actual savings vary by prompt structure, cache hit rate, and tier spread. Attestation overhead is 0.785ms/fragment (ML-DSA-65, NIST FIPS 204). Latency uses parallel DAG execution — simple queries gate-bypass (single fragment, cheap model), complex queries run in parallel waves. Critical path shown, not total sequential time.
Latency and cost reflect your selected model and complexity. Receipts and signing are fixed per tier.
Enter your Together, Fireworks, or custom provider numbers. See the signed-tier margin and latency gap side by side.
Open Inference Comparator →The objections engineers and legal teams raise. Answered with numbers, not marketing.
Latency reduction depends on query type. Simple queries bypass decomposition entirely (gate bypass) and complete in 352ms vs 1,692ms for a monolithic GPT-4.1 call — 4.8x faster. Complex queries run fragments in parallel; wall-clock time tracks the critical path, not total token count. In live benchmarks with Cerebras direct routing, gate bypass completes in 241ms vs 867ms monolithic — 3.6x faster. Complex multi-fragment queries: 5,955ms vs 10,628ms — 1.78x faster, 98.5% cheaper. Cheaper models account for cost savings; parallelism accounts for latency savings. These are orthogonal gains. The tiered routing cuts the bill; the DAG execution cuts the latency.
You could build a proxy with a hash in a weekend. What takes longer: a correct DAG decomposition engine that preserves semantic dependency ordering across arbitrary prompts, an ML-DSA-65 (NIST FIPS 204) attestation chain at 0.8ms per-fragment overhead at inference latency scales, a Merkle completeness proof that binds input state, routing decisions, and output hashes into a single verifiable receipt, and a key-split architecture where your signing key never leaves your perimeter. The engineering surface is the DAG correctness guarantees and the attestation chain integrity — not the proxy layer. Build it and run it in production under audit; that's the actual weekend estimate.
Signing outputs at the final response layer is not new. What is patent-pending (filed June 2026) is the combination of fragment-level attestation across a routed DAG — specifically, attesting each node before its output is consumed as input by a downstream node, so the chain of custody is continuous and not retrospective. Prior art signs the envelope; AFIR signs every edge in the dependency graph mid-execution. If you have specific prior art that covers per-node attestation within a runtime inference DAG with Merkle assembly proofs, file it against the application — that is the correct venue.
The signed receipt captures the exact input state, the routing decision, the full dependency graph, and the output hash at assembly time. If assembly produces an incorrect result, the receipt is forensic evidence of exactly which node produced which output under which routing decision — that is the point of the Merkle completeness proof. Liability follows the evidence: if the decomposition logic is wrong, that is traceable to the DAG construction step in the receipt; if a model tier returns a defective fragment, that is attested at that node. Hive provides the receipt infrastructure and the routing logic; the customer's signed key binds them to the input state they submitted. The receipt does not resolve liability by itself — it makes the facts unambiguous.
The key split architecture is designed for this constraint: the customer holds the ML-DSA-65 signing key on-premises, and Hive holds only the verification root. Inference fragments transit Hive's routing layer, but the signing authority never leaves the customer's perimeter — Hive cannot forge a receipt the customer did not authorize. If your legal requirement is that no inference payload leaves your network, AFIR is not the right fit in its current hosted form. If the requirement is that a third party cannot produce valid attested outputs without your authorization, the key split satisfies that. Bring the receipt architecture spec to your legal team against those two specific threat models.
The EU AI Act's current logging obligations for high-risk systems do not mandate per-fragment attestation or Merkle assembly proofs. AFIR's receipt format exceeds those requirements by design, not by regulatory necessity. The value proposition is not compliance box-checking — it is that when a regulator, customer, or internal audit asks exactly what inputs produced exactly what output under exactly what model routing decision, you produce a cryptographically verifiable answer rather than reconstructed logs. Regulations are a floor; your exposure in a dispute is the ceiling. The receipt is for the ceiling.
Audit logs record what your system observed; receipts attest what the inference system executed. Your logs can be amended, retroactively structured, or missing entries due to pipeline failures — they are assertions your system makes about itself. An AFIR receipt is signed at execution time by a key you hold, binds the input hash, the routing graph, and the output hash into a Merkle structure, and cannot be back-filled without invalidating the signature. The distinction matters when a counterparty — a regulator, a plaintiff, an enterprise customer — challenges whether a specific output came from a specific input under a specific model. A log entry is testimony; a receipt is evidence.
The receipt's verifiability depends on the ML-DSA-65 key pair, not on Hive's operational continuity. The customer holds the signing key; the verification root is a public key that can be exported, archived, and verified offline with any standard FIPS 204-compliant implementation. If Hive ceases operations, existing receipts remain verifiable against the public key the customer already holds — no Hive infrastructure required. New receipt issuance would stop, but the forensic value of issued receipts does not decay. The escrow and key export procedures are documented in the enterprise agreement for exactly this scenario.
No fragments leave your infrastructure. No data touches our servers. You operate the container. We license the IP and hold the verification root.
Available to Hyperscaler tier customers. Submit your request and we will follow up with deployment specs, image delivery, and licensing terms.
One checkout. Your live key appears on screen the moment payment clears — no email round-trip, no waiting. Metered billing: you only pay for what you sign.