fabius
Reads the job, sets the stance, dispatches by layer + machinery + model-tier. The conductor and the kill-switch.
fabius is the agent. One stance across the whole job — write code, design UI, build and orchestrate agents, debug, remember, ship on-chain — then strike with the smallest correct artifact. It runs on every major model and is managed from one console. Investigate everything. Ship one thing.
▶ every model — Anthropic · OpenAI · Google · Mistral · Groq
Watch · the 25-second explainer
One prompt, routed by layer to one of fifteen coordinated skills — then the blind, reproducible benchmark that backs it. The system explaining itself, end to end.
01The one idea
One axis dissolves the old tension between “be thorough” and “be minimal.” Fan out to understand and verify — cheap to investigate, expensive to be wrong. Then ship the single smallest correct thing, and say it in the fewest words. Process and memory make you wide. Lean makes you narrow. They never conflict — they live on different axes.
Named for Quintus Fabius Maximus — who beat Hannibal by refusing every battle that didn’t matter, and committing fully to the one that did.
02What changes
One concentrated set of operating rules changes what the agent ships across every kind of work — without you re-explaining yourself each time.
03Architecture
Your prompt hits the router, which dispatches by layer, machinery, and model-tier. Process decides how; domain decides what; the lean core runs beneath everything. Each rule has exactly one owner — that’s what keeps fifteen skills from contradicting one another. The agent runs the job as an autonomous loop — Scout → Plan → Strike → Prove → Record — read-only by default, acting only when you opt in.
04The fifteen skills
A router, an always-on lean core, and thirteen engineering specialists. Each is a thin operating contract; the depth loads only on demand.
Reads the job, sets the stance, dispatches by layer + machinery + model-tier. The conductor and the kill-switch.
The lean core. Terse output, the YAGNI ladder, surgical changes — never speculative scope. Runs underneath everything.
Brainstorm → plan → test-first → prove. Grill ambiguity one question at a time. Root-cause debugging, not symptom patches.
Ship-grade UI and data-viz. One-accent laws, a token vocabulary, mobile-first, live-verify, data-ink charts.
Define and orchestrate other agents. A definition schema, least privilege, five orchestration patterns including the swarm.
Persistent, per-project memory. A linked markdown knowledge base of index + log + topic pages, tended without being asked. Stop re-deriving.
Positioning, message-to-awareness match, proof over adjectives, a one-action funnel, copy that converts.
Defensive security. STRIDE per boundary, the OWASP pass, secrets + least privilege, severity → fix → proof findings.
The core loop first, deliberate juice, state as a machine, the pixel lane, jam-sized scope.
On-chain dev (EVM + Solana) and verifiable provenance. Account-validation-first, money-safe transactions, cryptographic sealing — hash → sign → anchor → verify.
Deterministic workflow automation. Discover from the live schema, build incrementally, validate and verify the wiring, then activate — no silent miswires.
The scientific method, executable. Competing falsifiable hypotheses, source-grounded database lookups, reproducible field-standard pipelines.
The model lifecycle as production software. Train → evaluate → serve → monitor. Held-out leakage-free evals, blind judges, vLLM-class serving, MLOps that's reproducible.
Method over money. Risk sized first, evidence over narrative, valuation with stated assumptions, backtests proven out-of-sample. Analysis, not advice — never manipulation.
Convene a council of models on one question — each answers, then ranks the others blind, and a chairman synthesizes the field into one better answer. Ensemble epistemics, spent only when a wrong answer is costly.
parcus, frugal · disciplina, training · decor, what is fitting · cohors, the cohort · archivum, the record office · mercatus, the marketplace · praesidium, the garrison · ludus, the game · catena, the chain · machina, the working mechanism · scientia, knowledge won by method · doctrina, the body of learning · fortuna, the turn of the market · concilium, the summoned council.
FABIUS · CENTRAL INTELLIGENCE UNIT
scout wide · strike narrow
One stance handed to the agent. It investigates the whole field, then commits — fully — to the single battle that decides the war.
05The proof
One benchmark, three panels. Fifteen tasks, three arms — baseline · a generic “be concise” control (the real test) · the shipped fabius files, verbatim — blind-judged on the four newest Claude models by two judges never told which arm wrote which. Then verified objectively, with no judge at all: the generated code executed against hidden test suites, the facts checked against fixed checklists. Then demoed blind across external model families. Beating the control, not the baseline, is the real test.
Panel A of the benchmark — the shipped AGENTS.md stance and each routed specialist’s SKILL.md, byte-for-byte, on the four newest Claude models: 15 tasks × 3 arms, each answer scored by two blind judges (inter-judge gap 0.72/15). Output drops 20–34% on every model. Every capable tier — Fable 5, Sonnet 5 and Opus 4.8 — beats both the bare model and the “be concise” control while doing it, and Sonnet 5’s +0.43 is the largest lift of the four. Haiku is the one dip, and it splits cleanly: +0.71 on its 12 specialist tasks, −4.50 on 3 trivial one-liners — the full contracts overwhelming a fast model on trivial work, exactly what the lean gate and model-tier routing exist to prevent, and why the fast tier gets a condensed stance.
Panel B of the benchmark — no judge at all: 9 deliverables × 3 arms × the same four models, graded by two strict graders. Generated code is written to a file and run against a hidden test suite; research, security, on-chain and automation deliverables are graded against a fixed factual checklist (is the SQL query parameterized? is FDR correction applied? is the token account validated? is the handler idempotent?). Pure algorithm code is already at ~100% for every bare model — no headroom — so fabius holds it there. The lift is on the deliverables where looks right and is right diverge: parameterized SQL 67.5% → 100%, idempotent webhook 40% → 67.5%, Solana account validation 47.5% → 57.5%. Every one of the four models gains objectively (+3.5 to +17.4), the biggest on the smallest model — Haiku +17.4 — because its bare defaults skip the most guardrails. It fixes real bugs a quality score never sees, on 12–25% less output.
Panel C of the benchmark — blind demos on external model families, run on a portable harness with live provider keys (6 tasks, measured 2026-06-22). Quality lift (fabius − the “be concise” control) on genuine-build tasks, out of 15: every family gains — Grok +8.5, GPT +7.0, Claude +7.0, Mistral +2.5. Largest where it matters — real builds and trust boundaries; near zero on pure YAGNI, by design. Gemini is wired; no number is printed without a key.
One benchmark, three panels, one pattern: fabius improves every model it runs on — on 20–35% less output. Blind-judged on the newest Claude models, structure beats brevity; verified objectively, with no judge at all, it passes more executed tests and factual checks; demoed across external families, every family gains. Not “smarter,” not “10× on everything” — a scope-control system that knows when to compress and when to expand. And it’s built right: a deterministic suite verifies the system — fifteen single-owner skills, every reference live, the content-bound seal recomputable — at 19/19.
Method, mechanism & caveats — in the whitepaper →06Grounded in research
fabius’s routing is drawn from the agent-research canon — ReAct, Toolformer, Tree of Thoughts, Reflexion, MemGPT, DSPy, Voyager and the 2026 efficiency surveys — turned into a documented decision policy: eighteen rules, each tagged for what the papers measured versus what fabius borrows by analogy.
inline → tool → retrieval → plan → subagent → swarm), never jump to a swarm.Conceptual — the shape of a documented principle (capability scales sub-linearly with machinery), not a fabius measurement.
07The decision
Add a tool? Spawn a reviewer? Branch wider? Retry once more? Under the hood fabius asks one question — does the expected loss removed beat the cost of the machinery? That single threshold governs all eighteen routing rules (each adversarially verified, the set proven to cohere). Scout wide, strike narrow — as arithmetic.
The gap on the left is the value of information. Below the threshold the machinery is pure overhead; fabius stays inline. (rules R3 · R7 · M1 · M4 — one object, different machinery.)
Worked example — should fabius add a reviewer?
18 points of loss removed — more than one agent costs. So fabius spawns the reviewer: two decorrelated passes beat one. (rule M1)
08Every model
fabius runs on every major provider from one console — and picks a tier per task: frontier when the job is hard, mid or fast when it isn’t. Bring your own key; the agent adapts to whatever you hand it.
Run and manage fabius from the synapse console — hand it a task, pick a model tier, and watch the loop: Scout → Plan → Strike → Prove → Record. Every run is logged, costed, and reviewable.
1 open the console — hand fabius a task
2 it routes, runs on your chosen model, self-verifies, records
Private, cryptographically sealed brain · runs on every major model · every run logged and reviewable.
09Questions
An autonomous engineering agent. One stance — scout wide, strike narrow — across the whole job: code, design, agents, debugging, memory, marketing, defensive security, game craft, on-chain development and sealing, automation, science, ML engineering, markets & finance, and convening a cross-model council to deliberate one answer. Fifteen coordinated, non-overlapping skills form its brain, and it runs on every major model.
You run and manage fabius from the synapse console: hand it a task, pick a model tier, and it routes → plans → strikes with the smallest correct artifact → self-verifies → records. Every run is logged and reviewable.
Every major provider — Anthropic, OpenAI, Google, Mistral and Groq — at a frontier, mid or fast tier per task. Bring your own key and the agent adapts. Its benchmark covers the four newest Claude models — blind-judged and objectively verified, the code executed and the facts checked — with external-model demos on GPT, Grok and Mistral.
No. fabius’s brain — the fifteen skills, the routing policy, the memory — is private and cryptographically sealed (SHA-256 + a Merkle root anchored to Bitcoin). The seal is proof of authorship if anyone ever copies it.
No — it is a scope-control system, not a capability boost. In its benchmark — blind, judged against both a bare baseline and a generic “be concise” control — every capable tier (Fable 5, Sonnet 5, Opus 4.8) beats both controls on the four newest Claude models. And it holds when verified objectively — code run against hidden tests, facts checked: under fabius the smallest model passes 93% of real tests and factual checks vs 75.6% bare, and parameterized SQL goes 67.5% → 100%. All of it on 20–35% less output. Honest framing: structure beats brevity — not smarter, not 10× on everything; every capable tier gains on both panels and the fast tier dips on trivial one-liners, printed as-is.
That is exactly the control it was tested against — and it wins. fabius is structure, not brevity: a scope-control system that knows when to compress and when to expand. Method and caveats are in the whitepaper.