One agent · every model · fifteen coordinated skills

scout wide. strike narrow.

fabius is the agent. It holds one stance across the whole job (write code, design UI, build and orchestrate agents, debug, remember, ship on-chain): investigate widely, then strike with the smallest correct artifact. It runs on every major model and is managed from one console. Investigate everything. Ship one thing.

every model — Anthropic · OpenAI · Google · Mistral · Groq
Autonomous agent · Anthropic · OpenAI · Google · Mistral · Groq | One benchmark, blind + objective · 19/19 structural | Whitepaper · proofs | Private · sealed

Watch · the 25-second explainer

See the whole system work.

One prompt, routed by layer to one of fifteen coordinated skills — then the blind, reproducible benchmark that backs it. The system explaining itself, end to end.

01 The one idea

Scout wide in what you investigate.
Strike narrow in what you ship.

One axis dissolves the old tension between “be thorough” and “be minimal.” Fan out to understand and verify — cheap to investigate, expensive to be wrong. Then ship the single smallest correct thing, and say it in the fewest words. Process and memory make you wide. Lean makes you narrow. They never conflict — they live on different axes.

Named for Quintus Fabius Maximus — who beat Hannibal by refusing every battle that didn’t matter, and committing fully to the one that did.

02 What changes

Same model. A different shape of output.

One concentrated set of operating rules changes what the agent ships across every kind of work — without you re-explaining yourself each time.

You ask for | Typical model default | Under fabius
Code | Verbose, over-engineered, may skip validation | Minimal and surgical, with validation and security kept
A bug fix | Patches the symptom | Reproduce → root cause → regression test
UI | Inline styles, inconsistent, desktop-first | Design tokens, one accent, mobile-first, accessible
An agent | Broad tools, vague role | Least privilege, a precise output contract
An explanation | Padded, hedged | Tight, exact, no filler
Research / memory | Re-derives every session | Written down once, retrieved the next time

03 Architecture

One router. A lean core. Thirteen specialists — fifteen skills in all.

Your prompt hits the router, which dispatches by layer, machinery, and model-tier. Process decides how; domain decides what; the lean core runs beneath everything. Each rule has exactly one owner — that’s what keeps fifteen skills from contradicting one another. The agent runs the job as an autonomous loop — Scout → Plan → Strike → Prove → Record — read-only by default, acting only when you opt in.
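The loop described above can be sketched in a few lines. This is a minimal illustration, not fabius's implementation: the `Run` class, the `handlers` mapping, the `allow_writes` flag and the choice to treat only the strike phase as mutating are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

PHASES = ("scout", "plan", "strike", "prove", "record")
# Assumption: only the strike phase changes anything outside the agent.
MUTATING = {"strike"}


@dataclass
class Run:
    task: str
    allow_writes: bool = False  # read-only unless the caller opts in
    log: list[str] = field(default_factory=list)


def run_loop(run: Run, handlers: dict[str, Callable[[Run], str]]) -> list[str]:
    """Walk the phases in order; a mutating phase becomes a dry run without opt-in."""
    for phase in PHASES:
        if phase in MUTATING and not run.allow_writes:
            run.log.append(f"{phase}: dry run (writes not enabled)")
            continue
        run.log.append(f"{phase}: {handlers[phase](run)}")
    return run.log


# Hypothetical handlers that only describe what they would do.
handlers = {p: (lambda r, p=p: f"{p} done for {r.task!r}") for p in PHASES}

print(run_loop(Run("fix login bug"), handlers))
print(run_loop(Run("fix login bug", allow_writes=True), handlers))
```

The first call never executes the strike handler; the second does, because the caller opted in.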

your prompt
→ fabius (router): dispatches by layer · machinery · model-tier
→ specialists: disciplina · decor · cohors · archivum · mercatus · praesidium · ludus · catena · machina · scientia · doctrina · fortuna · concilium
→ fabius-parcus: always-on lean core, beneath every layer
→ the smallest correct result

04 The fifteen skills

Fifteen coordinated, zero-overlap skills. One console.

A router, an always-on lean core, and thirteen engineering specialists. Each is a thin operating contract; the depth loads only on demand.

router

fabius

Reads the job, sets the stance, dispatches by layer + machinery + model-tier. The conductor and the kill-switch.

always on

fabius-parcus

The lean core. Terse output, the YAGNI ladder, surgical changes — never speculative scope. Runs underneath everything.

process

fabius-disciplina

Brainstorm → plan → test-first → prove. Grill ambiguity one question at a time. Root-cause debugging, not symptom patches.

design

fabius-decor

Ship-grade UI and data-viz. One-accent laws, a token vocabulary, mobile-first, live-verify, data-ink charts.

agents

fabius-cohors

Define and orchestrate other agents. A definition schema, least privilege, five orchestration patterns including the swarm.

memory

fabius-archivum

Persistent, per-project memory. A linked markdown knowledge base of index + log + topic pages, tended without being asked. Stop re-deriving.

go-to-market

fabius-mercatus

Positioning, message-to-awareness match, proof over adjectives, a one-action funnel, copy that converts.

security

fabius-praesidium

Defensive security. STRIDE per boundary, the OWASP pass, secrets + least privilege, severity → fix → proof findings.

game craft

fabius-ludus

The core loop first, deliberate juice, state as a machine, the pixel lane, jam-sized scope.

on-chain

fabius-catena

On-chain dev (EVM + Solana) and verifiable provenance. Account-validation-first, money-safe transactions, cryptographic sealing — hash → sign → anchor → verify.

automation

fabius-machina

Deterministic workflow automation. Discover from the live schema, build incrementally, validate and verify the wiring, then activate — no silent miswires.

science

fabius-scientia

The scientific method, executable. Competing falsifiable hypotheses, source-grounded database lookups, reproducible field-standard pipelines.

AI/ML engineering

fabius-doctrina

The model lifecycle as production software. Train → evaluate → serve → monitor. Held-out leakage-free evals, blind judges, vLLM-class serving, MLOps that's reproducible.

markets & finance

fabius-fortuna

Method over money. Risk sized first, evidence over narrative, valuation with stated assumptions, backtests proven out-of-sample. Analysis, not advice — never manipulation.

cross-model council

fabius-concilium

Convene a council of models on one question — each answers, then ranks the others blind, and a chairman synthesizes the field into one better answer. Ensemble epistemics, spent only when a wrong answer is costly.

parcus, frugal · disciplina, training · decor, what is fitting · cohors, the cohort · archivum, the record office · mercatus, the marketplace · praesidium, the garrison · ludus, the game · catena, the chain · machina, the working mechanism · scientia, knowledge won by method · doctrina, the body of learning · fortuna, the turn of the market · concilium, the summoned council.
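The council step in fabius-concilium (each member answers, then ranks the others blind) needs some way to combine the rankings before the chairman synthesizes. The page does not say which rule is used; the sketch below assumes a simple Borda count, and the anonymized labels `A`, `B`, `C` and the `borda` function are hypothetical.

```python
from collections import defaultdict


def borda(rankings: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Combine blind rankings into one ordering.

    rankings maps each member to its ordering of the OTHER answers, best first.
    An answer in place k of an n-item ranking earns n - 1 - k points.
    """
    scores: dict[str, int] = defaultdict(int)
    for judge, order in rankings.items():
        if judge in order:
            raise ValueError("a member never ranks its own answer")
        for place, candidate in enumerate(order):
            scores[candidate] += len(order) - 1 - place
    # Highest score first; ties broken by label for a stable result.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


rankings = {
    "A": ["B", "C"],
    "B": ["C", "A"],
    "C": ["B", "A"],
}
print(borda(rankings))  # [('B', 2), ('C', 1), ('A', 0)]
```

A chairman model would then receive the answers in this order; that synthesis step is not shown.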

FABIUS · CENTRAL INTELLIGENCE UNIT

scout wide · strike narrow

One stance handed to the agent. It investigates the whole field, then commits — fully — to the single battle that decides the war.

05 The proof

One benchmark: fabius improves every model it runs on.

One benchmark, three panels. Fifteen tasks, three arms — baseline · a generic “be concise” control (the real test) · the shipped fabius files, verbatim — blind-judged on the four newest Claude models by two judges never told which arm wrote which. Then verified objectively, with no judge at all: the generated code executed against hidden test suites, the facts checked against fixed checklists. Then demoed blind across external model families. Beating the control, not the baseline, is the real test.
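Blind judging of the three arms can be sketched as a label shuffle with a sealed key. This is an illustration of the idea only, assuming hypothetical helper names (`blind`, `unblind`) and neutral labels; it is not the benchmark harness.

```python
import random

ARMS = ("baseline", "be-concise control", "fabius")


def blind(answers: dict[str, str], rng: random.Random) -> tuple[dict[str, str], dict[str, str]]:
    """Return (what the judge sees, the sealed key mapping neutral label -> arm)."""
    arms = list(answers)
    rng.shuffle(arms)  # the judge cannot infer the arm from position
    labels = [f"answer-{i + 1}" for i in range(len(arms))]
    key = dict(zip(labels, arms))
    shown = {label: answers[arm] for label, arm in key.items()}
    return shown, key


def unblind(scores: dict[str, float], key: dict[str, str]) -> dict[str, float]:
    """Map the judge's per-label scores back to arms after scoring is final."""
    return {key[label]: score for label, score in scores.items()}


answers = {arm: f"<output of {arm}>" for arm in ARMS}
shown, key = blind(answers, random.Random(7))
judge_scores = {label: 12.0 for label in shown}  # placeholder scores
print(unblind(judge_scores, key))
```

The key stays with the harness until every score is recorded, so the judge only ever sees `answer-1`, `answer-2`, `answer-3`.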

Panel A — the four newest Claude models, blind.
Model | Bare model | Under fabius | Change
Fable 5 (frontier) | 14.50 / 15 | 14.73 / 15 | +0.23 · −25% output
Sonnet 5 (mid) | 14.07 / 15 | 14.50 / 15 | +0.43 · −34% output
Opus 4.8 (frontier) | 14.40 / 15 | 14.60 / 15 | +0.20 · −20% output
Haiku (fast) | 11.73 / 15 | 11.40 / 15 | −0.33 · −35% output

Panel A of the benchmark: the shipped AGENTS.md stance and each routed specialist’s SKILL.md, byte-for-byte, on the four newest Claude models. 15 tasks × 3 arms, each answer scored by two blind judges (inter-judge gap 0.72/15). Output drops 20–35% on every model. Every capable tier (Fable 5, Sonnet 5 and Opus 4.8) beats both the bare model and the “be concise” control while doing it, and Sonnet 5’s +0.43 is the largest lift of the four. Haiku is the one dip, and it splits cleanly: +0.71 on its 12 specialist tasks, −4.50 on 3 trivial one-liners. The full contracts overwhelm a fast model on trivial work, which is exactly what the lean gate and model-tier routing exist to prevent, and why the fast tier gets a condensed stance.
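The Haiku split can be checked against the table: weighting the two per-group deltas by task count should reproduce the overall −0.33. A quick check, assuming the quoted deltas are per-task means over each group:

```python
# Per-group deltas quoted in the Panel A caption (assumed to be group means).
specialist_tasks, specialist_delta = 12, 0.71
trivial_tasks, trivial_delta = 3, -4.50

total_tasks = specialist_tasks + trivial_tasks
overall = (specialist_tasks * specialist_delta + trivial_tasks * trivial_delta) / total_tasks

print(total_tasks, round(overall, 2))  # 15 -0.33
```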

Panel B — objective: the code is run, the facts are checked.
Model | Bare model | Under fabius | Change
Haiku (fast) | 75.6% | 93.0% | +17.4
Opus 4.8 | 84.9% | 90.7% | +5.8
Sonnet 5 (mid) | 84.9% | 90.7% | +5.8
Fable 5 | 87.2% | 90.7% | +3.5

Panel B of the benchmark, with no judge at all: 9 deliverables × 3 arms × the same four models, graded by two strict graders. Generated code is written to a file and run against a hidden test suite; research, security, on-chain and automation deliverables are graded against a fixed factual checklist (is the SQL query parameterized? is FDR correction applied? is the token account validated? is the handler idempotent?). Pure algorithm code is already at ~100% for every bare model, so there is no headroom and fabius holds it there. The lift is on the deliverables where “looks right” and “is right” diverge: parameterized SQL 67.5% → 100%, idempotent webhook 40% → 67.5%, Solana account validation 47.5% → 57.5%. Every one of the four models gains objectively (+3.5 to +17.4), the biggest on the smallest model (Haiku, +17.4) because its bare defaults skip the most guardrails. It fixes real bugs a quality score never sees, on 12–25% less output.
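To make the first checklist item concrete, here is a small sketch of the gap between code that looks right and code that is right, using Python's standard `sqlite3` module. The table, the function names and the payload are invented for the example and are not taken from the benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("ada", "admin"), ("bob", "user")])


def find_unsafe(name: str) -> list[tuple]:
    # String-built SQL: the input becomes part of the statement itself.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_safe(name: str) -> list[tuple]:
    # Placeholder binding: the input is only ever treated as a value.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


# On ordinary input both versions agree, so a quick read or a happy-path test passes.
print(find_unsafe("ada"), find_safe("ada"))

# On hostile input they diverge: the unsafe query matches every row.
payload = "x' OR '1'='1"
print(len(find_unsafe(payload)), len(find_safe(payload)))  # 2 0
```

A checklist item such as "is the SQL query parameterized?" catches the first function even when every visible test passes.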

Panel C — external-model demos.
[Chart: quality lift vs the “be concise” control on genuine-build tasks, out of 15 (axis +0 to +10). Grok +8.5 · GPT +7.0 · Claude +7.0 · Mistral +2.5]

Panel C of the benchmark — blind demos on external model families, run on a portable harness with live provider keys (6 tasks, measured 2026-06-22). Quality lift (fabius − the “be concise” control) on genuine-build tasks, out of 15: every family gains — Grok +8.5, GPT +7.0, Claude +7.0, Mistral +2.5. Largest where it matters — real builds and trust boundaries; near zero on pure YAGNI, by design. Gemini is wired; no number is printed without a key.

One benchmark, three panels, one pattern: fabius improves every model it runs on — on 20–35% less output. Blind-judged on the newest Claude models, structure beats brevity; verified objectively, with no judge at all, it passes more executed tests and factual checks; demoed across external families, every family gains. Not “smarter,” not “10× on everything” — a scope-control system that knows when to compress and when to expand. And it’s built right: a deterministic suite verifies the system — fifteen single-owner skills, every reference live, the content-bound seal recomputable — at 19/19.

Method, mechanism & caveats — in the whitepaper →

06 Grounded in research

Decisions, not hand-waving.

fabius’s routing is drawn from the agent-research canon — ReAct, Toolformer, Tree of Thoughts, Reflexion, MemGPT, DSPy, Voyager and the 2026 efficiency surveys — turned into a documented decision policy: eighteen rules, each tagged for what the papers measured versus what fabius borrows by analogy.

  • Climb one rung, stop at the knee. Capability scales sub-linearly with machinery — add the smallest sufficient rung (inline → tool → retrieval → plan → subagent → swarm), never jump to a swarm.
  • Refine on a real signal. A hard oracle (a test, a compiler) earns ~3 iterations; soft self-critique caps at 1–2; no signal ships once to review.
  • Proven to cohere. All eighteen rules reduce to one expected-loss / value-of-information threshold — adversarially verified, then proven gap-free.
[Figure: Climb one rung, stop at the knee. Capability plotted against machinery (inline → tool → retrieval → plan → subagent → swarm); fabius climbs to the knee, past which returns diminish.]

Conceptual — the shape of a documented principle (capability scales sub-linearly with machinery), not a fabius measurement.
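The first two rules above (climb one rung at a time, and budget refinement by the kind of signal available) can be sketched as two small functions. The function names and the `sufficient` flag are hypothetical, and the soft-signal budget takes the upper end of the stated 1–2 range.

```python
LADDER = ("inline", "tool", "retrieval", "plan", "subagent", "swarm")


def next_rung(current: str, sufficient: bool) -> str:
    """Stay put when the current rung suffices; otherwise climb exactly one rung."""
    i = LADDER.index(current)
    if sufficient or i == len(LADDER) - 1:
        return current
    return LADDER[i + 1]


def refine_budget(signal: str) -> int:
    """Maximum refinement iterations for each kind of feedback signal.

    hard: a test or compiler (about 3); soft: self-critique (at most 2);
    none: ship once and hand to review, so no refinement loop.
    """
    return {"hard": 3, "soft": 2, "none": 0}[signal]


print(next_rung("inline", sufficient=False))  # tool
print(next_rung("tool", sufficient=True))     # tool
print(refine_budget("hard"), refine_budget("none"))  # 3 0
```

Because `next_rung` only ever returns the adjacent rung, a jump from inline straight to a swarm cannot happen in one step.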

07 The decision

Every call is the same inequality.

Add a tool? Spawn a reviewer? Branch wider? Retry once more? Under the hood fabius asks one question — does the expected loss removed beat the cost of the machinery? That single threshold governs all eighteen routing rules (each adversarially verified, the set proven to cohere). Scout wide, strike narrow — as arithmetic.

𝔼[L | skip] − 𝔼[L | act] > c → engage

  • 𝔼[L | skip]: expected loss without it
  • 𝔼[L | act]: expected loss with it
  • c: the cost of the machinery
Act only when the value clears the cost.
[Figure: net value of acting plotted against value of information. Below the cost c, skip (pure overhead); above it, engage.]

The gap on the left is the value of information. Below the threshold the machinery is pure overhead; fabius stays inline. (rules R3 · R7 · M1 · M4 — one object, different machinery.)
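The threshold reads directly as code: engage only when the expected loss removed exceeds the cost. A minimal sketch, assuming all three quantities are expressed in the same units; the function name and the sample numbers are illustrative, not fabius measurements.

```python
def should_engage(loss_if_skip: float, loss_if_act: float, cost: float) -> bool:
    """Return True when the expected loss removed by the machinery exceeds its cost."""
    value_of_information = loss_if_skip - loss_if_act
    return value_of_information > cost


# Hypothetical retry: removes little loss, so it is pure overhead.
print(should_engage(loss_if_skip=0.20, loss_if_act=0.15, cost=0.10))  # False

# Hypothetical tool call: removes far more loss than it costs.
print(should_engage(loss_if_skip=0.50, loss_if_act=0.10, cost=0.10))  # True
```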

Worked example — should fabius add a reviewer?

  1. The author misses: p₁ = 30%
  2. An independent reviewer misses: p₂ = 40%
  3. Independent ⇒ both miss: p₁·p₂ = 0.30 × 0.40 = 0.12
  4. Expected error: 30% → 12%

One pass: 30% · Two passes: 12%

18 points of loss removed — more than one agent costs. So fabius spawns the reviewer: two decorrelated passes beat one. (rule M1)
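The worked example above can be reproduced in a few lines. The independence assumption is the one the example states; the reviewer cost of 0.10 is a hypothetical figure, since the page only says the cost is below 18 points.

```python
from math import prod


def joint_miss(miss_rates: list[float]) -> float:
    """Probability that every pass misses, assuming the passes are independent."""
    return prod(miss_rates)


one_pass = joint_miss([0.30])            # author alone
two_passes = joint_miss([0.30, 0.40])    # author plus an independent reviewer
removed = one_pass - two_passes

REVIEWER_COST = 0.10  # hypothetical, in the same units as expected loss

print(round(two_passes, 2), round(removed, 2), removed > REVIEWER_COST)  # 0.12 0.18 True
```

If the reviewer shared the author's blind spots, the joint miss rate would sit closer to 30% than to 12%, which is why the example insists on decorrelated passes.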

08 Every model

One agent. Every major model.

fabius runs on every major provider from one console — and picks a tier per task: frontier when the job is hard, mid or fast when it isn’t. Bring your own key; the agent adapts to whatever you hand it.

  • Anthropic: Claude Opus 4.8 · Sonnet 5 · Haiku 4.5
  • OpenAI: GPT-5 · GPT-5 mini · GPT-5 nano
  • Google: Gemini 2.5 Pro · Flash · Flash-Lite
  • Mistral: Large · Medium · Small
  • Groq: Llama 3.3 70B · 3.1 8B (instant)
  • Any model: bring a key, the agent adapts
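Per-task tier selection might look like the sketch below. The difficulty score, its thresholds and the helper names are assumptions made for illustration, and reading each provider's three listed models as frontier, mid and fast (in that order) is also an assumption for every provider except Anthropic, whose tiers are labeled in the benchmark.

```python
# Model names come from the list above; the tier mapping is an assumption.
TIERS = {
    "anthropic": {"frontier": "Claude Opus 4.8", "mid": "Sonnet 5", "fast": "Haiku 4.5"},
    "openai": {"frontier": "GPT-5", "mid": "GPT-5 mini", "fast": "GPT-5 nano"},
}


def pick_tier(difficulty: float) -> str:
    """Map a hypothetical 0..1 difficulty score to a tier."""
    if difficulty >= 0.7:
        return "frontier"
    if difficulty >= 0.3:
        return "mid"
    return "fast"


def pick_model(provider: str, difficulty: float) -> str:
    """Resolve the model for a provider at the tier the task calls for."""
    return TIERS[provider][pick_tier(difficulty)]


print(pick_model("anthropic", 0.9))  # Claude Opus 4.8
print(pick_model("openai", 0.1))     # GPT-5 nano
```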

One console. Every model.

Run and manage fabius from the synapse console — hand it a task, pick a model tier, and watch the loop: Scout → Plan → Strike → Prove → Record. Every run is logged, costed, and reviewable.

  1. Open the console and hand fabius a task.
  2. It routes, runs on your chosen model, self-verifies, and records.

Private, cryptographically sealed brain · runs on every major model · every run logged and reviewable.

09 Questions

Frequently asked.

What is fabius?

An autonomous engineering agent. One stance — scout wide, strike narrow — across the whole job: code, design, agents, debugging, memory, marketing, defensive security, game craft, on-chain development and sealing, automation, science, ML engineering, markets & finance, and convening a cross-model council to deliberate one answer. Fifteen coordinated, non-overlapping skills form its brain, and it runs on every major model.

How do I run fabius?

You run and manage fabius from the synapse console: hand it a task, pick a model tier, and it routes → plans → strikes with the smallest correct artifact → self-verifies → records. Every run is logged and reviewable.

Which models does it run on?

Every major provider — Anthropic, OpenAI, Google, Mistral and Groq — at a frontier, mid or fast tier per task. Bring your own key and the agent adapts. Its benchmark covers the four newest Claude models — blind-judged and objectively verified, the code executed and the facts checked — with external-model demos on GPT, Grok and Mistral.

Is the brain public?

No. fabius’s brain — the fifteen skills, the routing policy, the memory — is private and cryptographically sealed (SHA-256 + a Merkle root anchored to Bitcoin). The seal is proof of authorship if anyone ever copies it.
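A content-bound seal of this kind can be recomputed by anyone holding the files. The sketch below shows only the hashing half (SHA-256 leaves folded into a Merkle root) with Python's standard `hashlib`. The tree convention used here, duplicating the last node on odd levels, is an assumption, since the page does not describe how its tree is built; signing and Bitcoin anchoring are not shown.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> str:
    """Fold leaf contents into a single hex root.

    Assumed convention: hash each leaf, then hash adjacent pairs level by level,
    duplicating the last node when a level has an odd count.
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


# Hypothetical file contents standing in for the sealed skill files.
files = [b"skill one", b"skill two", b"skill three"]
root = merkle_root(files)

# Recomputing over the same bytes gives the same root; any edit changes it.
print(root == merkle_root(files))
print(root == merkle_root([b"skill one", b"skill two", b"skill 3"]))
```

The root is what would be signed and anchored; verification is recomputing it and comparing.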

Does it make the model smarter?

No — it is a scope-control system, not a capability boost. In its benchmark — blind, judged against both a bare baseline and a generic “be concise” control — every capable tier (Fable 5, Sonnet 5, Opus 4.8) beats both controls on the four newest Claude models. And it holds when verified objectively — code run against hidden tests, facts checked: under fabius the smallest model passes 93% of real tests and factual checks vs 75.6% bare, and parameterized SQL goes 67.5% → 100%. All of it on 20–35% less output. Honest framing: structure beats brevity — not smarter, not 10× on everything; every capable tier gains on both panels and the fast tier dips on trivial one-liners, printed as-is.

How is it different from telling the model to “be concise”?

That is exactly the control it was tested against — and it wins. fabius is structure, not brevity: a scope-control system that knows when to compress and when to expand. Method and caveats are in the whitepaper.