
The Smart Router

[Figure: The Smart Router. Three neural pathways branch from a central node, each leading to a different brain.]

Date: 2026-04-07 Status: Production (sanctum-server v0.2.0)

The Jedi Council has seven agents. Three model tiers. One port. And for longer than anyone wants to admit, every request went to the same backend regardless of what it was asking for.

This meant Yoda — wise, measured, trained on Council doctrine — was fielding questions about Python async patterns. And Coder-14B, a model that has never heard of the Living Force and does not care, was being asked to channel Qui-Gon’s infrastructure wisdom. The results were exactly as useful as you’d expect. Sending a coding question to a model trained exclusively on Jedi personas produces responses that are spiritually rich and syntactically devastating.

The Smart Router extends sanctum-server (Rust/Axum) with multi-backend dispatch. One port, multiple brains. Requests are routed based on the model field, glob patterns, or intent classification from the prompt content. The right question goes to the right brain, and the brains stop having identity crises.

[Figure: The Smart Router. Requests flow through a central dispatch node to three specialized backends: Council (teal), Coder (amber), Cloud (white).]

Four tiers, in order of precedence. The router tries the cheapest match first and escalates only when it has to think.

| Tier | Mechanism | Example | Latency |
|------|-----------|---------|---------|
| 1 | Exact model name | "model": "coder" → Coder backend | 0ms |
| 2 | Glob pattern | "model": "code-review" matches code-* → Coder | 0ms |
| 3 | Intent keywords | model omitted or "auto", prompt has “debug” + “python” → Coder | ~1ms |
| 4 | Default | No match → Council (configurable) | 0ms |

Tier 1 and 2 are pure string matching — zero overhead. Tier 3 scans the prompt for keywords, which sounds expensive until you realize it’s checking a dozen strings against a single message. The ~1ms is generous. One gotcha: Tier 3 only fires when the request omits model or sends model: "auto". An explicit name that matches nothing — a typo, a retired alias — does not fall through to intent classification; it drops straight to Tier 4. The router trusts you to mean what you typed. Tier 4 is the safety net: if nothing matches, send it to Council and let the persona model do what it was trained for.
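The precedence rules above can be sketched in a few dozen lines. This is a minimal illustration and not the code in router.rs: the `Backend` struct, the `Tier` enum, `glob_match`, and `route` are hypothetical names, and Tier 3 is assumed to fire on any single keyword hit (the page does not say whether one hit or several are required).

```rust
/// Which tier produced the routing decision (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Tier {
    Exact,
    Glob,
    Intent,
    Default,
}

/// Hypothetical backend record; field names are illustrative only.
pub struct Backend {
    pub name: &'static str,
    pub models: Vec<&'static str>,          // exact names and glob patterns
    pub intent_keywords: Vec<&'static str>, // plain substrings
}

/// Minimal glob: `*` matches any run of characters, including none.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == text; // no wildcard at all
    }
    let (first, last) = (parts[0], parts[parts.len() - 1]);
    if !text.starts_with(first) {
        return false;
    }
    let mut rest = &text[first.len()..];
    // Middle segments must appear in order, leftmost match first.
    for part in &parts[1..parts.len() - 1] {
        match rest.find(part) {
            Some(i) => rest = &rest[i + part.len()..],
            None => return false,
        }
    }
    rest.ends_with(last)
}

/// Resolve a request to a backend name, cheapest tier first.
pub fn route<'a>(
    backends: &'a [Backend],
    default: &'a str,
    model: Option<&str>,
    last_user_message: &str,
) -> (&'a str, Tier) {
    match model {
        Some(name) if name != "auto" => {
            // Tier 1: exact model name.
            for b in backends {
                if b.models.iter().any(|m| !m.contains('*') && *m == name) {
                    return (b.name, Tier::Exact);
                }
            }
            // Tier 2: glob pattern.
            for b in backends {
                if b.models.iter().any(|m| m.contains('*') && glob_match(m, name)) {
                    return (b.name, Tier::Glob);
                }
            }
            // An explicit name that matches nothing skips Tier 3 entirely.
            (default, Tier::Default)
        }
        _ => {
            // Tier 3: keyword scan, only when model is omitted or "auto".
            let prompt = last_user_message.to_lowercase();
            for b in backends {
                let hit = b
                    .intent_keywords
                    .iter()
                    .any(|k| prompt.contains(k.to_lowercase().as_str()));
                if hit {
                    return (b.name, Tier::Intent);
                }
            }
            // Tier 4: nothing matched.
            (default, Tier::Default)
        }
    }
}
```

Note the early return in the explicit-model branch: a typo such as "codr" lands on the default backend even if the prompt is full of coding keywords, which is the gotcha described above.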

Router config lives in ~/.sanctum/instance.yaml under the router: key. Current (2026-04-23) topology after the Olympics-informed rework:

router:
  backends:
    council-secure:
      url: https://127.0.0.1:1337/v1  # mTLS-only
      models: [yoda, mothma, windu, cilghal, mundi, jocasta, quigon,
               council-secure, council-ops, council]
      description: "Qwen3.6-35B-A3B - haus default (0.957 on Olympics)"
      fallback_urls: [https://100.0.0.55:8903/v1, http://127.0.0.1:8901/v1]
      ca_cert_path: ~/.sanctum/certs/ca.crt
      client_cert_path: ~/.sanctum/certs/clients/sanctum-server.crt
      client_key_path: ~/.sanctum/certs/clients/sanctum-server.key
    coder:
      url: http://127.0.0.1:3301/v1  # sanctum-mlx-codestral, native MLX seat
      models: [coder, code-*, "*coder*", ahsoka]
      description: "Codestral-22B-v0.1-4bit - pure code + Ahsoka (chalet 16 GB)"
    cloud:
      url: http://100.0.0.55:3456/v1  # Claude Max bridge (separate Mini)
      models: [opus, opus-4.7, claude-*, cloud, escalation]
      description: "Claude Opus 4.7 via Max subscription - Yoda/Mundi escalation (0.887)"
      fallback_urls: [https://openrouter.ai/api/v1]
      api_key_env: OPENROUTER_API_KEY
    spatial:
      url: https://generativelanguage.googleapis.com/v1beta/openai/v1
      models: [gemini, gemini-3.1-pro, windu-spatial, spatial]
      description: "Gemini 3.1 Pro via Google AI Studio Ultra - Windu spatial/topology"
      api_key_env: GOOGLE_AI_API_KEY
  default: council-secure

Two things in that block earn a second look. The cloud backend’s primary url is the Claude Max bridge — a small proxy on :3456 that fronts the Max subscription — with metered OpenRouter as the fallback, not the other way around. And quigon lives under council-secure, not coder: the quigon model alias resolves to the local 35B. Qui-Gon-the-agent still gets Codestral, but he gets there through the agent layer (council-code), not by anyone typing model: "quigon". More on that two-layer split below.
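How `fallback_urls` is consumed is not spelled out here, so the following is a sketch under one assumption: the primary `url` is tried first, then each fallback in the order listed, and the first success wins. The function name `send_with_fallback` and the `send` closure (a stand-in for the real HTTP call, with mTLS and API-key handling left out) are hypothetical.

```rust
/// Try the primary URL, then each fallback in listed order.
/// Returns the URL that answered plus its response, or every
/// (url, error) pair if all candidates failed.
pub fn send_with_fallback<T, E>(
    primary: &str,
    fallback_urls: &[&str],
    mut send: impl FnMut(&str) -> Result<T, E>,
) -> Result<(String, T), Vec<(String, E)>> {
    let mut errors = Vec::new();
    let candidates = std::iter::once(primary).chain(fallback_urls.iter().copied());
    for url in candidates {
        match send(url) {
            Ok(response) => return Ok((url.to_string(), response)),
            Err(e) => errors.push((url.to_string(), e)),
        }
    }
    Err(errors)
}
```

Under that assumption, the cloud backend would only reach metered OpenRouter after the Max bridge on :3456 has failed, which matches the ordering the config describes. Keeping the per-URL errors makes it easy to log which hop failed and why.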

This table reflects the post-2026-04-28 routing, when the Council went heterogeneous. Every Jedi now starts on a different model family by default and falls back to the local Qwen3.6 if the primary path fails. The Primary column is the agent-layer assignment — the model each Jedi gets when you talk to them through openclaw agent — which is the routing that matters day to day. (The Smart Router’s model-alias resolution differs for a couple of seats; see the two-layer note below.) See (Neuro)diversity is Paramount for why we did this and what it bought us.

| Jedi | Primary | Fallback | Why |
|------|---------|----------|-----|
| Yoda | cloud: Claude Opus 4.7 (Max sub via <MINI>:3456) | council-secure | Synthesis, council coordination, novel reasoning; the seat where one wrong call is expensive |
| Mundi | cloud: Claude Opus 4.7 (Max sub) | council-secure | Fund decisions, tax / FBAR edge cases, financial reasoning |
| Qui-Gon | coder: Codestral-22B-v0.1-4bit (sanctum-mlx-codestral native MLX <MINI>:3301, via the agent-layer council-code alias) | council-secure | Dense code gen, infrastructure pragmatism. (Was Qwen2.5-Coder-14B on :1338 until 2026-06-07; that seat and the LM Studio :1234 socat bridge are now retired, weights deleted, and every model is served by a native MLX seat.) |
| Windu | spatial: Gemini 3.1 Pro (Google AI Studio Ultra) | council-secure | Spatial reasoning is Gemini’s strongest discipline: perimeter, network topology, zone maps |
| Cilghal | council-secure: Qwen3.6-35B-A3B (sanctum-mlx, mTLS-only) | (no remote fallback) | Health data is privacy-critical; stays in the haus, always |
| Mothma | council-secure: Qwen3.6-35B-A3B | council-secure | SystemdLog 0.94, LaunchAgent 0.88 on Olympics; local-only by mandate |
| Jocasta | council-secure: Qwen3.6-35B-A3B | council-secure | Email / CRM; privacy-first |
| Ahsoka | coder: Codestral-22B-v0.1-4bit (when reachable on the haus :3301 seat) | | Runs on a Mac Mini M1 16 GB at the chalet; 35B won’t fit, and Codestral-22B won’t fit there either; the chalet’s own local coder is an open decision (last config: Qwen2.5-Coder-14B, offline) |

Cloud Opus runs through the Claude Max subscription bridge at <MINI>:3456, not the metered Anthropic API. Routine Council questions consume zero API credits. Bring-out-the-good-china reasoning happens for the price of the monthly subscription, not per-token.

The Olympics (2026-04-23) informed this lineup. Qwen3.6-35B-A3B at 0.957 is still the haus default for any seat without a domain-specific upgrade; Mothma, Jocasta, and Cilghal stay there because their work is privacy-critical or local-by-mandate. Coder-14B (0.929) stayed on the bench at that point for coding-specific work and for the chalet, where RAM is the constraint; per the table above, that seat has since been retired in favor of Codestral. Opus 4.7 now heads two seats (Yoda and Mundi) and Gemini 3.1 Pro heads one (Windu), as primaries rather than escalations. See Model Comparison for the full per-task score breakdown and (Neuro)diversity is Paramount for the doctrine that made this routing the default rather than the escalation.

There are now two routers in front of every Council request, and they answer different questions:

| Layer | Lives at | Routes by | Caller |
|-------|----------|-----------|--------|
| openclaw-gateway agents.list | <VM>:1977 then <MINI>:4040 (proxyd) | Agent identity → primary model | openclaw agent --agent <id> calls; the daily Council interface |
| sanctum-server Smart Router (this page) | <MINI>:8900 | Model name + glob + intent classification | Direct API callers (eval harness, integrations, anyone hitting /v1/chat/completions with an explicit model:) |

The agent layer chooses the model from the agent’s role. The Smart Router chooses the backend from the model name. They compose: an agent call resolves to a model alias on the agent side, then the Smart Router (or proxyd, the per-tier router) maps that alias to a backend. The flow openclaw agent --agent main → council-max-thinking → :3456 (Claude Max bridge) → Opus 4.7 is the canonical path for Yoda after 2026-04-28.

This is exactly why Qui-Gon reads two ways depending on the door you knock on. openclaw agent --agent quigon resolves to council-code and lands on Codestral at :3301 — the agent layer knows Qui-Gon writes code. But hit the Smart Router with model: "quigon" and you get the local 35B on council-secure, because there “quigon” is just a persona alias, and personas live with the Council. Same name, two layers, two brains — and knowing which door you used is half of debugging a response that doesn’t sound right.
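The two doors can be written down as two separate lookups. The sketch below is a toy model, not either router's real code: the function names are hypothetical, and the tables contain only the mappings this page actually names (agent main, agent quigon, and the Smart Router's quigon and coder aliases).

```rust
/// Agent layer (openclaw-gateway): agent id -> model alias.
pub fn agent_alias(agent_id: &str) -> Option<&'static str> {
    match agent_id {
        "main" => Some("council-max-thinking"), // Yoda's canonical path
        "quigon" => Some("council-code"),
        _ => None, // other agents are not modeled in this sketch
    }
}

/// Where each agent-layer alias ends up, as described on this page.
pub fn alias_destination(alias: &str) -> Option<&'static str> {
    match alias {
        "council-max-thinking" => Some(":3456 (Claude Max bridge, Opus 4.7)"),
        "council-code" => Some(":3301 (Codestral)"),
        _ => None,
    }
}

/// Smart Router layer: model name -> backend key from instance.yaml.
pub fn smart_router_backend(model: &str) -> &'static str {
    match model {
        "quigon" => "council-secure", // persona alias, lives with the Council
        "coder" => "coder",
        _ => "council-secure", // configured default
    }
}

/// Door one: `openclaw agent --agent <id>`.
pub fn via_agent(agent_id: &str) -> Option<&'static str> {
    agent_alias(agent_id).and_then(alias_destination)
}
```

Calling `via_agent("quigon")` and `smart_router_backend("quigon")` with the same name yields Codestral on one side and council-secure on the other, which is the whole point: when a response sounds wrong, check which function you effectively called.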

To verify the agent-layer routing is healthy:

~/Documents/Claude_Code/tools/test-council-routing.sh

The script invokes each Jedi, parses the embedded execution trace’s winnerModel field, and asserts each one reaches their assigned model. Five passes means the Council is heterogeneous and behaving. Run it ~weekly; cloud paths fail silently sometimes (rate limit hits, billing flips, preview models get deprecated) and the script catches drift before the morning briefing notices.

The models arrays are glob patterns. The intent_keywords are plain strings matched against the last user message. If you’re wondering whether regex would be more powerful here — yes, and also no. Keyword matching is predictable, debuggable, and has never once woken anyone up at 3 AM with a catastrophic backtrack.
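Part of what makes keyword matching debuggable is that the router can always say which strings it saw. A minimal sketch of that idea follows, assuming case-insensitive substring matching (the page does not state the case rules); the `Message` struct and both function names are hypothetical.

```rust
/// Hypothetical chat message shape, mirroring the OpenAI-style
/// `role` / `content` pair.
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Return the content of the most recent message with role "user".
pub fn last_user_message(messages: &[Message]) -> Option<&str> {
    messages
        .iter()
        .rev()
        .find(|m| m.role == "user")
        .map(|m| m.content.as_str())
}

/// Return the keywords that occur in the message, in config order.
/// Plain substring checks: no regex, no backtracking.
pub fn matched_keywords<'k>(message: &str, keywords: &[&'k str]) -> Vec<&'k str> {
    let haystack = message.to_lowercase();
    keywords
        .iter()
        .copied()
        .filter(|k| haystack.contains(k.to_lowercase().as_str()))
        .collect()
}
```

Logging the returned list next to the routing decision answers "why did this go to Coder?" without anyone having to reason about a pattern engine.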

| Endpoint | Description |
|----------|-------------|
| POST /v1/chat/completions | Routed chat: selects backend automatically |
| GET /v1/models | Lists all registered backends |
| GET /v1/backends | Backend names + patterns |
| GET /health | Reports "mode": "routed" or "single" |

The router is a new Rust module in sanctum-server/src/router.rs. The HttpProxyBackend in backend.rs handles the actual proxying to any OpenAI-compatible endpoint, with automatic detection of streaming (SSE) vs non-streaming (JSON) responses.
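The page does not say how HttpProxyBackend tells the two response styles apart. One common approach, shown here purely as an assumption, is to inspect the upstream Content-Type header: `text/event-stream` means SSE, `application/json` means a single JSON body. `ResponseKind` and `response_kind` are hypothetical names.

```rust
/// How the proxy should relay an upstream response (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ResponseKind {
    Sse,   // relay chunks as they arrive
    Json,  // buffer and forward a single body
    Other, // anything else: pass through untouched
}

/// Classify a response by its Content-Type header value.
/// Parameters such as `; charset=utf-8` are ignored, and the
/// media type is compared case-insensitively.
pub fn response_kind(content_type: Option<&str>) -> ResponseKind {
    let media_type = content_type
        .unwrap_or("")
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    match media_type.as_str() {
        "text/event-stream" => ResponseKind::Sse,
        "application/json" => ResponseKind::Json,
        _ => ResponseKind::Other,
    }
}
```

Deciding from the response rather than from the request's `stream` flag has one practical upside: a backend that ignores the flag still gets relayed correctly.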

Non-quantized and quantized backends both work. The proxy handles think-tag stripping (strip_thinking flag) regardless of which backend processes the request. The router doesn’t care what the backend is running — sanctum-mlx, the legacy socat plain bridge, a cloud API behind a cost-capped proxy — as long as it speaks OpenAI-compatible HTTP. This is the correct amount of opinion for a routing layer to have.
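As a rough picture of what think-tag stripping involves, here is a minimal sketch for a complete (non-streaming) response. It is not the sanctum-server implementation: the `<think>`/`</think>` tag names, the choice to drop an unterminated block, and the final whitespace trim are all assumptions.

```rust
/// Remove every `<think>...</think>` span from a completed response.
/// An opening tag with no closing tag drops the rest of the text
/// (assumed behavior). Leading/trailing whitespace is trimmed.
pub fn strip_thinking(text: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";

    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        // Keep everything before the opening tag.
        out.push_str(&rest[..start]);
        match rest[start..].find(CLOSE) {
            // Skip past the closing tag and keep scanning.
            Some(end) => rest = &rest[start + end + CLOSE.len()..],
            // Unterminated block: discard the remainder.
            None => rest = "",
        }
    }
    out.push_str(rest);
    out.trim().to_string()
}
```

A streaming variant would need to buffer across chunk boundaries, since a tag can be split between two SSE events; that case is deliberately left out here.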

48 tests total: 43 unit (pattern matching, think-tag stripping) + 5 e2e covering health and discovery, explicit-model routing, pattern routing, intent classification, and default fallback.

cd sanctum-server && cargo test

The e2e tests spin up 3 mock Axum servers, launch the actual sanctum-server binary with a temp config, and verify requests land on the correct backend. Three fake brains, one real router, zero ambiguity about who answered.

The router exists because “send everything to the same model” is a strategy in the same way that “forward every email to the CEO” is a strategy. It works until it doesn’t, and then it fails in ways that are obvious to everyone except the system that’s doing it.