The Smart Router

Figure: The Smart Router, drawn as three neural pathways branching from a central node, each leading to a different brain.


Date: 2026-04-07
Status: Production (sanctum-server v0.2.0)

The Jedi Council has seven agents. Three model tiers. One port. And for longer than anyone wants to admit, every request went to the same backend regardless of what it was asking for.

This meant Yoda — wise, measured, trained on Council doctrine — was fielding questions about Python async patterns. And Coder-14B, a model that has never heard of the Living Force and does not care, was being asked to channel Qui-Gon’s infrastructure wisdom. The results were exactly as useful as you’d expect. Sending a coding question to a model trained exclusively on Jedi personas produces responses that are spiritually rich and syntactically devastating.

The Smart Router extends sanctum-server (Rust/Axum) with multi-backend dispatch. One port, multiple brains. Requests are routed based on the model field, glob patterns, or intent classification from the prompt content. The right question goes to the right brain, and the brains stop having identity crises.

Architecture

Figure: The Smart Router. Requests flow through a central dispatch node to three specialized backends: Council (teal), Coder (amber), Cloud (white).

Routing Tiers

Four tiers, in order of precedence. The router tries the cheapest match first and escalates only when it has to think.

Tier	Mechanism	Example	Latency
1	Exact model name	`"model": "coder"` → Coder backend	0ms
2	Glob pattern	`"model": "code-review"` matches `code-*` → Coder	0ms
3	Intent keywords	`model` omitted or `"auto"`, prompt has “debug” + “python” → Coder	~1ms
4	Default	No match → Council (configurable)	0ms

Tiers 1 and 2 are pure string matching, so they add effectively zero overhead. Tier 3 scans the prompt for keywords, which sounds expensive until you realize it’s checking a dozen strings against a single message. The ~1ms is generous. One gotcha: Tier 3 only fires when the request omits model or sends model: "auto". An explicit name that matches nothing (a typo, a retired alias) does not fall through to intent classification; it drops straight to Tier 4. The router trusts you to mean what you typed. Tier 4 is the safety net: if nothing matches, send it to Council and let the persona model do what it was trained for.
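The precedence above fits in a few dozen lines of std-only Rust. Treat this as a sketch, not the code in router.rs: the Backend struct, the route and glob_match functions, and the "debug"/"python" keyword list are hypothetical, and it assumes a single keyword hit is enough to classify intent.

```rust
/// Which tier produced the routing decision.
#[derive(Debug, PartialEq)]
enum Tier {
    Exact,
    Glob,
    Intent,
    Default,
}

/// Illustrative backend entry (hypothetical shape, not the real config type).
struct Backend {
    name: &'static str,
    models: &'static [&'static str],
    intent_keywords: &'static [&'static str],
}

/// Trimmed-down table; the keyword list is an assumption for illustration.
const BACKENDS: &[Backend] = &[
    Backend { name: "council-secure", models: &["yoda", "quigon", "council"], intent_keywords: &[] },
    Backend { name: "coder", models: &["coder", "code-*", "*coder*"], intent_keywords: &["debug", "python"] },
];

/// Minimal glob: `*` matches any run of characters, everything else is literal.
fn glob_match(pattern: &str, text: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == text;
    }
    let (first, last) = (parts[0], parts[parts.len() - 1]);
    if text.len() < first.len() + last.len() || !text.starts_with(first) || !text.ends_with(last) {
        return false;
    }
    // Middle literals must appear in order between the prefix and the suffix.
    let mut rest = &text[first.len()..text.len() - last.len()];
    for part in parts[1..parts.len() - 1].iter().copied() {
        match rest.find(part) {
            Some(i) => rest = &rest[i + part.len()..],
            None => return false,
        }
    }
    true
}

/// Resolve a request to (backend name, tier that matched).
fn route(
    backends: &[Backend],
    default: &'static str,
    model: Option<&str>,
    prompt: &str,
) -> (&'static str, Tier) {
    match model {
        // Tier 3 is reachable only when the model is omitted or "auto".
        None | Some("auto") => {
            let prompt = prompt.to_lowercase();
            for b in backends {
                if b.intent_keywords.iter().any(|k| prompt.contains(*k)) {
                    return (b.name, Tier::Intent);
                }
            }
        }
        Some(name) => {
            // Tier 1: literal entries, compared for equality.
            for b in backends {
                if b.models.iter().any(|m| !m.contains('*') && *m == name) {
                    return (b.name, Tier::Exact);
                }
            }
            // Tier 2: entries containing `*`, matched as globs.
            for b in backends {
                if b.models.iter().any(|m| m.contains('*') && glob_match(m, name)) {
                    return (b.name, Tier::Glob);
                }
            }
        }
    }
    // Tier 4: an explicit name that matched nothing lands here, never in Tier 3.
    (default, Tier::Default)
}
```

Calling route(BACKENDS, "council-secure", Some("codr"), "debug python") returns the default backend even though the prompt is full of coding keywords, which is the typo gotcha described above.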

Configuration

Router config lives in ~/.sanctum/instance.yaml under the router: key. Current (2026-04-23) topology after the Olympics-informed rework:

router:
  backends:
    council-secure:
      url: https://127.0.0.1:1337/v1         # mTLS-only
      models: [yoda, mothma, windu, cilghal, mundi, jocasta, quigon,
               council-secure, council-ops, council]
      description: "Qwen3.6-35B-A3B - haus default (0.957 on Olympics)"
      fallback_urls: [https://100.0.0.55:8903/v1, http://127.0.0.1:8901/v1]
      ca_cert_path:  ~/.sanctum/certs/ca.crt
      client_cert_path: ~/.sanctum/certs/clients/sanctum-server.crt
      client_key_path:  ~/.sanctum/certs/clients/sanctum-server.key
    coder:
      url: http://127.0.0.1:3301/v1          # sanctum-mlx-codestral, native MLX seat
      models: [coder, code-*, "*coder*", ahsoka]
      description: "Codestral-22B-v0.1-4bit - pure code + Ahsoka (chalet 16 GB)"
    cloud:
      url: http://100.0.0.55:3456/v1         # Claude Max bridge (separate Mini)
      models: [opus, opus-4.7, claude-*, cloud, escalation]
      description: "Claude Opus 4.7 via Max subscription - Yoda/Mundi escalation (0.887)"
      fallback_urls: [https://openrouter.ai/api/v1]
      api_key_env: OPENROUTER_API_KEY
    spatial:
      url: https://generativelanguage.googleapis.com/v1beta/openai/v1
      models: [gemini, gemini-3.1-pro, windu-spatial, spatial]
      description: "Gemini 3.1 Pro via Google AI Studio Ultra - Windu spatial/topology"
      api_key_env: GOOGLE_AI_API_KEY
  default: council-secure

Two things in that block earn a second look. The cloud backend’s primary url is the Claude Max bridge — a small proxy on :3456 that fronts the Max subscription — with metered OpenRouter as the fallback, not the other way around. And quigon lives under council-secure, not coder: the quigon model alias resolves to the local 35B. Qui-Gon-the-agent still gets Codestral, but he gets there through the agent layer (council-code), not by anyone typing model: "quigon". More on that two-layer split below.
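To make the primary-then-fallback ordering concrete, here is a minimal sketch of how a backend entry could be walked. The BackendConfig struct and both methods are hypothetical stand-ins, not sanctum-server types, and the sketch takes no position on which URL the resolved key is sent to.

```rust
use std::env;

/// Illustrative mirror of one `router.backends` entry (hypothetical type).
struct BackendConfig {
    url: String,
    fallback_urls: Vec<String>,
    api_key_env: Option<String>,
}

impl BackendConfig {
    /// Primary URL first, then fallbacks in the order they are listed.
    fn candidate_urls(&self) -> Vec<&str> {
        std::iter::once(self.url.as_str())
            .chain(self.fallback_urls.iter().map(String::as_str))
            .collect()
    }

    /// Read the key from the environment variable named by `api_key_env`.
    /// Returns None when no variable is configured or it is unset.
    fn api_key(&self) -> Option<String> {
        self.api_key_env.as_ref().and_then(|name| env::var(name).ok())
    }
}
```

For the cloud entry, candidate_urls() yields the Claude Max bridge first and OpenRouter second, so the metered path is only tried after the subscription path has failed.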

Jedi → backend assignments

This table reflects the post-2026-04-28 routing, when the Council went heterogeneous. Every Jedi now starts on a different model family by default and falls back to the local Qwen3.6 if the primary path fails. The Primary column is the agent-layer assignment — the model each Jedi gets when you talk to them through openclaw agent — which is the routing that matters day to day. (The Smart Router’s model-alias resolution differs for a couple of seats; see the two-layer note below.) See (Neuro)diversity is Paramount for why we did this and what it bought us.

Jedi	Primary	Fallback	Why
Yoda	cloud — Claude Opus 4.7 (Max sub via `<MINI>:3456`)	council-secure	Synthesis, council coordination, novel reasoning; the seat where one wrong call is expensive
Mundi	cloud — Claude Opus 4.7 (Max sub)	council-secure	Fund decisions, tax / FBAR edge cases, financial reasoning
Qui-Gon	coder — Codestral-22B-v0.1-4bit (`sanctum-mlx-codestral` native MLX `<MINI>:3301`, via the agent-layer `council-code` alias)	council-secure	Dense code gen, infrastructure pragmatism. (Was Qwen2.5-Coder-14B on `:1338` until 2026-06-07; that seat — and the LM Studio `:1234` socat bridge — are now retired, weights deleted, and every model is served by a native MLX seat.)
Windu	spatial — Gemini 3.1 Pro (Google AI Studio Ultra)	council-secure	Spatial reasoning is Gemini’s strongest discipline — perimeter, network topology, zone maps
Cilghal	council-secure — Qwen3.6-35B-A3B (sanctum-mlx, mTLS-only)	(no remote fallback)	Health data is privacy-critical; stays in the haus, always
Mothma	council-secure — Qwen3.6-35B-A3B	council-secure	SystemdLog 0.94, LaunchAgent 0.88 on Olympics — local-only by mandate
Jocasta	council-secure — Qwen3.6-35B-A3B	council-secure	Email / CRM; privacy-first
Ahsoka	coder — Codestral-22B-v0.1-4bit (when reachable on the haus `:3301` seat)	—	Runs on a Mac Mini M1 16 GB at the chalet; 35B won’t fit, and Codestral-22B won’t fit there either — the chalet’s own local coder is an open decision (last config: Qwen2.5-Coder-14B, offline)

Cloud Opus runs through the Claude Max subscription bridge at <MINI>:3456, not the metered Anthropic API. Routine Council questions consume zero API credits. Bring-out-the-good-china reasoning happens for the price of the monthly subscription, not per-token.

The Olympics (2026-04-23) informed this lineup: Qwen3.6-35B-A3B at 0.957 is still the haus default for any seat without a domain-specific upgrade. Mothma, Jocasta, and Cilghal stay there because their work is privacy-critical or local-by-mandate, and Coder-14B (0.929) stays on the bench for coding-specific work and the chalet where RAM is the constraint. Opus 4.7 now heads two seats (Yoda and Mundi) and Gemini 3.1 Pro heads one (Windu), as primaries rather than escalations. See Model Comparison for the full per-task score breakdown and (Neuro)diversity is Paramount for the doctrine that made this routing the default rather than the escalation.

Two routing layers, one Council

There are now two routers in front of every Council request, and they answer different questions:

Layer	Lives at	Routes by	Caller
openclaw-gateway agents.list	`<VM>:1977` then `<MINI>:4040` (proxyd)	Agent identity → primary model	`openclaw agent --agent <id>` calls; the daily Council interface
sanctum-server Smart Router (this page)	`<MINI>:8900`	Model name + glob + intent classification	Direct API callers (eval harness, integrations, anyone hitting `/v1/chat/completions` with an explicit `model:`)

The agent layer chooses the model from the agent’s role. The Smart Router chooses the backend from the model name. They compose: an agent call resolves to a model alias on the agent side, then the Smart Router (or proxyd, the per-tier router) maps that alias to a backend. The flow openclaw agent --agent main → council-max-thinking → :3456 (Claude Max bridge) → Opus 4.7 is the canonical path for Yoda after 2026-04-28.

This is exactly why Qui-Gon reads two ways depending on the door you knock on. openclaw agent --agent quigon resolves to council-code and lands on Codestral at :3301 — the agent layer knows Qui-Gon writes code. But hit the Smart Router with model: "quigon" and you get the local 35B on council-secure, because there “quigon” is just a persona alias, and personas live with the Council. Same name, two layers, two brains — and knowing which door you used is half of debugging a response that doesn’t sound right.
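The two doors can be modelled as two independent lookups. This is a toy sketch: the Door enum, the function names, and the destination labels are hypothetical, and only the flows named above (main and quigon on the agent side, the persona aliases on the Smart Router side) are filled in.

```rust
/// Which door the caller knocked on.
enum Door<'a> {
    /// `openclaw agent --agent <id>`: routed by agent identity.
    Agent(&'a str),
    /// Direct Smart Router call with an explicit `model:` name.
    Model(&'a str),
}

/// Agent layer: agent id -> model alias.
fn agent_alias(agent: &str) -> Option<&'static str> {
    match agent {
        "main" => Some("council-max-thinking"),
        "quigon" => Some("council-code"),
        _ => None,
    }
}

/// Where an agent-layer alias lands (labels are illustrative).
fn alias_destination(alias: &str) -> Option<&'static str> {
    match alias {
        "council-max-thinking" => Some("Opus 4.7 via Claude Max bridge (:3456)"),
        "council-code" => Some("Codestral (:3301)"),
        _ => None,
    }
}

/// Smart Router side: persona names are plain aliases under council-secure.
fn smart_router_backend(model: &str) -> Option<&'static str> {
    const PERSONAS: [&str; 7] =
        ["yoda", "mothma", "windu", "cilghal", "mundi", "jocasta", "quigon"];
    if PERSONAS.iter().any(|p| *p == model) {
        Some("council-secure (local Qwen3.6-35B-A3B)")
    } else {
        None // other names are not modelled in this sketch
    }
}

/// Resolve a call to a human-readable destination.
fn resolve(door: Door<'_>) -> &'static str {
    match door {
        Door::Agent(id) => agent_alias(id)
            .and_then(alias_destination)
            .unwrap_or("unknown agent"),
        Door::Model(name) => smart_router_backend(name).unwrap_or("not modelled"),
    }
}
```

resolve(Door::Agent("quigon")) and resolve(Door::Model("quigon")) return different destinations for the same name, which is the whole point.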

To verify the agent-layer routing is healthy:

~/Documents/Claude_Code/tools/test-council-routing.sh

The script invokes each Jedi, parses the winnerModel field of the embedded execution trace, and asserts that each one reaches its assigned model. Five passes means the Council is heterogeneous and behaving. Run it roughly weekly; cloud paths sometimes fail silently (rate limits hit, billing flips, preview models get deprecated), and the script catches drift before the morning briefing notices.

Back in the router config: the models arrays are glob patterns, and the intent_keywords (omitted from the excerpt above) are plain strings matched against the last user message. If you’re wondering whether regex would be more powerful here: yes, and also no. Keyword matching is predictable, debuggable, and has never once woken anyone up at 3 AM with a catastrophic backtrack.
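Matching plain keywords against the last user message takes very little code. In this sketch the Message struct and both function names are hypothetical, and case-insensitive substring matching is an assumption; the real matcher may be stricter.

```rust
/// One chat message, reduced to the two fields the matcher needs.
struct Message<'a> {
    role: &'a str,
    content: &'a str,
}

/// Text of the most recent message whose role is "user", if any.
fn last_user_message<'a>(messages: &[Message<'a>]) -> Option<&'a str> {
    messages
        .iter()
        .rev()
        .find(|m| m.role == "user")
        .map(|m| m.content)
}

/// Keywords that occur in the text, compared as lowercase substrings.
fn matched_keywords<'k>(text: &str, keywords: &[&'k str]) -> Vec<&'k str> {
    let haystack = text.to_lowercase();
    keywords
        .iter()
        .copied()
        .filter(|k| haystack.contains(&k.to_lowercase()))
        .collect()
}
```

Returning the list of hits rather than a bare boolean keeps the decision debuggable: when a request lands on the wrong brain, the log can say which keyword sent it there.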

Endpoints

Endpoint	Description
`POST /v1/chat/completions`	Routed chat — selects backend automatically
`GET /v1/models`	Lists all registered backends
`GET /v1/backends`	Backend names + patterns
`GET /health`	Reports `"mode": "routed"` or `"single"`

Implementation

The router is a new Rust module in sanctum-server/src/router.rs. The HttpProxyBackend in backend.rs handles the actual proxying to any OpenAI-compatible endpoint, with automatic detection of streaming (SSE) vs non-streaming (JSON) responses.

Non-quantized and quantized backends both work. The proxy handles think-tag stripping (strip_thinking flag) regardless of which backend processes the request. The router doesn’t care what the backend is running — sanctum-mlx, the legacy socat plain bridge, a cloud API behind a cost-capped proxy — as long as it speaks OpenAI-compatible HTTP. This is the correct amount of opinion for a routing layer to have.
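The two proxy behaviors mentioned here, think-tag stripping and SSE detection, can be sketched with std alone. Both functions are hypothetical and rest on assumptions: that the tags are literally `<think>` and `</think>`, that an unterminated tag should drop the rest of the text, and that streaming is recognized by a text/event-stream Content-Type. The real HttpProxyBackend may do any of these differently.

```rust
/// Remove every `<think>...</think>` span from the text.
/// An unterminated `<think>` drops everything after it (an assumption).
fn strip_thinking(text: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        // Keep everything before the opening tag.
        out.push_str(&rest[..start]);
        match rest[start..].find(CLOSE) {
            // Skip past the closing tag and keep scanning.
            Some(end) => rest = &rest[start + end + CLOSE.len()..],
            None => rest = "",
        }
    }
    out.push_str(rest);
    out
}

/// Treat a response as SSE when its media type is `text/event-stream`,
/// ignoring parameters such as `charset`.
fn is_event_stream(content_type: &str) -> bool {
    content_type
        .split(';')
        .next()
        .map(|mime| mime.trim().eq_ignore_ascii_case("text/event-stream"))
        .unwrap_or(false)
}
```

Note that this version works on a complete string. A streaming proxy would also have to handle a tag split across two SSE chunks, which this sketch does not attempt.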

Tests

48 tests total: 43 unit (pattern matching, think-tag stripping) + 5 e2e covering health and discovery, explicit-model routing, pattern routing, intent classification, and default fallback.

cd sanctum-server && cargo test

The e2e tests spin up 3 mock Axum servers, launch the actual sanctum-server binary with a temp config, and verify requests land on the correct backend. Three fake brains, one real router, zero ambiguity about who answered.

The router exists because “send everything to the same model” is a strategy in the same way that “forward every email to the CEO” is a strategy. It works until it doesn’t, and then it fails in ways that are obvious to everyone except the system that’s doing it.