Service Troubleshooting

Service-level failures that don’t have a one-line fix. Each section earned its place through a real outage that hid behind a cheerful green probe — the kind of bug that smiles at the watchdog while quietly dropping every fifth request for an hour. The main Troubleshooting page keeps the universal infrastructure scenarios; this annex carries the ones a few clicks down the diagnostic tree, where the fault lives in the gap between two services that have technically never been formally introduced.
VM Can Reach OpenClaw But Not Local Models
Symptom: The VM can talk to the Mac gateway on 10.10.10.1:1977, but model calls to 10.10.10.1:1337 or 10.10.10.1:1234 fail.
That means the VM bridge exists, but the model-serving side of the Mac has drifted. Usually one of two things is true:
- The MLX model server is down or bound incorrectly.
- The LM Studio bridge listener on 10.10.10.1:1234 is gone (the bind check below tells the two apart).
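Before the fuller checks, the bind address is the fastest discriminator. A minimal check on the Mac (lsof ships with macOS):

```bash
# Show who listens on each model port and on which interface.
# A 127.0.0.1-only bind explains why the VM can't reach it.
lsof -nP -iTCP:1337 -sTCP:LISTEN
lsof -nP -iTCP:1234 -sTCP:LISTEN
```

No output on 1234 points at the missing LM Studio bridge listener; a 127.0.0.1-only bind on 1337 means the MLX server is up but invisible to the VM.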
Check from the VM:
ssh openclaw "curl -fsS http://10.10.10.1:1337/v1/models | jq '.data | length'"ssh openclaw "curl -fsS http://10.10.10.1:1234/v1/models | jq '.data | length'"Check on the Mac:
```bash
curl -fsS http://127.0.0.1:1337/v1/models
curl -fsS http://127.0.0.1:1234/v1/models
```

Fix:
```bash
# Re-run the VM startup path to restore the bridge surfaces
bash ~/.openclaw/scripts/vm-autostart.sh
```

If 127.0.0.1:1337 is down too, the MLX server itself is the problem, not the bridge. Fix the model server first, then re-run vm-autostart so the VM-side path matches reality again.
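Once vm-autostart has run, the listeners can take a few seconds to come back. A short poll loop from the VM saves staring at curl errors (a sketch; ports and endpoint as above):

```bash
# Poll each model surface from the VM until it answers or we give up.
for port in 1337 1234; do
  for attempt in $(seq 1 10); do
    if curl -fsS -m 2 "http://10.10.10.1:${port}/v1/models" >/dev/null; then
      echo "port ${port}: up"
      break
    fi
    [ "$attempt" -eq 10 ] && echo "port ${port}: still down after ~20s"
    sleep 2
  done
done
```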
MLX Server Returning 503 (sanctum-idle Port Conflict)
Symptom: Local model requests fail with 502 All models failed. Last error: 503 http://10.10.10.1:1337/v1/chat/completions. The MLX server process is running but returning empty 503 responses.
Root cause: Two LaunchAgents competing for the same model server:
- com.sanctum.council-mlx starts the MLX server directly on 0.0.0.0:1337
- com.sanctum.idle-mlx runs sanctum-idle, which listens on 10.10.10.1:1337 and expects to manage the MLX server on 127.0.0.1:8900
When both are active, council-mlx binds MLX directly to port 1337. The sanctum-idle proxy still accepts connections on 10.10.10.1:1337 but its backend on port 8900 is empty — nothing is listening there. Every request gets a 503.
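You can also ask launchd directly which of the two agents is loaded in the current session (labels as above; a "Could not find service" error means that agent isn't loaded):

```bash
# Print the first few lines of each agent's launchd record.
launchctl print gui/$(id -u)/com.sanctum.council-mlx | head -5
launchctl print gui/$(id -u)/com.sanctum.idle-mlx | head -5
```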
Diagnosis:
```bash
# Check for the conflict — two processes on port 1337 is the giveaway
lsof -i :1337
# If you see BOTH a Python/MLX process AND a sanctum-idle process, that's the bug

# Confirm nothing on 8900 (where idle expects the backend)
lsof -i :8900
# Empty = confirmed conflict
```

Fix:
```bash
# 1. Unload the conflicting agent (stops the process AND removes from launchd)
launchctl bootout gui/$(id -u)/com.sanctum.council-mlx

# 2. Permanently disable it so it never loads again at boot
launchctl disable gui/$(id -u)/com.sanctum.council-mlx

# 3. Kill any orphaned MLX process still on port 1337
kill $(pgrep -f 'mlx_lm.server.*--port 1337') 2>/dev/null

# 4. Restart idle-mlx so it manages the lifecycle properly
launchctl kickstart -k gui/$(id -u)/com.sanctum.idle-mlx
```

Verify:
```bash
# Should show only sanctum-idle on 1337
lsof -i :1337

# Send a test request — idle will wake the model (may take ~30s first time)
curl -s http://10.10.10.1:1337/v1/models
```

Sanctum Proxy Missing API Keys After Manual Restart
Symptom: Claude Code requests fall through to deepseek-v3 or other fallback models instead of reaching Anthropic. The proxy is running but all Anthropic requests fail silently.
Root cause: The proxy binary was started directly (./target/release/sanctum-proxy) instead of through the LaunchAgent. The launcher script (~/.sanctum/scripts/proxy-launcher.sh) injects API keys from macOS Keychain. Without it, ANTHROPIC_API_KEY, OPENROUTER_API_KEY, and GEMINI_API_KEY are all empty.
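For context, the injection pattern looks roughly like this (a sketch of the idea, not the actual launcher script; the Keychain item names are invented):

```bash
#!/usr/bin/env bash
# Sketch: pull keys out of the macOS Keychain, export them,
# then exec the proxy so it inherits the environment.
# The -s (service) names here are hypothetical.
export ANTHROPIC_API_KEY="$(security find-generic-password -s anthropic-api -w)"
export OPENROUTER_API_KEY="$(security find-generic-password -s openrouter-api -w)"
export GEMINI_API_KEY="$(security find-generic-password -s gemini-api -w)"
exec ./target/release/sanctum-proxy
```

Start the binary bare and all three exports above are empty, which is exactly the failure mode described here.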
Diagnosis:
```bash
# Check the launcher log — look for "anthropic=yes"
tail -5 ~/.openclaw/logs/sanctum-proxy-launcher.log
# If the last entry doesn't show key loading, the proxy was started manually
```

Fix:
```bash
# Always restart through the LaunchAgent, never the binary directly
launchctl kickstart -k gui/$(id -u)/com.sanctum.proxy
```

Fallback Chain Dead End (council-heartbeat)
Symptom: Heartbeat or briefing requests fail with 502 even though remote providers are healthy.
Root cause: The model’s fallback chain only contains other local models on the same server. If that server is down, every fallback also fails.
Example: council-heartbeat originally had only council-mlx as a fallback — both pointing at http://10.10.10.1:1337. When the MLX server was down, there was no escape route to a remote provider.
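Before applying the fix, it's worth auditing the other chains for the same shape. A sketch, assuming yq (v4) is installed and that chains live under fallbacks: in config.yaml as in the fix below; the name-based test for "local" is a crude heuristic, not a real topology check:

```bash
# Warn about any model whose entire fallback chain looks local-only.
for model in $(yq '.fallbacks | keys | .[]' config.yaml); do
  chain=$(yq ".fallbacks.\"${model}\"[]" config.yaml)
  # Crude heuristic: treat council-* / *-mlx names as local entries.
  if ! grep -qvE 'council|mlx' <<<"$chain"; then
    echo "WARN: ${model} has no remote escape route"
  fi
done
```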
Fix: Ensure every local model has at least one remote fallback in config.yaml:
```yaml
fallbacks:
  council-heartbeat:
    - council-mlx     # same local server (fast path)
    - nemotron-free   # remote escape route (OpenRouter)
```

Holocron App Starts Black
Symptom: /Applications/The Holocron.app launches, the window appears, and then all you get is a black rectangle contemplating its choices.
The browser dashboard can still be healthy while the packaged Electron shell is busy sabotaging itself. They are related, not identical.
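A quick way to prove the split is to hit the dashboard over HTTP while the app window is black. This assumes the dashboard is the Vite surface on 127.0.0.1:3333, the port used in the dev commands below; adjust if yours serves elsewhere:

```bash
# 200 here plus a black app window = the Electron shell is the problem.
curl -fsS -o /dev/null -w '%{http_code}\n' http://127.0.0.1:3333/
```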
Check:
```bash
ps -ef | rg '/Applications/The Holocron.app/Contents/MacOS/The Holocron'
tail -50 ~/.openclaw/logs/living-force.log
tail -50 ~/Library/Application\ Support/the-holocron/logs/main.log 2>/dev/null
```

Common root cause: run_sanctum.sh used an overly broad process cleanup rule:
pkill -f "the-holocron"That matched Electron helper processes via the user-data path and killed the renderer just after launch. Technically precise. Spiritually deranged.
Fix:
```bash
# Only target Holocron dev/Vite processes, never the packaged app
pkill -f '/Users/neo/Projects/the-holocron/.*vite' || true
pkill -f 'vite --host 127.0.0.1 --port 3333' || true

# Reinstall the current tested app bundle
cd /Users/neo/Projects/the-holocron
npm run update:app
```
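After the reinstall, relaunch and watch the shell log to confirm the renderer stays alive this time (same log path as in the check above):

```bash
open "/Applications/The Holocron.app"
tail -f ~/Library/Application\ Support/the-holocron/logs/main.log
```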