Routing
Every request goes through the router. For an explicit model the
router’s job is small (pick a healthy upstream that serves it); for
lowrouter/auto the router picks the model too. This page describes
both.
The router’s inputs
For each request:
- The model string in the request (
lowrouter/auto, an explicit ID, or one of the auto-* pseudo-models). - The
routeobject, if present (provider,region,prefer_low_carbon,fallback). - The key’s policy: any of the per-key fields from API key management.
- The current health of every upstream (
ok,degraded,unavailable). - The current grid carbon intensity for each (provider, region) pair.
Decision order
The router applies constraints from the most specific to the least:
- Per-request
route. A pinnedproviderorregionremoves anything that doesn’t match. - Per-key policy. A key’s
regionpin ormodelsallowlist is then applied. - Account policy. Defaults set in auto-routing settings — for instance, “prefer EU regions when possible.”
- Auto-router scoring. Whatever survives the above is scored on:
- Capability match (does the model support the request shape — vision input, tool use, structured output?).
- Provider health.
- Latency (median over the last 5 minutes per upstream).
- Carbon (grams per 1K tokens for that provider × region pair, with
the bias controlled by
prefer_low_carbon). - Price (matters when the request used
auto-cheap).
- Tie-break. When two candidates score within 1% of each other,
the more recently used one wins (sticky routing within a session
when
useris supplied; otherwise random).
What happens on failure
If the chosen upstream returns a 5xx or times out:
- The router marks that upstream’s slot temporarily unavailable (decaying over a few minutes).
- It tries the next eligible candidate in the same region (region is never violated silently).
- If none, it tries other regions only if
route.fallback != falseand the per-key/account policy allows it. - If still none, it returns
503 service_unavailablewith a code describing what’s missing.
The full chain of attempts is recorded in the generation’s
routing_trace, visible on the dashboard’s transaction-detail page.
prefer_low_carbon
The auto-router’s carbon score is a weighted term in its overall
score. Setting prefer_low_carbon: true on a request increases that
weight, which pushes traffic toward providers serving from
lower-grid-intensity regions when capability and latency are
comparable.
It does not override pinned regions or providers. It does not guarantee the lowest-carbon option in absolute terms — only that, all else equal, lower carbon wins.
Worked example
A request for lowrouter/auto with a vision input:
- Drop models that don’t support vision.
- Drop providers in
degradedorunavailablestate. - Among the rest, score on (capability fit, latency, carbon).
- The top-scored option wins.
If the top-scored option later returns 502 mid-request:
- Mark its
(provider, region)slot unavailable. - Re-score the surviving candidates.
- Retry with the new top option (capped at two retries per request).
- If retries are exhausted, return 502 to the caller.
Pinning recipes
| Goal | Recipe |
|---|---|
| EU residency | Set route.region: eu-west per request, or pin the key’s region. |
| Specific provider | route.provider: anthropic. Combine with route.region for region too. |
| Hard pin (no failover) | route.provider, route.region, route.fallback: false. |
| Lower carbon | route.prefer_low_carbon: true. Combine with no region pin to let the router pick the cleanest available region. |
| Cheapest acceptable | Use lowrouter/auto-cheap. |
What the router does not do
- It does not benchmark output quality. The auto router optimises for capability, latency, carbon, and price — not “is the answer good”.
- It does not silently swap models mid-conversation. If you’ve been
routed to model A on the first turn, the auto router prefers
sticking with model A on the second when you supply a stable
user.