# LowRouter — Full Documentation

This file concatenates every page of LowRouter's public documentation. Source: /docs.

---

# Philosophy



LowRouter exists because running an application on top of large language
models forces choices that are usually invisible: which provider serves
the request, where their hardware sits, what the request actually costs
in energy and carbon, and what happens when one provider goes down.

These four pages set out the position we take on those choices. They
are not marketing copy. They describe what we measure, what we choose
not to measure, and why those decisions sometimes lead us to slower,
narrower, or more expensive defaults than the rest of the market.

- [Why LowRouter exists](why-lowrouter)
- [Sustainability-first](sustainability-first)
- [Sovereignty and transparency](sovereignty-and-transparency)
- [Principles in practice](principles-in-practice)


---

# Why LowRouter exists



LLM inference is now a default building block. Most teams that ship
features on top of it end up writing the same gateway twice: once to
abstract the provider, again to track usage and bills. That gateway is
load-bearing — it sees every prompt and every response — but it is
rarely treated as a product. It is glue.

LowRouter is that gateway as a product, with two opinions baked in.

## Opinion one: the footprint of a request is part of its cost

Most billing dashboards show tokens and dollars. LowRouter also reports
the energy a request consumed and the grams of CO₂e the inference is
estimated to have produced. Both numbers are estimates (see
[methodology](../sustainable-ai/methodology) for the formula and its
limits), but making them visible changes how developers think about a
request. A request with a known carbon number is one a developer can
choose to send differently.

We do not claim that every request is "green" or that the estimate is
exact. We claim it exists, that the formula is documented, and that the
inputs are auditable.

## Opinion two: routing should be explicit and sovereign

When you pick a model in most gateways, you pick a *brand*. The brand
hides who actually serves the tokens — which provider, which region,
which hardware tier. That hiding is convenient until something
matters: a region requires data residency, a provider has an outage, a
contract requires a specific operator.

LowRouter exposes the route. Every response says which provider served
the request and from which region, and the dashboard lets operators
choose policies (prefer-region, prefer-low-carbon, prefer-cheapest,
fixed-provider) that map to those constraints. The default is
`lowrouter/auto`; the override is always one field away.

## What LowRouter is not

It is not an inference engine. The actual work happens at OpenAI,
Anthropic, Mistral, and other providers. We forward, we measure, we
account.

It is not a benchmarking tool. The dashboard does not rank models on
quality. We expose what we can measure faithfully — usage, latency,
energy, carbon — and leave subjective judgements to you.

It is not a free service. The credits model is documented in
[credits and billing](../guides/credits-and-billing). When the costs of
running this kind of infrastructure are made invisible, the
sustainability story becomes hollow; we'd rather charge what running it
actually costs.

## Who it's for

- **Developers** who want one endpoint and one bill across multiple
  providers, plus enough metadata to debug and improve their app.
- **Operators** who need data residency, audit trails, and a clear
  picture of which provider served what.
- **Sustainability and compliance teams** who want a defensible number
  for the AI footprint of their organisation, not a marketing pledge.

If that is not you, that is fine. The dashboard and these docs are
public for a reason — read what we measure and how, and decide whether
the trade-offs fit.


---

# Sustainability-first



"Sustainable AI" is doing a lot of marketing work right now. This page
describes what the phrase means inside LowRouter — concretely, what
we measure, what we report, and what we have decided not to claim.

## What we measure

For every inference request, we estimate two numbers:

- **Energy per output token** (Wh), derived from the model's active
  parameter count using the
  [EcoLogits methodology](https://ecologits.ai/0.4/methodology/llm_inference/).
- **Grid carbon intensity** (gCO₂e/kWh) for the region the provider
  serves the request from, sourced from the International Energy
  Agency.

Multiplying the two, after unit conversion, gives **gCO₂e per 1,000
tokens** for the request. The exact formula and the confidence we attach
to each estimate are in the
[methodology page](../sustainable-ai/methodology).
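
As a quick check on the units, the arithmetic can be sketched in Python. This only illustrates how Wh and gCO₂e/kWh combine; the function name and the sample inputs are hypothetical, and the authoritative formula is the one on the methodology page.

```python
def carbon_per_1k_tokens_g(energy_wh_per_token: float,
                           grid_g_per_kwh: float) -> float:
    """Return an estimated gCO2e figure per 1,000 tokens.

    Wh/token divided by 1000 gives kWh/token; multiplying by the grid
    intensity (gCO2e/kWh) gives gCO2e/token; scaling by 1,000 tokens
    gives the per-1K figure.
    """
    g_per_token = energy_wh_per_token / 1000.0 * grid_g_per_kwh
    return g_per_token * 1000.0


# Hypothetical inputs: 0.0001 Wh per token on a 300 gCO2e/kWh grid.
print(carbon_per_1k_tokens_g(0.0001, 300.0))
```

Note that the Wh-to-kWh division and the per-1,000 scaling cancel, so the per-1K figure is numerically the energy per token times the grid intensity.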

These numbers are exposed:

- On every API response, in an `eco` block.
- On the dashboard, aggregated by day, model, provider, and region.
- On the public model browser, as a comparable estimate per model.

## What we don't measure (yet)

- **Training emissions.** We report inference only. Training is a
  separate, larger, and harder-to-attribute footprint, and folding it
  into per-request numbers is misleading.
- **Hardware embodied carbon.** GPU manufacturing has a real footprint;
  we don't yet have a defensible per-token number for it.
- **Real-time grid mix.** We use annual averages by region. Live
  carbon-aware routing is a future feature, not a current one.
- **Embedding workloads.** Encoder-only models have a different compute
  profile and our formula does not yet model them well.

We list these limits because they are part of what an honest number
looks like. The sustainable-AI page repeats them next to every chart.

## Constraints that fall out of this position

A few platform decisions follow from taking energy and carbon
seriously:

- **The site itself is on a tight transfer budget.** Pages like this
  one ship under 100 KB total. The docs site renders server-side with
  no client framework. Heavy interactive widgets are not added without
  a reason.
- **The default route is auto-mode**, not "the largest model." If a
  smaller model can do the job for a fraction of the energy, we'd
  rather it be the default.
- **Providers and regions with very high grid intensity are
  deprioritised** in routing unless explicitly pinned by the caller.
  See [provider routing](../models/routing).
- **We are slower to add features than we'd like.** Every new
  background job, every dashboard widget, every fancy
  visualisation is a per-user energy cost; we add them when they earn
  the cost.

## What this is not

A pledge that any single request is green. A claim that the estimate is
exact. An assertion that running an LLM through LowRouter is
meaningfully better for the planet than running it directly.

What it is: a measured number, an open formula, and a default that
prefers the smaller model and the cleaner grid when other constraints
allow.


---

# Sovereignty and transparency



Two of the practical reasons teams move from a single LLM provider to a
gateway are sovereignty (where the request goes) and transparency
(what's actually happening). Both deserve concrete answers, not
adjectives.

## Sovereignty

LowRouter's control plane is operated from the European Union by
Carbonifer SAS. The data plane — the providers that actually run the
inference — covers multiple regions: EU (Mistral, Anthropic via EU
endpoints, Bedrock EU), US (OpenAI, Anthropic via US endpoints,
Bedrock US), and a growing set of regional Vertex AI deployments.

When you send a request without a region preference, the router picks
based on availability and the carbon-intensity heuristic described in
[provider routing](../models/routing). The chosen region is reported in
the response.

When you need data residency:

- Pin a region in the request body (`route.region`) or via a virtual
  key policy (recommended for production).
- The route fails closed: if no provider in the requested region is
  available, the request returns an error rather than silently falling
  back to another region.
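
A minimal sketch of a region-pinned request body, assuming `route` is a JSON object with a `region` key (as the `route.region` path suggests). The helper name is hypothetical; check [provider routing](../models/routing) for the exact schema.

```python
import json


def pinned_request(prompt: str, region: str,
                   model: str = "lowrouter/auto") -> dict:
    """Build a chat-completions body pinned to a single region.

    Assumption: `route.region` means {"route": {"region": ...}}.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "route": {"region": region},
    }


# Serialise the body exactly as it would be sent over HTTP.
print(json.dumps(pinned_request("Hello", "eu-west"), indent=2))
```

Because the route fails closed, a caller with a residency requirement should treat an error on a pinned request as final rather than retrying without the pin.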

We do not run inference on user prompts inside our own infrastructure
beyond what is required to forward and account for them. Logs are
documented in [usage accounting](../guides/usage-accounting).

## Transparency

Every API response includes a `provider` field with the upstream that
served it, a `region` field with the region it served from, and a
`generation_id` that can be looked up later through
[`GET /generation/{id}`](../api-reference/overview). The dashboard shows
the same data per request, per day, and aggregated per model.

The carbon and energy numbers shown on the dashboard are produced by
the formula in [methodology](../sustainable-ai/methodology). The
formula's coefficients, the model parameter counts we use, and the
grid-intensity values we apply are documented and dated. When a value
changes — for instance, when a new IEA dataset replaces an older one —
the change is reflected in the dashboard with the date it took effect.

The platform is operated by a small team. There is no army of
unaccounted-for logging or analytics services. The third-party services
involved are listed in the
[privacy policy](/privacy).

## How to verify

- Read [methodology](../sustainable-ai/methodology) and audit the
  formula.
- Send a request with `lowrouter/auto`, then re-send it pinned to a
  specific provider/region. Compare `provider`, `region`, and `eco`
  fields in the responses.
- Open the corresponding entries on the dashboard and confirm the
  numbers match.
- Use [`GET /generation/{id}`](../api-reference/overview) to fetch the
  full record after the fact. The numbers are stable.

If something doesn't reconcile, that's a bug, not a feature — please
[file an issue](https://github.com/carbonifer/lowrouter/issues).
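
The comparison step can be scripted. The sketch below assumes both responses have already been parsed into dictionaries with a `lowrouter` block shaped like the example response in the getting-started pages; `route_diff` is a hypothetical helper, not part of any SDK.

```python
def route_diff(auto_resp: dict, pinned_resp: dict) -> dict:
    """Return the routing fields that differ between two responses.

    Each value is a (auto, pinned) pair; fields that match are omitted.
    """
    fields = ("provider", "region", "eco")
    a = auto_resp.get("lowrouter", {})
    b = pinned_resp.get("lowrouter", {})
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}
```

An empty result means the auto route and the pinned route resolved to the same provider, region, and eco estimate.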


---

# Principles in practice



Principles are easy to declare and harder to live with. This page is the
short version of how the previous three translate into product
behaviour you can observe.

## Routing

| Decision | What we do |
|----------|------------|
| Default route | `lowrouter/auto` — picks based on availability, latency, and the carbon heuristic. |
| Pinning | Any caller can pin model, provider, and region per request, or per virtual key. |
| Failover | On a provider outage, the request is routed to the next eligible option in the same region. We never fail over across regions silently. |
| Carbon weight | Routing is biased toward lower-carbon regions when other constraints allow. The bias is configurable and the weight is documented in [routing](../models/routing). |
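
To make the failover and pinning rows concrete, here is a toy selection function. It is not LowRouter's routing algorithm (which also weighs latency and a configurable carbon bias); the `Candidate` type, its fields, and the intensity values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    provider: str
    region: str
    available: bool
    grid_g_per_kwh: float  # hypothetical grid intensity for the region


def pick_route(candidates: list[Candidate],
               pinned_region: Optional[str] = None) -> Optional[Candidate]:
    """Pick the lowest-carbon available candidate.

    With a pinned region, only that region is eligible. If nothing
    there is available the result is None (fail closed); the function
    never substitutes another region.
    """
    eligible = [c for c in candidates if c.available]
    if pinned_region is not None:
        eligible = [c for c in eligible if c.region == pinned_region]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.grid_g_per_kwh)
```

Returning `None` instead of a fallback candidate is the point of the sketch: the caller has to handle the failure explicitly.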

## Pricing

| Decision | What we do |
|----------|------------|
| Pricing model | Pre-paid credits. The price per 1K tokens for each model is shown on the dashboard before you call it. |
| Mark-up | A flat platform fee on top of the upstream provider's price. Documented per model. |
| Free tier | None. Trying things out costs the same as production usage. |
| Refunds | A failed request that produced no upstream charge does not consume credits. |

The full pricing rules are in [credits and billing](../guides/credits-and-billing).

## Data handling

| Decision | What we do |
|----------|------------|
| Prompt logging | We log token counts, model, provider, region, and timing. We do not log prompt or response content. |
| Retention | Token-level usage is retained for 13 months for billing and auditing. Aggregates are kept longer. |
| Export | Usage history can be exported as CSV from the dashboard. |
| Subprocessors | Listed in the [privacy policy](/privacy). |

If you need a Data Processing Agreement, contact us through the channel
listed on the [legal page](/impressum).

## Operations

| Decision | What we do |
|----------|------------|
| Status page | Linked from the dashboard footer. |
| Incident communication | Public post-mortems for incidents that affected billing or routing decisions. |
| API stability | Breaking changes are versioned (`/api/v1`). Additive changes are documented in the changelog. |

## What "no" looks like

- We do not auto-upsell to larger models. The defaults aim at the
  smallest model that produces an acceptable response.
- We do not show comparative quality charts between providers. We are
  not the right venue to make those judgements; tools that benchmark
  outputs against reference suites are.
- We do not stockpile features for the sake of feature parity. New
  endpoints and dashboards are added when there is a defensible reason.


---

# Getting started



Four short pages that take you from a clean machine to a working
request and a populated dashboard:

1. [Create an account](account) — sign-up, email verification, and the
   identity model.
2. [Create your first API key](api-keys) — virtual keys, scoping, and
   safe storage.
3. [Run your first completion](first-completion) — a single `curl`
   call, what comes back, and the eco metadata.
4. [Tour the dashboard](dashboard-tour) — credit balance, usage, top
   models, and the eco impact widget.

If something doesn't work, the [FAQ](../faq) covers the questions we
hear most. For everything else, the support email is on the
[legal page](/impressum).


---

# Create an account



LowRouter accounts are individual: one email, one identity, one credit
balance. Team accounts and shared workspaces are on the roadmap; until
they ship, share access via separate API keys rather than shared
credentials.

## Sign up

Go to [/register](/register) and fill in:

- **Email** — used for verification, sign-in, and billing receipts.
- **Password** — a strong, unique one. We do not enforce a maximum
  length; we do enforce a minimum that follows current OWASP guidance.
- **Country** — used to compute the right tax treatment on invoices.
  You can correct it later from the settings page.

Alternatively, sign in with a federated identity provider listed on the
register page. Federated sign-in does not change anything about how
your data is stored — see [the privacy policy](/privacy) for the full
picture.

## Verify your email

After registration we send a verification link. The link is valid for
24 hours. Until you verify, you can sign in but you cannot create API
keys or top up credits — guarding against typos and disposable-mailbox
sign-ups.

If the email doesn't arrive, check spam, then use **Resend
verification** on the sign-in page. If it still doesn't arrive, contact
support — see the email on the [legal page](/impressum).

## Top up credits

LowRouter is pre-paid. To send a request you need a non-zero credit
balance.

1. Go to **Dashboard → Credits**.
2. Click **Add credits**.
3. Pick an amount and complete the Stripe-hosted checkout.

Credits land in your account when Stripe confirms the payment, usually
within a few seconds. The amount you top up is exclusive of VAT; the
invoice that lands in your inbox afterwards has the VAT breakdown.

The full pricing model is documented in [credits and
billing](../guides/credits-and-billing).

## What's stored

After sign-up the platform stores:

- Your email address (sign-in, billing receipts).
- A salted hash of your password (never the plaintext).
- The country and any billing details you provided.
- A unique numeric user ID used internally.

We do not store any prompt or response content. Token counts, model
IDs, provider IDs, regions, latencies, and the eco numbers are stored
per request — see [usage accounting](../guides/usage-accounting) for
the full schema.

## Next

[Create your first API key →](api-keys)


---

# Create your first API key



API keys (also called *virtual keys*) authenticate every request to the
gateway. They are bearer tokens — anyone holding the string can spend
your credits — so the rest of this page is about creating, scoping, and
rotating them safely.

## Create one

1. **Dashboard → Keys**.
2. Click **New key**.
3. Give it a name that describes where it will be used (`prod-server`,
   `local-dev`, `chatbox-personal`). The name appears in the usage
   history and helps you find the right key to rotate later.
4. (Optional) Set scoping:
   - **Models** — restrict to a list of model IDs (`openai/gpt-4o`,
     `anthropic/claude-sonnet-4-5`).
   - **Region** — pin requests through this key to a region.
   - **Daily limit** — cap spend per day to a credit amount.
5. Click **Create**. The full token shows once — copy it now.

Tokens look like `lr-sk-...` and are 40+ characters. After creation, the
dashboard shows only the prefix and the last four characters.

## Store it

- **Production** — in your secret manager (Vault, AWS Secrets Manager,
  GCP Secret Manager, sealed Kubernetes secret, …). Never in source
  control.
- **Local development** — in a `.env` file that is in `.gitignore`.
- **Personal tools** — in the OS keychain, or in the tool's own
  encrypted store. Avoid pasting the token into chat applications or
  notes apps that sync to the cloud.

A leaked key can be revoked from the dashboard at any time — see
*Rotate or revoke* below — but it can spend credits in the seconds
between the leak and the revocation. Treat keys like passwords.

## Use it

The header is the standard `Authorization: Bearer`:

```bash
curl https://lowrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $LOWROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lowrouter/auto",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

The `Authorization` header value is exactly `Bearer ` followed by the
token, with no quotes and no extra whitespace. SDKs accept the token
as the constructor's `apiKey`/`api_key` argument; see
[integrations](../integrations/openai-sdk).
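
For callers not using an SDK, the same request can be sketched with Python's standard library. The endpoint and header come from the `curl` example above; `build_request` is a hypothetical helper name, and the sketch builds the request without sending it.

```python
import json
import urllib.request

API_URL = "https://lowrouter.ai/api/v1/chat/completions"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request."""
    body = json.dumps({
        "model": "lowrouter/auto",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            # strip() guards against a trailing newline from a .env file
            "Authorization": f"Bearer {api_key.strip()}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to send it, reading the key from `os.environ["LOWROUTER_API_KEY"]` rather than hard-coding it.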

## Rotate or revoke

- **Rotate** — create a second key, deploy it everywhere, then delete
  the old one. There is no built-in zero-downtime rotation; this
  sequence gives you the same result without one.
- **Revoke** — **Dashboard → Keys → Delete**. The token stops working
  on the next request, no caching delay.

Rotate at least every 90 days, and immediately after any of:

- A key was committed to a repository (even briefly).
- A key was sent over an insecure channel.
- A team member with access to the key left the organisation.
- Unexpected usage shows up on the dashboard.

## Next

[Run your first completion →](first-completion)


---

# Run your first completion



This page sends one chat completion through the gateway, walks through
what came back, and points at the things you'll come back to.

## Send the request

Set your key in an environment variable so it isn't repeated in every
command you run:

```bash
export LOWROUTER_API_KEY="lr-sk-..."
```

Then call the gateway:

```bash
curl https://lowrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $LOWROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lowrouter/auto",
    "messages": [
      {"role": "user", "content": "In one sentence, what is a vector database?"}
    ]
  }'
```

`lowrouter/auto` is the default route. You can pick a specific model
instead — for example `openai/gpt-4o-mini` or
`anthropic/claude-haiku-4-5` — and you can override the route per
request with the `route` field. See [routing](../models/routing).

## What comes back

The response is OpenAI-shaped. The fields you'll use most are:

```json
{
  "id": "chatcmpl-01J9...",
  "object": "chat.completion",
  "created": 1714150000,
  "model": "openai/gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A vector database stores …"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 26,
    "total_tokens": 44
  },
  "lowrouter": {
    "generation_id": "gen_01J9...",
    "provider": "openai",
    "region": "eu-west",
    "eco": {
      "energy_wh": 0.0021,
      "carbon_g": 0.00057,
      "carbon_per_1k_tokens_g": 0.013,
      "accuracy": "accurate"
    }
  }
}
```

The OpenAI-compatible parts (`id`, `choices`, `usage`, …) are
documented in [chat completions](../api-reference/chat-completions).
The LowRouter-specific block:

- **`generation_id`** — opaque ID for this request. Pass it to
  [`GET /generation/{id}`](../api-reference/overview) to fetch the full
  record later, or open it on the dashboard with that ID.
- **`provider`** — which upstream actually served the request.
- **`region`** — the region the upstream served from.
- **`eco`** — energy and carbon estimate for this request, with a
  confidence label (the `accuracy` field). Read
  [methodology](../sustainable-ai/methodology) before quoting these
  numbers anywhere.

## Stream the response

For interactive UIs, set `"stream": true` and read Server-Sent Events:

```bash
curl https://lowrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $LOWROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "lowrouter/auto",
    "stream": true,
    "messages": [{"role": "user", "content": "Count to 5 slowly"}]
  }'
```

The stream format and how to consume it from common SDKs are covered on
the [streaming page](../api-reference/streaming).
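
As a rough sketch of consuming the stream by hand, the parser below assumes OpenAI-style events: one `data: <json>` line per chunk, text under `choices[].delta.content`, and a final `data: [DONE]`. That shape is an assumption based on the response being OpenAI-shaped; the streaming page is the reference for the actual format.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Feed it the decoded lines of the HTTP response body and join or print the fragments as they arrive.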

## Things that might surprise you

- **The `model` in the response is the resolved model**, not the
  pseudo-model you sent. If you sent `lowrouter/auto`, the response
  tells you what was actually picked.
- **`usage` reflects upstream tokens**, which may include caching
  discounts (for providers that support them). The credits charged on
  your dashboard match this `usage`.
- **The `eco` block is missing on requests we couldn't classify**, e.g.
  a model whose parameter count is unknown. The dashboard shows the
  same record without an eco number rather than a fabricated one.
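
Client code that reads the `lowrouter` block should therefore tolerate a missing `eco` entry. A minimal sketch, assuming the response has been parsed into a dictionary shaped like the example above (`summarize` is a hypothetical helper):

```python
def summarize(response: dict) -> dict:
    """Pull the routing and eco metadata out of a parsed response."""
    meta = response.get("lowrouter", {})
    eco = meta.get("eco")  # may be absent for unclassified models
    return {
        "resolved_model": response.get("model"),
        "generation_id": meta.get("generation_id"),
        "provider": meta.get("provider"),
        "region": meta.get("region"),
        "carbon_g": eco.get("carbon_g") if eco else None,
    }
```

A `carbon_g` of `None` mirrors the dashboard's behaviour: no number rather than a made-up one.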

## Next

[Tour the dashboard →](dashboard-tour)


---

# Tour the dashboard



The dashboard is at [/dashboard](/dashboard). It's the operator view
for your account: balance, usage, eco impact, and shortcuts to the
things you'll change most often.

## The landing page

When you first land, four widgets are visible above the fold:

- **Credit balance** — the current credit balance in your account
  currency, with a button to top up. The big number is what's left;
  the small number is what's been spent in the current calendar month.
- **Usage today** — total tokens and total cost so far today, broken
  out by model. Click any model to filter the rest of the page.
- **Top models (last 30 days)** — a horizontal bar chart ranking the
  models you used most by token count.
- **Eco impact** — the energy and carbon estimate for the same window,
  with a comparison to a baseline you choose (see *Eco panel* below).

Below the widgets, the **Recent transactions** table shows the last
~50 generations: timestamp, model, provider, region, tokens, cost, and
the carbon estimate. Click any row to see the full record on its own
page.

## Drill into a transaction

The transaction detail page shows the full request record:

- The generation ID (you can copy it).
- Resolved model, provider, region, latency.
- Token counts (prompt, completion, total).
- Eco numbers with the methodology version that produced them.
- Routing trace: which providers were considered, which one was
  picked, and why (the cheapest, the lowest-carbon, the closest, …).

Prompt and response *content* are not shown — we don't store it.

## Eco panel

The eco impact widget compares the last 30 days of your usage against a
**baseline**. The baseline is a hypothetical: "what would this same
usage have looked like if I'd used model X instead?" It's a
back-of-envelope check, not a guarantee. The number it shows is
honest about uncertainty:

- The value is computed from the same energy formula and grid
  intensities as every other carbon number on the platform.
- The "you saved" framing only appears when the baseline you picked
  has a higher per-token energy estimate than your actual usage. If
  your usage was higher-energy than your baseline, the panel says so.
- When confidence is low (small samples, models whose parameters are
  estimated rather than verified), the number is shown with a
  reduced-confidence indicator and the underlying caveats are linked
  inline.
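
The framing rule can be sketched as follows. This is an illustration only: the function name, the message wording, and the idea of comparing a single per-token figure are assumptions, not the dashboard's implementation.

```python
def baseline_message(actual_wh_per_token: float,
                     baseline_wh_per_token: float,
                     tokens: int) -> str:
    """Describe usage relative to a baseline model's energy estimate."""
    delta_wh = (baseline_wh_per_token - actual_wh_per_token) * tokens
    if delta_wh > 0:
        # Baseline is higher-energy than actual usage: savings framing.
        return f"saved {delta_wh:.3f} Wh vs baseline"
    if delta_wh < 0:
        # Actual usage is higher-energy: say so rather than hide it.
        return f"used {-delta_wh:.3f} Wh more than baseline"
    return "same as baseline"
```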

Pick or change the baseline from **Settings → Eco baseline**.

## Where everything else is

- **Keys** — create, scope, rotate, revoke. See
  [API key management](../guides/api-keys).
- **Credits** — top up, view receipts. See
  [credits and billing](../guides/credits-and-billing).
- **Invoices** — billing history; same data as the email receipts but
  downloadable as PDF.
- **Auto-routing** — set defaults for `lowrouter/auto`: prefer-region,
  prefer-low-carbon, fixed-provider, and per-key overrides. See
  [routing](../models/routing).
- **Settings** — profile, eco baseline, notification preferences,
  account deletion.

## Mobile

The dashboard is usable on a phone: the header collapses to a hamburger
menu and charts switch to a single-column layout. It's not yet built for
heavy operations on mobile (large CSV exports, multi-key bulk edits);
those flows are still desktop-first.

## Next

You're set up. The [User guides](../guides/api-keys) section goes
deeper on the day-to-day operations. The
[API reference](../api-reference/overview) covers everything you can
do over HTTP.


---

# Guides



The pages in this section cover the operations you'll come back to
after onboarding:

- [API key management](api-keys) — create, scope, rotate, revoke, and
  the policies that virtual keys can carry.
- [Credits and billing](credits-and-billing) — how credits work, how
  pricing is structured, and how invoices are produced.
- [Usage accounting](usage-accounting) — the transaction history,
  exports, and what's recorded per request.
- [Dashboard deep-dive](dashboard-deep-dive) — every chart on the
  dashboard, how to read it, and how to filter it.

These build on [Getting started](../getting-started/account); pages
there cover the first-time setup, pages here cover the ongoing
operation.


---

# API key management



[Getting started → API keys](../getting-started/api-keys) walks through
creating a key. This page is the operator reference: every option a
virtual key can carry, when to use it, and how to retire keys safely.

## Anatomy of a virtual key

A virtual key is a token with metadata attached. The token authenticates
the request; the metadata controls what that request is allowed to do.

The metadata you can attach:

| Field | Purpose |
|-------|---------|
| `name` | Human label, shown in usage history. |
| `models` | Allowlist of model IDs. Empty = all models. |
| `region` | Pin requests through this key to a region (e.g. `eu-west`). |
| `daily_credit_limit` | Max credits this key can spend per UTC day. |
| `monthly_credit_limit` | Max credits per UTC calendar month. |
| `expires_at` | Optional auto-expiry timestamp. |
| `prefer_low_carbon` | When set, biases auto-routing on this key toward lower-grid-intensity providers. |
| `enabled` | Toggle without deleting. |

All of these can be edited from **Dashboard → Keys → key name → Edit**.
Edits take effect on the next request, no caching delay.
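
For teams that mirror key policies in their own tooling, a subset of the table can be modelled client-side. The class below is a hypothetical sketch, not an API object; it only encodes the `enabled` toggle and the "empty allowlist means all models" rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VirtualKeyPolicy:
    name: str
    models: list[str] = field(default_factory=list)  # empty = all models
    region: Optional[str] = None
    daily_credit_limit: Optional[float] = None
    enabled: bool = True

    def allows(self, model: str) -> bool:
        """Client-side pre-check; the gateway remains the authority."""
        if not self.enabled:
            return False
        return not self.models or model in self.models
```

A pre-check like this can catch a misconfigured client before it sends a request the gateway would reject.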

## Scoping patterns

A few patterns we see often:

**One key per environment per service.** The most common shape:
`prod-api`, `staging-api`, `local-dev`. Each one is allowlisted to the
models that environment actually uses.

**One key per third-party integration.** If you give a token to a
desktop client (ChatBox, Cline, Claude Code, …), put it in its own
key with a daily limit. The blast radius of a leaked key is then
"at most the daily limit, each day until it is revoked" instead of
"everything."

**One key per researcher / experiment.** When the same project runs
multiple lines of experiments, separate keys make the dashboard's
top-models chart instantly readable per experiment.

**One key per region requirement.** When a particular workload must
stay in `eu-west`, set the region on the key rather than on every
request. The constraint travels with the key and a misconfigured client
can't accidentally send the request elsewhere.

## Limits and what happens when they're hit

When a request would push a key over its `daily_credit_limit` or
`monthly_credit_limit`, the gateway returns `429 Too Many Requests`
with an explanatory body:

```json
{
  "error": {
    "type": "rate_limited",
    "code": "key_daily_limit_exceeded",
    "message": "API key 'prod-api' has reached its daily credit limit (5.00).",
    "param": null
  }
}
```

The response includes a `Retry-After` header indicating when the limit
resets (midnight UTC for the daily limit). Raising a limit takes effect
immediately for subsequent requests.
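
A client can use these signals to back off instead of retrying blindly. The sketch below accepts `Retry-After` either as a number of seconds or as an HTTP date, since this page does not say which form the gateway sends; both helper names are hypothetical, and only the daily-limit error code shown above is matched.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def seconds_until_retry(retry_after: str,
                        now: Optional[datetime] = None) -> float:
    """Interpret Retry-After as delta-seconds or an HTTP-date."""
    value = retry_after.strip()
    if value.isdigit():
        return float(value)
    now = now or datetime.now(timezone.utc)
    reset = parsedate_to_datetime(value)
    return max(0.0, (reset - now).total_seconds())


def is_daily_limit_error(status: int, body: dict) -> bool:
    """True for the 429 key-limit response shown above."""
    code = body.get("error", {}).get("code")
    return status == 429 and code == "key_daily_limit_exceeded"
```

Since the daily limit only resets at midnight UTC, a long wait usually means alerting an operator rather than sleeping in-process.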

## Rotation

The rotation pattern that does not require zero-downtime support from
the gateway:

1. **Create** a second key with the same scope.
2. **Deploy** it everywhere the old key was used (config, secret
   manager, CI variables).
3. **Verify** the new key is in use by watching the usage history; the
   old key's request rate should drop to zero.
4. **Delete** the old key.

Aim to rotate at least every 90 days, and immediately after any of
the events listed in [Getting started → API keys](../getting-started/api-keys#rotate-or-revoke).
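
The 90-day guideline is easy to check in a scheduled job. This sketch assumes you track each key's creation time yourself; `rotation_due` is a hypothetical helper, not a dashboard feature.

```python
from datetime import datetime, timedelta, timezone

ROTATION_INTERVAL = timedelta(days=90)


def rotation_due(created_at: datetime, now: datetime) -> bool:
    """True once a key is at least 90 days old."""
    return now - created_at >= ROTATION_INTERVAL


# Example: a key created on 1 January is due by 1 April.
created = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(rotation_due(created, datetime(2025, 4, 1, tzinfo=timezone.utc)))
```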

## Revocation

A revoked key returns `401 Unauthorized` on the next request. There is
no warning, no grace period, no caching delay. Revocation cannot be
undone — if you revoke the wrong key, create a new one.

The dashboard preserves the key's usage history after revocation. The
token itself is discarded.

## Audit trail

Every key creation, edit, rotation, and revocation produces an entry
in **Settings → Audit log**. The log records: who acted, when, on
which key, and what changed. Export as CSV for retention in your own
audit pipeline.


---

# Credits and billing



LowRouter is pre-paid. You top up a credit balance, the gateway debits
that balance per request, and the balance is your single source of
truth for spend.

## Credits

A credit is a fractional unit of EUR: 1 credit = €0.01, so €5 of credits
adds 500 credits. Credits and balances are always denominated in EUR, and
checkout is in EUR.

At checkout, a flat payment-processing and platform fee is added to the credit
amount — the same for every payment method and region — and VAT is added on top
where applicable. The all-in price is shown before you pay; every euro of credit
is yours to spend in full once delivered.

Credits do not expire. Refunds for accidental top-ups are handled
case by case if you contact the support email on the
[legal page](/impressum) within 14 days of the top-up.

The current balance is shown on the dashboard, on the credits page,
and in the response of every request via the `X-Credit-Balance`
header.

## What a request costs

The cost of a request is:

```
upstream_provider_price_per_token × tokens
+ platform_fee_per_token × tokens
```

Both components are quoted per 1K tokens, separately for prompt and
completion. The prices visible on the model browser and the model
pages already include the platform fee. The breakdown is also visible
on each transaction's detail page.

A few details worth knowing:

- **Cached prompt tokens** (when an upstream provider supports prompt
  caching) are charged at the upstream's cached rate. The platform fee
  is unchanged.
- **Failed requests** that produced no upstream charge consume zero
  credits. A 4xx from the upstream that did consume tokens (rare) is
  passed through to your bill.
- **Streaming responses** are charged on the same usage numbers as a
  non-streaming response — total tokens, not per-chunk.
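
The cost formula above can be worked through in a few lines. The prices in the example are made up, the function name is hypothetical, and cached-token rates are left out; the result is in whatever unit the quoted prices use.

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 upstream_per_1k: tuple[float, float],
                 fee_per_1k: tuple[float, float]) -> float:
    """Cost of one request from per-1K-token prices.

    Each tuple is (prompt price, completion price) per 1K tokens.
    """
    prompt_rate = upstream_per_1k[0] + fee_per_1k[0]
    completion_rate = upstream_per_1k[1] + fee_per_1k[1]
    return (prompt_tokens * prompt_rate
            + completion_tokens * completion_rate) / 1000.0


# Hypothetical prices: upstream 1.0 / 2.0 and platform fee 0.5 / 0.5 per 1K.
print(request_cost(1000, 2000, (1.0, 2.0), (0.5, 0.5)))
```

For real billing arithmetic, `decimal.Decimal` is a safer choice than floats; floats are used here only to keep the sketch short.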

## Top up

**Dashboard → Credits → Add credits**, then complete the Stripe-hosted
checkout. Card and SEPA Direct Debit (EU accounts) are supported.

The amount you select is exclusive of VAT. The invoice produced after
payment shows the net amount, the VAT amount, and the gross total. VAT
treatment follows the country and (where applicable) VAT number on
your billing profile.

## Invoices

After every successful top-up, an invoice is generated and emailed.
The same invoices are downloadable as PDF from **Dashboard →
Invoices**.

If you operate on behalf of a company:

1. **Settings → Billing** — set the legal name, billing address, and
   VAT number.
2. Invoices issued from that point onwards carry the company details.
3. Past invoices can be re-issued with the corrected billing block on
   request via support.

## Pricing changes

Upstream provider prices change. We update the prices on the model
browser and in the routing engine within one business day of an
upstream price change going live. The dashboard records the
per-request price at the moment of the request, so historical bills
are stable even when current prices change.

The platform fee is published per model on the model browser. Material
changes to the platform fee are announced at least 30 days in advance
to the email on the account.

## What we don't bill for

- Failed authentication, rate-limited requests, or key-limit hits —
  zero credits.
- Health checks (`HEAD /docs`, `HEAD /api/v1/models`, etc.) — zero
  credits.
- Dashboard browsing, key management, or any control-plane action —
  zero credits.

## Refunds

Refunds for unspent credit balances are not processed automatically.
Contact support if you need to wind down an account; we'll process the
refund of the remaining balance to the original payment method,
subject to a 14-day cooling-off limit on the most recent top-up under
EU consumer law.

## Tax

LowRouter is operated by Carbonifer SAS, a French entity. VAT is
charged at the rate applicable to your billing country. EU B2B
customers outside France with a valid VAT number are subject to reverse
charge (no VAT on the invoice). Non-EU customers receive an invoice
without VAT.


---

# Usage accounting


# Usage accounting

Every request through the gateway produces a record. This page
documents the record schema and the ways to read it.

## Per-request record

The fields stored for each request:

| Field | Description |
|-------|-------------|
| `generation_id` | Opaque, globally unique. Returned in the response and used to look up the record later. |
| `created_at` | UTC timestamp when the gateway accepted the request. |
| `completed_at` | UTC timestamp when the response was fully sent (post-streaming for streamed requests). |
| `key_id` | The virtual key used. Names are joined in for display. |
| `requested_model` | The string the caller sent (e.g. `lowrouter/auto`). |
| `resolved_model` | The model the router actually picked. |
| `provider` | Upstream provider that served the request. |
| `region` | Region the upstream served from. |
| `prompt_tokens` | Token count of the input. |
| `completion_tokens` | Token count of the output. |
| `total_tokens` | Sum. |
| `latency_ms` | First-byte latency for streaming, end-to-end for non-streaming. |
| `cost_credits` | Credits debited for this request. |
| `eco.energy_wh` | Estimated energy for the inference, in watt-hours. |
| `eco.carbon_g` | Estimated CO₂e for the request, in grams. |
| `eco.carbon_per_1k_tokens_g` | Same number, normalised per 1K tokens. |
| `eco.accuracy` | `accurate` / `medium` / `gross` confidence band. |
| `eco.methodology_version` | Version of the formula and data inputs that produced these numbers. |
| `status` | `ok`, `client_error`, `upstream_error`. |
| `routing_trace` | Which providers were considered, which one was picked, why. |

Prompt and response *content* are not stored.

## Where to read it

- **Dashboard → Recent transactions** — the last ~50 requests, with
  filtering by date range, model, provider, region, key.
- **Transaction detail page** — full record, accessed by clicking a row
  or by visiting `/dashboard/transactions/{generation_id}`.
- **API**: [`GET /api/v1/generation/{id}`](../api-reference/overview)
  returns the same record as JSON. Useful for programmatic queries,
  reconciliation, or piping into your own data warehouse.
- **Export** — **Dashboard → Recent transactions → Export** produces a
  CSV of the filtered range, capped at 50,000 rows per export.
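
For programmatic access, a minimal standard-library sketch of the API
lookup might look like this. The endpoint path and bearer header come
from this documentation; the function names are hypothetical, and the
record is assumed to be returned as a single JSON object.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://lowrouter.ai/api/v1"


def build_generation_request(generation_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for one generation record."""
    return urllib.request.Request(
        f"{BASE_URL}/generation/{quote(generation_id, safe='')}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_generation(generation_id: str, api_key: str) -> dict:
    """Fetch and decode the record; raises urllib.error.HTTPError on a non-2xx reply."""
    request = build_generation_request(generation_id, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from the network call keeps the URL and
header logic easy to check without touching the gateway.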

## Aggregates

The dashboard pre-computes a small set of aggregates and updates them
on each request:

- Tokens per day, per model.
- Cost per day, per model.
- Energy and carbon per day, per model.

These aggregates power the charts. They are derived from the
per-request records and are reproducible from a CSV export.
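
Reproducing the daily aggregates from an export could look roughly like
the sketch below. It assumes the CSV uses the per-request field names as
column headers and ISO 8601 UTC timestamps in `created_at`; the actual
export layout may differ, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def daily_aggregates(csv_text: str) -> dict:
    """Group tokens and cost by (UTC day, model).

    Assumes columns created_at, resolved_model, total_tokens, cost_credits.
    """
    totals = defaultdict(lambda: {"tokens": 0, "cost": Decimal("0")})
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = row["created_at"][:10]  # "YYYY-MM-DD" prefix of the timestamp
        bucket = totals[(day, row["resolved_model"])]
        bucket["tokens"] += int(row["total_tokens"])
        bucket["cost"] += Decimal(row["cost_credits"])
    return dict(totals)


sample = """created_at,resolved_model,total_tokens,cost_credits
2025-01-05T09:00:00Z,openai/gpt-4o-mini,1200,0.30
2025-01-05T17:30:00Z,openai/gpt-4o-mini,800,0.20
2025-01-06T08:15:00Z,openai/gpt-4o-mini,500,0.12
"""
for key, value in sorted(daily_aggregates(sample).items()):
    print(key, value)
```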

## Retention

| Data | Retention |
|------|-----------|
| Per-request records | 13 months from `created_at`. |
| Daily aggregates | 36 months. |
| Audit log entries | 36 months. |
| Account profile | For the lifetime of the account. |

After retention expires the per-request rows are deleted and replaced
by anonymised aggregates. Aggregates are kept for sustainability
reporting and platform analytics; they cannot be used to reconstruct
individual requests.

You can request earlier deletion of all your usage records via the
support email; the deletion is irreversible and may affect your
ability to dispute past invoices.

## Reconciliation tips

- The sum of `cost_credits` over a day should equal the daily cost on
  the dashboard within a fraction of a credit (rounding).
- The sum of `total_tokens` over a day grouped by model is what the
  upstream provider's usage report (if you have one) should show.
- The carbon estimate is reproducible: given the same `resolved_model`,
  `region`, and `total_tokens`, recomputing with the formula in
  [methodology](../sustainable-ai/methodology) using the
  `methodology_version` should yield the same gram count.

If the numbers diverge more than rounding allows, that is a bug —
[open an issue](https://github.com/carbonifer/lowrouter/issues).
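
The first check in the list can be automated with a few lines. This is a
sketch only: the function name is hypothetical, and the default
`tolerance` of half a credit is an assumed reading of "a fraction of a
credit".

```python
from decimal import Decimal
from typing import Iterable


def reconciles(cost_credits: Iterable[str], dashboard_total: str,
               tolerance: str = "0.5") -> bool:
    """True when summed per-request costs match the dashboard daily figure.

    `tolerance` is a hypothetical rounding allowance, in credits.
    """
    total = sum((Decimal(c) for c in cost_credits), Decimal("0"))
    return abs(total - Decimal(dashboard_total)) <= Decimal(tolerance)


print(reconciles(["0.30", "0.20", "0.12"], "0.62"))
```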


---

# Dashboard deep-dive


# Dashboard deep-dive

[Dashboard tour](../getting-started/dashboard-tour) is the lap-around.
This page is the per-chart reference: what each one shows, what it's
derived from, and the gotchas.

## Filters

A date-range picker, a model picker, and a key picker sit at the top
of the dashboard. Setting any filter reloads every widget on the page
with the same filter applied. Filters are reflected in the URL so
links are shareable.

The default range is the last 30 days, ending today (UTC). Changing
the range:

- Updates the **Eco impact** comparison.
- Restricts the recent-transactions table.

It does **not** change your credit balance, which is always live, or
the **Top models** ranking, which always covers a fixed 30-day window.

## Credit balance widget

- **Big number**: current balance.
- **Small number**: spent in the current calendar month.
- The colour of the band underneath is a heuristic: green if your
  current burn-rate would last the full calendar month, amber if not.

The widget pulls live; clicking **Top up** opens the Stripe checkout.
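
One plausible reading of the colour heuristic is sketched below. The
exact rule the dashboard applies is not documented here, so the formula
(average daily spend so far this month, projected over the remaining
days) and the function name are assumptions.

```python
import calendar
from datetime import date


def balance_band(balance: float, spent_this_month: float, today: date) -> str:
    """Return "green" if the balance covers the rest of the month at the
    current average daily spend, otherwise "amber".

    Assumption: burn rate = month-to-date spend / days elapsed (inclusive).
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_burn = spent_this_month / today.day
    remaining_days = days_in_month - today.day
    return "green" if balance >= daily_burn * remaining_days else "amber"


print(balance_band(1000, 300, date(2025, 1, 10)))
```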

## Usage today widget

- **Top number**: total tokens today (UTC), all models.
- **Bottom number**: total credits spent today.
- **Bars**: per-model breakdown of today's tokens.

Hover any bar for the per-model token and credit total. Click a bar to
filter the rest of the dashboard by that model.

## Top models (last 30 days)

- Horizontal bar chart, ranked by total tokens descending.
- Each bar shows tokens; the value next to the bar shows credits.
- The 30-day window is fixed regardless of the page-level filter so
  the ranking is stable across page loads.

When fewer than five models have non-trivial usage, the chart shows
just the ones with data rather than padding with empty bars.

## Eco impact widget

The widget has three numbers and one comparison:

- **Energy** — total Wh estimated for the filtered range.
- **Carbon** — total gCO₂e estimated for the filtered range.
- **Per 1K tokens** — the normalised value, useful for comparing
  across ranges of different sizes.
- **Comparison** — the same usage replayed against your chosen
  baseline model. If your actual usage was lower-energy than the
  baseline, the widget reports the saving; if higher, it reports the
  gap. Pick or change the baseline from
  [Settings → Eco baseline](/dashboard/settings).

The widget refuses to display a comparison when the underlying
estimates' confidence is too low to be meaningful — the visible number
becomes "—" with a note explaining why.

## Recent transactions table

- Default sort: newest first.
- Sortable columns: timestamp, model, tokens, cost, latency.
- Filterable columns: model, provider, region, status.
- Clicking a row opens its detail page.

## Per-day usage chart

Below the recent-transactions table, a stacked column chart shows
tokens per day, stacked by model, for the filtered range. This is the
chart to use when explaining usage growth or detecting a spike.

The chart respects the page-level model filter — selecting a model up
top filters the chart to that one.

## Per-day cost chart

Same shape as the per-day usage chart but in credits. Use it for
budgeting and burn-rate analysis. The two charts share the same time
axis so they can be compared visually.

## Provider distribution

A donut showing the share of requests served by each upstream
provider in the filtered range. It's the fastest way to confirm a
routing-policy change actually took effect.

When a policy is supposed to keep traffic in a single provider but
the donut shows multiple wedges, check the policy and the per-request
`routing_trace` for the requests that escaped.
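
The same check can be run outside the dashboard on exported or
API-fetched records. The sketch below assumes each record is a mapping
with the `provider` and `generation_id` fields from the usage schema;
the function names are hypothetical.

```python
from collections import Counter
from typing import Mapping, Sequence


def provider_shares(records: Sequence[Mapping[str, str]]) -> dict:
    """Share of requests per provider, as fractions that sum to 1."""
    counts = Counter(r["provider"] for r in records)
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}


def escaped(records: Sequence[Mapping[str, str]], expected_provider: str) -> list:
    """generation_ids served by a provider other than the expected one."""
    return [r["generation_id"] for r in records if r["provider"] != expected_provider]


records = [
    {"generation_id": "a", "provider": "scaleway"},
    {"generation_id": "b", "provider": "scaleway"},
    {"generation_id": "c", "provider": "openai"},
    {"generation_id": "d", "provider": "scaleway"},
]
print(provider_shares(records), escaped(records, "scaleway"))
```

The IDs returned by `escaped` are the ones whose `routing_trace` is worth
inspecting first.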

## Mobile dashboard

On a phone the dashboard collapses to a single column, the filters move
into a slide-up panel, and the per-day charts become scrollable. The
recent-transactions table becomes a vertical card list. Heavy
operations (large CSV exports, multi-key edits) still want the desktop
view.


---

# Pricing and currency conversion


# Pricing and currency conversion

Most providers we route to publish their per-token prices in USD
rather than EUR. To keep accounting and balances simple, **every
balance and every charge is in EUR**, and we convert non-EUR provider
prices to EUR once a day before they reach the catalogue.

This page explains how that conversion works, why the displayed price
on a USD-billed provider isn't quite the same as the headline USD
figure, and what happens when our FX feed is unavailable.

## How the conversion is computed

For each non-EUR provider price we ingest, the stored EUR value is:

```
stored_eur_per_1m_tokens =
    source_per_1m_tokens
    × max(source_to_eur_rate, 1.0)
    × (1 + fx_buffer_percent / 100)
```

- **`source_per_1m_tokens`** — the price the provider publishes in
  their billing currency (USD for most providers).
- **`source_to_eur_rate`** — the daily rate that converts the
  provider's currency into EUR, derived from the reference rates
  published by the European Central Bank at the
  [eurofxref-daily.xml](https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml)
  endpoint. A **rate floor of `1.0`** is applied: when the EUR is
  stronger than the source currency, the floor pins the conversion at
  parity so that a strong EUR can't quietly erode our markup.
- **`fx_buffer_percent`** — a fixed conversion markup applied on top of
  the ECB rate, defaulting to **3 %**. The markup covers FX-spread
  drift between the day we fetched the rate and the day we settle
  with the provider, plus payment-rail conversion fees.

For example, a provider publishing `$0.15 per 1M tokens` at an ECB
reference rate of `1 EUR = 0.90 USD` (so `1 USD = 1.111 EUR`) and a
3 % markup becomes:

```
0.15 × 1.111 × 1.03 ≈ 0.1717 EUR per 1M tokens
```

When the EUR is instead stronger than the dollar — say `1 EUR =
1.085 USD`, so `1 USD = 0.92 EUR`, which is below the `1.0` floor —
the floor kicks in and the conversion uses `1.0`:

```
0.15 × 1.0 × 1.03 ≈ 0.1545 EUR per 1M tokens
```

That is what your balance is debited for every 1M tokens you spend on
that model.
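
The two worked examples can be reproduced with a direct transcription of
the formula. The function name is hypothetical; the floor of `1.0` and
the default 3 % buffer are taken from the description above.

```python
def stored_eur_per_1m(source_per_1m: float, source_to_eur_rate: float,
                      fx_buffer_percent: float = 3.0) -> float:
    """EUR price per 1M tokens: rate floored at 1.0, then the FX buffer applied."""
    effective_rate = max(source_to_eur_rate, 1.0)
    return source_per_1m * effective_rate * (1 + fx_buffer_percent / 100)


# The ECB quotes USD per EUR, so the USD-to-EUR rate is the reciprocal.
weak_eur = stored_eur_per_1m(0.15, 1 / 0.90)     # rate above the floor
strong_eur = stored_eur_per_1m(0.15, 1 / 1.085)  # rate below the floor, pinned to 1.0
print(f"{weak_eur:.4f} {strong_eur:.4f}")
```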

## Where you see the dual figures

Providers that already bill in EUR (e.g. `/providers/scaleway`)
render a single figure: their published EUR price, shown as-is with
no conversion. Providers that bill in USD (most of them — OpenAI,
Anthropic, Bedrock, Vertex, Groq, …) render both numbers:

```
$0.15/M (€0.17 billed)
```

The first figure is the upstream price you'd see on the provider's
own pricing page. The second is the EUR value your balance is
debited at, computed with the formula above. Hovering the price
shows the ECB rate, the markup, and when the conversion was last
refreshed.

## Refresh cadence

The ECB feed updates on TARGET business days at ~16:00 CET. We pull
it once per ingest run, which runs daily. Saturday and Sunday reuse
Friday's published rate — that's the standard ECB convention.
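
The weekend rule is simple enough to sketch. The helper below is a
simplification with a hypothetical name: it rolls Saturday and Sunday
back to Friday and deliberately ignores TARGET holidays and the
~16:00 CET publication time.

```python
from datetime import date, timedelta


def rate_date_for(day: date) -> date:
    """Date of the ECB rate in effect on `day`, rolling weekends back to Friday.

    Simplification: TARGET holidays and the publication time are ignored.
    """
    weekday = day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday == 5:         # Saturday
        return day - timedelta(days=1)
    if weekday == 6:         # Sunday
        return day - timedelta(days=2)
    return day


print(rate_date_for(date(2024, 6, 9)))
```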

Your debit at request time uses the most recent stored EUR value.
**This means the price you see in the catalogue today may differ
slightly from what was billed yesterday for the same number of
tokens** — by the size of the FX move plus markup drift.

## When the ECB feed is unavailable

If we cannot reach the ECB feed during an ingest run, we
**soft-disable** the affected providers (the USD-billed ones that need
conversion) until the feed recovers. While soft-disabled, those
providers are visible in the catalogue but won't accept new requests,
and an internal alert (`FX_INGEST_STALE`) notifies the team.

We deliberately do not fall back to a hard-coded rate. A guessed rate
is worse than visible unavailability — it can either silently
under-charge our margin or over-charge customers, and neither is
something we want to do quietly.

## What we don't claim

- **We don't offer a rate lock.** The price you see today is the
  price we charge today; tomorrow's rate may differ. If you need a
  fixed price, the model's EUR figure is the one we honour at the
  moment of the request, not a quote held in advance.
- **We don't pass through every provider price change in real time.**
  The catalogue reflects the most recent successful ingest, which is
  daily. A provider price change mid-day will land in the next run.
- **We don't bill in any currency other than EUR.** EUR is our single
  operating currency: balances, charges, the price list, and checkout
  are all EUR. Providers that publish in USD are converted once a day
  as described above, which keeps a single ledger and a single set of
  accounting rules.

## Configuration

For self-hosted operators, the conversion markup can be tuned via the
`LOWROUTER_FX_BUFFER_PERCENT` environment variable (default `3.0`,
clamped to `[0, 20]`). The change takes effect on the next ingest
run.
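
The default and clamp behaviour described above amounts to the following
sketch. The function name is hypothetical, and how LowRouter itself
treats a non-numeric value is not documented here; this version simply
lets `float()` raise.

```python
import os
from typing import Mapping


def fx_buffer_percent(env: Mapping[str, str] = os.environ) -> float:
    """Read LOWROUTER_FX_BUFFER_PERCENT, default 3.0, clamped to [0, 20]."""
    raw = env.get("LOWROUTER_FX_BUFFER_PERCENT", "3.0")
    value = float(raw)  # raises ValueError on a non-numeric setting
    return min(max(value, 0.0), 20.0)


print(fx_buffer_percent({"LOWROUTER_FX_BUFFER_PERCENT": "25"}))
```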


---

# Integrations


# Integrations

LowRouter speaks the OpenAI Chat Completions API. Anything that talks
to OpenAI talks to LowRouter — usually by changing the base URL and
the API key.

The pages in this section are short on purpose. Each one is a working
snippet plus the two or three things that surprise people on first
use.

- [curl](curl) — the bare HTTP request, useful for debugging and as
  the source of truth.
- [OpenAI SDK (Python)](openai-sdk-python)
- [OpenAI SDK (TypeScript / JavaScript)](openai-sdk-typescript)
- [Anthropic SDK](anthropic-sdk)
- [ChatBox](chatbox) — desktop chat client.
- [OpenCode](opencode) — terminal coding assistant.
- [Cline](cline) — VS Code coding agent.
- [Goose](goose) — Block's open-source agent.
- [Claude Code](claude-code) — Anthropic's CLI agent.
- [Generic OpenAI-compatible clients](openai-compatible) — the
  pattern for anything not on this list.

The base URL for every integration is:

```
https://lowrouter.ai/api/v1
```

The auth header is:

```
Authorization: Bearer $LOWROUTER_API_KEY
```

That's the whole deal. The rest is per-tool configuration.


---

# curl


# curl

The simplest way to talk to the gateway. If something works in `curl`
but not in your SDK, the SDK is the thing to debug.

## A non-streaming completion

```bash
curl https://lowrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $LOWROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lowrouter/auto",
    "messages": [
      {"role": "user", "content": "In one sentence, what is a vector database?"}
    ]
  }'
```

The response shape is documented in
[chat completions](../api-reference/chat-completions).

## A streaming completion

Use `-N` to disable curl's output buffering, and set `"stream": true`
in the body:

```bash
curl -N https://lowrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $LOWROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lowrouter/auto",
    "stream": true,
    "messages": [{"role": "user", "content": "Count to 5 slowly"}]
  }'
```

The response is a Server-Sent Events stream; the format is in
[streaming](../api-reference/streaming).

## Listing models

```bash
curl https://lowrouter.ai/api/v1/models \
  -H "Authorization: Bearer $LOWROUTER_API_KEY"
```

Returns the routable models with their per-token prices and basic
metadata. Cache the result locally — it does not change between
requests within a single user session.

## Pinning a provider and region

Override `lowrouter/auto` by sending an explicit model string and a
`route` block:

```bash
curl https://lowrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $LOWROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "route": {"region": "eu-west", "provider": "openai"}
  }'
```

If no provider in the requested region is available the request
returns 503 rather than falling back silently. See
[routing](../models/routing) for the full set of route options.

## Looking up a generation later

Every response carries a `lowrouter.generation_id`. Pass it to:

```bash
curl https://lowrouter.ai/api/v1/generation/$GENERATION_ID \
  -H "Authorization: Bearer $LOWROUTER_API_KEY"
```

You get the same record the dashboard shows, including the eco
numbers and the routing trace.


---

# OpenAI SDK (Python)


# OpenAI SDK (Python)

The OpenAI SDK is the canonical client. It works with LowRouter
unchanged once you set `base_url` and `api_key`.

## Install

```bash
pip install openai
```

## A non-streaming completion

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://lowrouter.ai/api/v1",
    api_key=os.environ["LOWROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="lowrouter/auto",
    messages=[
        {"role": "user", "content": "In one sentence, what is a vector database?"}
    ],
)

print(response.choices[0].message.content)
```

## A streaming completion

```python
stream = client.chat.completions.create(
    model="lowrouter/auto",
    messages=[{"role": "user", "content": "Count to 5 slowly"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

## Reading the eco metadata

LowRouter's per-request metadata lives outside the OpenAI schema, so
the typed SDK fields don't surface it. Read it from the raw response:

```python
response = client.chat.completions.create(
    model="lowrouter/auto",
    messages=[{"role": "user", "content": "hi"}],
)
extra = response.model_extra or {}
eco = extra.get("lowrouter", {}).get("eco")
if eco:
    print(f"{eco['carbon_per_1k_tokens_g']:.3f} gCO2e/1k tokens "
          f"({eco['accuracy']})")
```

`response.model_extra` is the Pydantic v2 accessor for fields outside
the declared schema. It exposes the same data as the underlying
`response.__pydantic_extra__` attribute and is `None` when the model
keeps no extra fields, hence the `or {}` above.

## Pinning a region with extra_body

The OpenAI SDK rejects unknown keyword arguments, so pass non-schema
request fields through `extra_body`, which is merged into the JSON body:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "hi"}],
    extra_body={"route": {"region": "eu-west"}},
)
```

If you need this on every request, make it a default by wrapping the
client:

```python
import os
from openai import OpenAI


def make_client(region="eu-west"):
    # Every request from this client carries the region as a default header.
    return OpenAI(
        base_url="https://lowrouter.ai/api/v1",
        api_key=os.environ["LOWROUTER_API_KEY"],
        default_headers={"X-LowRouter-Region": region},
    )
```

`X-LowRouter-Region` is honoured the same as `route.region` in the
body.

## Async

The async client follows the same pattern:

```python
import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://lowrouter.ai/api/v1",
    api_key=os.environ["LOWROUTER_API_KEY"],
)

async def main():
    r = await client.chat.completions.create(
        model="lowrouter/auto",
        messages=[{"role": "user", "content": "hi"}],
    )
    print(r.choices[0].message.content)

asyncio.run(main())
```


---

# OpenAI SDK (TypeScript)


# OpenAI SDK (TypeScript)

## Install

```bash
npm install openai
```

## A non-streaming completion

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://lowrouter.ai/api/v1",
  apiKey: process.env.LOWROUTER_API_KEY,
});

const response = await client.chat.completions.create({
  model: "lowrouter/auto",
  messages: [
    { role: "user", content: "In one sentence, what is a vector database?" },
  ],
});

console.log(response.choices[0].message.content);
```

## A streaming completion

```ts
const stream = await client.chat.completions.create({
  model: "lowrouter/auto",
  stream: true,
  messages: [{ role: "user", content: "Count to 5 slowly" }],
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
```

## Reading the eco metadata

The TypeScript types do not include LowRouter's extra fields. Cast or
narrow when you read them:

```ts
type LowRouterMeta = {
  generation_id: string;
  provider: string;
  region: string;
  eco?: {
    energy_wh: number;
    carbon_g: number;
    carbon_per_1k_tokens_g: number;
    accuracy: "accurate" | "medium" | "gross";
  };
};

const r = await client.chat.completions.create({ /* ... */ });
const meta = (r as unknown as { lowrouter?: LowRouterMeta }).lowrouter;
if (meta?.eco) {
  console.log(
    `${meta.eco.carbon_per_1k_tokens_g.toFixed(3)} gCO2e/1k (${meta.eco.accuracy})`,
  );
}
```

## Pinning a region

The SDK forwards unknown fields:

```ts
const response = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "hi" }],
  // @ts-expect-error: extra fields not in the OpenAI schema
  route: { region: "eu-west" },
});
```

Or, if you prefer not to silence the type error, send the field as a
header:

```ts
const client = new OpenAI({
  baseURL: "https://lowrouter.ai/api/v1",
  apiKey: process.env.LOWROUTER_API_KEY,
  defaultHeaders: { "X-LowRouter-Region": "eu-west" },
});
```

## Browser usage

The OpenAI SDK warns against running with an API key in the browser
because the key is then exposed to every page visitor. The same
applies to LowRouter: keep your `LOWROUTER_API_KEY` server-side and
proxy requests from a backend you control. If you need a signed,
short-lived token for a browser client, a server-side endpoint that
mints one is the right shape.


---

# Anthropic SDK


# Anthropic SDK

The gateway accepts Anthropic-shaped requests on a separate path so
the official `anthropic` SDK works without a custom adapter. Use this
when your codebase is already standardised on Anthropic's
`messages.create()` style.

## Python

```bash
pip install anthropic
```

```python
import os
from anthropic import Anthropic

client = Anthropic(
    base_url="https://lowrouter.ai/api/v1/anthropic",
    api_key=os.environ["LOWROUTER_API_KEY"],
)

message = client.messages.create(
    model="anthropic/claude-sonnet-4-5",
    max_tokens=512,
    messages=[
        {"role": "user", "content": "In one sentence, what is a vector database?"}
    ],
)

print(message.content[0].text)
```

## TypeScript

```bash
npm install @anthropic-ai/sdk
```

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://lowrouter.ai/api/v1/anthropic",
  apiKey: process.env.LOWROUTER_API_KEY,
});

const message = await client.messages.create({
  model: "anthropic/claude-sonnet-4-5",
  max_tokens: 512,
  messages: [{ role: "user", content: "What's a vector database?" }],
});

console.log(message.content);
```

## Notes

- The model string is the LowRouter ID
  (`anthropic/claude-sonnet-4-5`), not Anthropic's bare model name.
- `api_key` is your LowRouter key, not an Anthropic API key.
- The `anthropic-version` header is set automatically by the SDK; the
  gateway accepts what the SDK sends.
- Streaming works as in the official SDK (`client.messages.stream(...)`).
- Eco metadata is appended to the response in a `lowrouter` field at
  the top level, the same shape as in the OpenAI-compatible path.
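
As a sketch of reading that field, the helper below works on a decoded
response body (a plain `dict`). The helper name and the sample payload
are hypothetical; the field names follow the per-request record schema.

```python
from typing import Any, Mapping, Optional


def eco_summary(payload: Mapping[str, Any]) -> Optional[str]:
    """Format the eco block of a decoded response body, or None if absent."""
    eco = (payload.get("lowrouter") or {}).get("eco")
    if not eco:
        return None
    return f"{eco['carbon_per_1k_tokens_g']:.3f} gCO2e/1k tokens ({eco['accuracy']})"


# Hypothetical payload fragment; the numbers are made up.
body = {"lowrouter": {"eco": {"carbon_per_1k_tokens_g": 0.25, "accuracy": "medium"}}}
print(eco_summary(body))
```

With the Python SDK, the dict could come from `message.model_extra`,
assuming the SDK preserves unknown top-level fields as Pydantic models
configured to allow extras normally do.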

## When to prefer this over the OpenAI path

- Your codebase already uses Anthropic types end-to-end and you don't
  want to rewrite call-sites.
- You depend on Anthropic-specific request features (system prompt
  caching, citations, computer-use tool blocks) that aren't in the
  OpenAI schema.
- You want to keep tool definitions in Anthropic's `tools` shape.

If none of these applies, the OpenAI-compatible path is simpler:
one set of types, one base URL, every model on the same endpoint.


---

# ChatBox


# ChatBox

[ChatBox](https://chatboxai.app/) is an open-source desktop chat
client. Configure it to use LowRouter as a custom OpenAI provider.

## Configure

1. Open **Settings → Model Provider**.
2. Click **Add custom provider**.
3. Fill in:
   - **Name**: `LowRouter`
   - **API Mode**: `OpenAI API Compatible`
   - **API Host**: `https://lowrouter.ai/api/v1`
   - **API Path**: `/chat/completions`
   - **API Key**: your `lr-sk-...` token
4. Save.

## Pick a model

Under **Model**, choose **Custom model name** and enter a LowRouter
model ID — `lowrouter/auto`, `openai/gpt-4o`,
`anthropic/claude-sonnet-4-5`, etc. The full list is on the
[model browser](/models).

## Recommended setup

- **Use a key dedicated to ChatBox.** Keep its `daily_credit_limit`
  small (e.g. 1 credit/day for a personal account). A leaked key that
  can spend at most one day's limit is much less painful than a
  leaked production key.
- **Disable telemetry on the ChatBox side** if you care about not
  exposing prompt content to the ChatBox publisher's analytics.
  ChatBox itself does not see prompts in normal operation, but
  features like crash reporting can capture context.
- **Stream replies on**. The desktop UX expects streaming and
  LowRouter supports it the same way OpenAI does.

## Troubleshooting

- **401 / Unauthorized**: confirm the API key starts with `lr-sk-`
  and has not been revoked. Test with `curl` from
  [the curl page](curl) using the same key.
- **404 / Not Found**: the **API Path** must be `/chat/completions`.
  Some ChatBox versions default to `/v1/chat/completions`, which
  resolves to `https://lowrouter.ai/api/v1/v1/chat/completions`. Drop
  the leading `/v1` from the path.
- **Model not available**: look up the exact model ID on the
  [model browser](/models). Auto-complete in ChatBox is not always
  accurate.
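
The 404 case is easiest to see by building the final URL by hand. The
sketch below assumes ChatBox joins **API Host** and **API Path** by
plain concatenation with a single slash between them; the helper name is
hypothetical.

```python
def effective_url(api_host: str, api_path: str) -> str:
    """Join host and path with exactly one slash between them
    (an assumption about how the client combines the two fields)."""
    return api_host.rstrip("/") + "/" + api_path.lstrip("/")


HOST = "https://lowrouter.ai/api/v1"
print(effective_url(HOST, "/chat/completions"))     # the working value
print(effective_url(HOST, "/v1/chat/completions"))  # duplicated /v1 segment
```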


---

# OpenCode


# OpenCode

[OpenCode](https://opencode.ai/) is a terminal coding assistant. It
expects an OpenAI-compatible endpoint, which is what LowRouter
exposes.

## Configure

OpenCode reads its config from `~/.config/opencode/opencode.json`.
Point the OpenAI provider entry at LowRouter:

```json
{
  "providers": {
    "openai": {
      "baseURL": "https://lowrouter.ai/api/v1",
      "apiKey": "lr-sk-..."
    }
  },
  "defaultModel": "lowrouter/auto"
}
```

Restart OpenCode after editing the file.

## Picking a model

Inside OpenCode, run `/model` and pick from the list. If a model isn't
listed, type the LowRouter model ID directly — any model on the
[model browser](/models) is routable.

For coding tasks, the auto route (`lowrouter/auto`) generally picks
something appropriate. If you want a specific model:

- For long contexts: a model with ≥128K context window. The model
  browser tags context length per model.
- For latency-sensitive iteration: an `*-mini` or `*-haiku-*` variant.
- For careful reasoning: a top-tier reasoning model.

## Recommended setup

- **Dedicated key with a daily limit.** OpenCode is interactive and
  it's easy to lose track of how many tokens you spent in an
  afternoon. A daily limit on the key bounds the surprise.
- **Disable shell-execution tools by default.** OpenCode supports
  letting the model run shell commands; turn that off until you've
  reviewed the prompts the agent sends. Enable it per-session for the
  workflow that needs it.
- **Stream on.** Default in OpenCode; mentioned for completeness.

## Troubleshooting

- **Hangs on the first request**: confirm `baseURL` ends with `/v1`
  (no trailing slash). OpenCode appends `/chat/completions` itself.
- **Model "not found"**: the model isn't in OpenCode's autocomplete
  list, but it is routable. Run `/model lowrouter/auto` to confirm
  the gateway is reachable, then use the explicit model ID.


---

# Cline


# Cline

[Cline](https://cline.bot/) is a VS Code extension that runs a coding
agent against an LLM provider. It supports any OpenAI-compatible
endpoint.

## Configure

1. Install the **Cline** extension from the VS Code marketplace.
2. Open the Cline sidebar.
3. Click the gear icon, then **Settings**.
4. Under **API Provider**, pick **OpenAI Compatible**.
5. Fill in:
   - **Base URL**: `https://lowrouter.ai/api/v1`
   - **API Key**: your `lr-sk-...` token
   - **Model ID**: any LowRouter model ID, e.g.
     `anthropic/claude-sonnet-4-5` or `lowrouter/auto`.
6. Save and start a task.

## Recommended setup

- **Separate key for Cline**. Cline can burn tokens fast on agentic
  tasks (read file → think → edit → re-read). A dedicated key with a
  per-day limit is a cheap insurance policy.
- **Pick an explicit model**, not `lowrouter/auto`, for repeatable
  pricing. Auto-routing changes the underlying model based on
  availability, which can surprise you when you compare daily costs.
- **Read the diff every time.** Cline produces real edits to your
  workspace. The dashboard's transaction-detail page shows the request
  metadata (token counts, model, cost) but not the prompt or the
  response, so the source of truth is the diff in your editor.

## Troubleshooting

- **"Model does not support tools"**: not every model exposes tool
  use. The model browser tags `tool_use: true` on supported models.
  Pick one that does, or switch to `lowrouter/auto` which prefers
  tool-capable models when the prompt calls for tools.
- **"Context window exceeded"**: the file or selection you fed the
  agent is larger than the model's context. Switch to a longer-context
  model or trim the context.
- **401**: confirm the API key.
- **Latency feels off**: check the **provider** field on the
  transaction detail page. Cline doesn't expose it; LowRouter does.


---

# Goose


# Goose

[Goose](https://block.github.io/goose/) is Block's open-source agent.
It supports OpenAI-compatible providers via configuration.

## Configure

Edit `~/.config/goose/config.yaml`:

```yaml
GOOSE_PROVIDER: openai
OPENAI_HOST: https://lowrouter.ai/api/v1
OPENAI_API_KEY: lr-sk-...
GOOSE_MODEL: lowrouter/auto
```

Or set them as environment variables before starting Goose:

```bash
export GOOSE_PROVIDER=openai
export OPENAI_HOST=https://lowrouter.ai/api/v1
export OPENAI_API_KEY=lr-sk-...
export GOOSE_MODEL=lowrouter/auto
goose session
```

## Picking a model

Set `GOOSE_MODEL` to any LowRouter model ID. For agentic tasks
(file reading, shell tools, multi-step reasoning), pick a model
tagged with `tool_use: true` on the [model browser](/models).

## Recommended setup

- **Dedicated key, daily limit.** Same reasoning as the other
  agents: agentic loops can run away.
- **Limit the toolset Goose has access to.** Goose's `extensions`
  config lets you allow only the tools the workflow needs. Fewer
  enabled tools = fewer surprises.
- **Set a step limit.** Goose has a max-step setting; cap it at a
  small number for unattended runs.

## Troubleshooting

- **Goose immediately exits with a config error**: `OPENAI_HOST` must
  not include a trailing slash. Match the value above exactly.
- **Tool calls fail silently**: verify the chosen model actually
  supports tool use (model browser, `tool_use: true`). Some smaller
  models don't.


---

# Claude Code


# Claude Code

[Claude Code](https://www.anthropic.com/claude-code) is Anthropic's
CLI agent. It expects an Anthropic API endpoint, which LowRouter
exposes via the `/api/v1/anthropic` prefix.

## Configure

Set environment variables before starting Claude Code:

```bash
export ANTHROPIC_BASE_URL=https://lowrouter.ai/api/v1/anthropic
export ANTHROPIC_API_KEY=lr-sk-...
claude
```

Or persist them in your shell profile (`~/.zshrc`, `~/.bashrc`).

## Picking a model

Claude Code uses the model defined in its settings. Edit
`~/.claude/settings.json` to pin a LowRouter-routed Claude model:

```json
{
  "model": "anthropic/claude-sonnet-4-5"
}
```

The model string is the LowRouter ID (`anthropic/claude-sonnet-4-5`),
not Anthropic's short name. The full list of supported Claude models
is on the [model browser](/models) — filter by provider `anthropic`.

## Recommended setup

- **Dedicated key.** Same daily-limit advice as the other agents.
- **EU residency**: pin the key's region to `eu-west` so Claude
  requests are served from EU endpoints. This survives IDE restarts
  and machine swaps without per-session config.
- **Audit usage on the dashboard.** The transaction detail page shows
  token counts and cost per generation, which Claude Code itself does
  not surface in real time.

## Troubleshooting

- **Claude Code can't reach the API**: the base URL must end with
  `/anthropic` (no trailing slash). Claude Code appends `/v1/messages`
  internally.
- **Model "not found"**: confirm the model exists on the
  [model browser](/models). Anthropic adds and deprecates models
  faster than the model browser updates; a 404 usually means the model
  is no longer routable.
- **The agent stalls during streaming**: this is sometimes a
  network-buffering issue between Claude Code and LowRouter. Setting
  `CLAUDE_CODE_STREAM_BUFFER=0` (if the version you run supports it)
  disables the client-side buffer.


---

# Generic OpenAI-compatible clients


# Generic OpenAI-compatible clients

If a tool isn't on this list, it almost certainly works as long as it
exposes two settings: **base URL** and **API key**. The pattern below
is what to fill in.

## Settings to set

| Setting | Value |
|---------|-------|
| Provider type | OpenAI Compatible (sometimes "Custom OpenAI" or "OpenAI API") |
| Base URL | `https://lowrouter.ai/api/v1` |
| API Key | your `lr-sk-...` token |
| Path / endpoint | `/chat/completions` (most tools handle this automatically) |
| Model | any LowRouter model ID — `lowrouter/auto`, `openai/gpt-4o-mini`, … |

## What does *not* work

- **Tools that hard-code `https://api.openai.com`** without a
  base-URL setting cannot be redirected. Some of them read an
  `OPENAI_API_BASE` environment variable that achieves the same thing.
- **Tools that require a specific model ID format** (e.g. `gpt-4` with
  no provider prefix) need the model picker reconfigured to accept
  arbitrary strings — most have a "custom model name" field.
- **Tools that send Anthropic-shaped requests on the OpenAI endpoint**
  will be rejected. Use the
  [Anthropic SDK base URL](anthropic-sdk) instead.

## Confirming it works

Before integrating, test from the command line that the tool's
settings are right:

```bash
curl https://lowrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"lowrouter/auto","messages":[{"role":"user","content":"hi"}]}'
```

If `curl` returns a completion, the tool will too — once it's
configured with the same URL and key.
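
For tools configured from Python, the same check can be sketched with
the standard library alone. This is a minimal sketch, not an official
client: `build_smoke_test` and `run_smoke_test` are hypothetical helper
names, and reading `choices[0].message.content` assumes the response
follows the usual OpenAI chat-completions shape.

```python
import json
import urllib.request

BASE_URL = "https://lowrouter.ai/api/v1"  # same base URL as the curl example


def build_smoke_test(api_key, model="lowrouter/auto"):
    """Build (but do not send) the request the curl command makes."""
    body = {"model": model, "messages": [{"role": "user", "content": "hi"}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_smoke_test(api_key):
    """Send the request; needs a real key and network access."""
    with urllib.request.urlopen(build_smoke_test(api_key), timeout=30) as resp:
        payload = json.load(resp)
    # Assumption: OpenAI-style response body.
    return payload["choices"][0]["message"]["content"]
```

Calling `run_smoke_test("lr-sk-...")` with a real key should return a
short completion if the URL and key are right.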

## Tools we know work without changes

- **LangChain** (`OpenAI` and `ChatOpenAI` classes — set
  `openai_api_base` and `openai_api_key`).
- **LlamaIndex** (`OpenAI` and `OpenAILike` LLMs).
- **LiteLLM** (proxy and library — set `api_base` and `api_key`).
- **Vercel AI SDK** (`createOpenAICompatible` from
  `@ai-sdk/openai-compatible`).
- **Continue.dev** (`provider: openai-aiohttp` with `apiBase`).
- **LM Studio** (Server tab → custom backend).

## When to prefer the OpenAI path over a tool's native provider

If a tool offers both an "OpenAI Compatible" option and a "LowRouter /
OpenRouter / custom gateway" option, prefer the OpenAI Compatible one.
It produces the fewest surprises: the tool sends a standard
chat-completions request, LowRouter resolves the route, and the
response is in the shape the tool already expects.


---

# Models & providers


# Models & providers

Three short pages on the routing layer:

- [Available models](available) — what's on the platform, how to
  read the model browser, and the IDs you'll use in requests.
- [Routing](routing) — what `lowrouter/auto` does, how ties are
  broken, and what overrides do what.
- [Per-request metadata](per-request-metadata) — the `lowrouter`
  block on every response, field by field.

The dashboard's [model browser](/models) is the live, searchable view
of the same data.


---

# Available models


# Available models

The full catalogue lives on the [model browser](/models). It's
generated from the same data the API exposes at
[`GET /models`](../api-reference/models-and-providers), so the two
agree by construction.

## How model IDs are formed

```
<provider>/<model>
```

Examples:

- `openai/gpt-4o-mini`
- `anthropic/claude-sonnet-4-5`
- `mistral/mistral-large-latest`

The `<provider>` segment matches the `id` of an entry in
[`GET /providers`](../api-reference/models-and-providers); the
`<model>` segment is the upstream model name.
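
If your application needs the two segments separately (for logging or
for an allowlist check), a split on the first slash is enough. The
helper below is a hypothetical sketch; splitting on the *first* slash
is an assumption, made so that an upstream model name that itself
contains a slash stays intact.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    provider: str
    model: str


def parse_model_id(model_id: str) -> ModelId:
    """Split '<provider>/<model>' on the first slash."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {model_id!r}")
    return ModelId(provider, model)
```

A bare name such as `gpt-4` raises `ValueError`, which is usually the
behaviour you want before the request reaches the gateway.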

## What each model card shows

- **Display name** — the human-readable name, sometimes versioned.
- **Provider** and **owner** — who serves it and who created it (these
  differ for re-hosted models, e.g. Llama on Mistral).
- **Context window** — max input tokens.
- **Pricing** — prompt, completion, and (where applicable) cached
  prompt rates per 1K tokens, in your account currency.
- **Capabilities** — `tool_use`, `vision`, `structured_output`,
  `streaming`. Filter the catalogue by these.
- **Eco data** — active parameter count and the energy estimate per
  1K tokens. Both numbers come from the
  [methodology](../sustainable-ai/methodology). The confidence band
  (`accurate`, `medium`, `gross`) reflects how well-sourced the
  parameter count is.
- **Regions** — where the upstream serves it (`eu-west`, `us-east`,
  …).

## Pseudo-models

Two model strings are not actual models but routing primitives:

- **`lowrouter/auto`** — pick a model based on the request and the
  current routing policy. See [routing](routing).
- **`lowrouter/auto-cheap`** — auto-route biased toward the cheapest
  model that can plausibly handle the request. Useful for
  high-volume, low-importance work (classification, simple summaries).

When a pseudo-model is used, the response's `model` field is the
*resolved* model.

## Lifecycle

- **Added.** When an upstream releases a new model and we integrate
  it, it appears on the model browser. Brand-new models start with a
  `medium` or `gross` eco confidence band until the parameter count is
  verified.
- **Deprecated.** When an upstream announces deprecation, the model
  card flags it with a `deprecated` badge and a sunset date. Routing
  still uses it until the sunset date.
- **Removed.** After the sunset date, requests for the model return
  `model_deprecated`. A migration suggestion is included in the error
  body when we have one.

## Filtering the catalogue

The model browser supports filtering by:

- Provider
- Context window
- Capability flags
- Eco confidence band
- Price range

The same filters are reflected in `GET /models` query parameters; see
the [discovery endpoints](../api-reference/models-and-providers).


---

# Routing


# Routing

Every request goes through the router. For an explicit model the
router's job is small (pick a healthy upstream that serves it); for
`lowrouter/auto` the router picks the model too. This page describes
both.

## The router's inputs

For each request:

- The model string in the request (`lowrouter/auto`, an explicit ID,
  or one of the auto-* pseudo-models).
- The `route` object, if present (`provider`, `region`,
  `prefer_low_carbon`, `fallback`).
- The key's policy: any of the per-key fields from
  [API key management](../guides/api-keys).
- The current health of every upstream (`ok`, `degraded`,
  `unavailable`).
- The current grid carbon intensity for each (provider, region) pair.

## Decision order

The router applies constraints from the most specific to the least:

1. **Per-request `route`.** A pinned `provider` or `region` removes
   anything that doesn't match.
2. **Per-key policy.** A key's `region` pin or `models` allowlist is
   then applied.
3. **Account policy.** Defaults set in
   [auto-routing settings](/dashboard/auto-routing) — for instance,
   "prefer EU regions when possible."
4. **Auto-router scoring.** Whatever survives the above is scored on:
   - Capability match (does the model support the request shape —
     vision input, tool use, structured output?).
   - Provider health.
   - Latency (median over the last 5 minutes per upstream).
   - Carbon (grams per 1K tokens for that provider × region pair, with
     the bias controlled by `prefer_low_carbon`).
   - Price (matters when the request used `auto-cheap`).
5. **Tie-break.** When two candidates score within 1% of each other,
   the more recently used one wins (sticky routing within a session
   when `user` is supplied; otherwise random).

## What happens on failure

If the chosen upstream returns a 5xx or times out:

- The router marks that upstream's slot temporarily unavailable
  (decaying over a few minutes).
- It tries the next eligible candidate **in the same region** (region
  is never violated silently).
- If none, it tries other regions **only if** `route.fallback != false`
  and the per-key/account policy allows it.
- If still none, it returns `503 service_unavailable` with a code
  describing what's missing.

The full chain of attempts is recorded in the generation's
`routing_trace`, visible on the dashboard's transaction-detail page.

## `prefer_low_carbon`

The auto-router's carbon score is a weighted term in its overall
score. Setting `prefer_low_carbon: true` on a request increases that
weight, which pushes traffic toward providers serving from
lower-grid-intensity regions when capability and latency are
comparable.

It does **not** override pinned regions or providers. It does **not**
guarantee the lowest-carbon option in absolute terms — only that, all
else equal, lower carbon wins.

## Worked example

A request for `lowrouter/auto` with a vision input:

1. Drop models that don't support vision.
2. Drop providers in `degraded` or `unavailable` state.
3. Among the rest, score on (capability fit, latency, carbon).
4. The top-scored option wins.

If the top-scored option later returns 502 mid-request:

1. Mark its `(provider, region)` slot unavailable.
2. Re-score the surviving candidates.
3. Retry with the new top option (capped at two retries per request).
4. If retries are exhausted, return 502 to the caller.
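
The retry half of the example can be sketched as a loop over a
pre-ranked list. This is a simplification under stated assumptions:
`UpstreamError` and `serve_with_failover` are hypothetical names, and
walking a fixed ranking stands in for the re-scoring the router does
after each failure.

```python
class UpstreamError(Exception):
    """Stands in for a 5xx or timeout from an upstream."""


def serve_with_failover(ranked, call, max_retries=2):
    """Try slots best-first, with one initial attempt plus `max_retries`.

    `ranked` is a list of (provider, region) slots, best score first.
    `call(slot)` performs the request and raises UpstreamError on failure.
    Returns (response, trace), where trace lists every attempt in order.
    """
    trace = []
    for slot in ranked[: 1 + max_retries]:
        try:
            response = call(slot)
        except UpstreamError as exc:
            # A failed slot is skipped for the rest of this request.
            trace.append((slot, f"failed: {exc}"))
            continue
        trace.append((slot, "ok"))
        return response, trace
    raise UpstreamError(f"retries exhausted after {len(trace)} attempts")
```

The `trace` list plays the role of the `routing_trace` described above:
one entry per attempt, in the order the attempts were made.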

## Pinning recipes

| Goal | Recipe |
|------|--------|
| EU residency | Set `route.region: eu-west` per request, or pin the key's region. |
| Specific provider | `route.provider: anthropic`. Combine with `route.region` for region too. |
| Hard pin (no failover) | `route.provider`, `route.region`, `route.fallback: false`. |
| Lower carbon | `route.prefer_low_carbon: true`. Combine with no region pin to let the router pick the cleanest available region. |
| Cheapest acceptable | Use `lowrouter/auto-cheap`. |

## What the router does not do

- It does not benchmark output quality. The auto router optimises for
  capability, latency, carbon, and price — not "is the answer good".
- It does not silently swap models mid-conversation. If you've been
  routed to model A on the first turn, the auto router prefers
  sticking with model A on the second when you supply a stable
  `user`.


---

# Per-request metadata


# Per-request metadata

Every successful response from the gateway carries a top-level
`lowrouter` field:

```json
{
  "id": "chatcmpl-...",
  "choices": [...],
  "usage": {...},
  "lowrouter": {
    "generation_id": "gen_01J9...",
    "provider": "openai",
    "region": "eu-west",
    "eco": {
      "energy_wh": 0.0021,
      "carbon_g": 0.00057,
      "carbon_per_1k_tokens_g": 0.013,
      "accuracy": "accurate",
      "methodology_version": "v0.4-2026-01"
    }
  }
}
```

The routing fields are also echoed in HTTP headers
(`X-LowRouter-Generation-ID`, `X-LowRouter-Provider`,
`X-LowRouter-Region`) for clients that prefer header inspection.

## Field reference

### `generation_id`

Globally unique, opaque ID for this generation. Use it to:

- Look up the full record via
  [`GET /generation/{id}`](../api-reference/models-and-providers).
- Open the corresponding row on the dashboard's
  [transactions page](/dashboard).
- Correlate gateway logs with your application logs.

The ID format may evolve; always treat it as opaque.

### `provider`

The upstream that actually served the request. Matches an `id` in
[`GET /providers`](../api-reference/models-and-providers).

### `region`

The region the upstream served from. Strings like `eu-west`,
`us-east`, `us-west` — the same values used in the `route.region`
field on requests.

### `eco`

The energy and carbon estimate. Five fields:

- **`energy_wh`** — total energy estimated for the request, in
  watt-hours. Computed from the resolved model's active parameter
  count and the request's `total_tokens`.
- **`carbon_g`** — total CO₂e estimated for the request, in grams.
  `energy_wh × grid_intensity_for(provider, region) / 1000`.
- **`carbon_per_1k_tokens_g`** — `carbon_g` normalised per 1K total
  tokens. Comparable across requests of different sizes.
- **`accuracy`** — confidence band: `accurate`, `medium`, or `gross`.
  Reflects how well-sourced the model's parameter count is. See
  [methodology](../sustainable-ai/methodology).
- **`methodology_version`** — version string that uniquely identifies
  the formula coefficients and data inputs used. Stable for as long
  as the methodology is unchanged.

### When `eco` is absent

The `eco` field can be missing when:

- The resolved model's parameter count is unknown and we'd rather omit
  the number than fabricate one.
- The request was a non-completion (e.g. a tool-only response with no
  tokens consumed).
- The upstream returned an error mid-stream that prevented usage
  accounting.

When it's missing, the dashboard shows the row with a `—` for the
carbon column and a note linking to the methodology page.
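
Client code should therefore treat `eco` as optional. A minimal sketch
of a log-friendly extractor follows; `summarise_lowrouter` is a
hypothetical helper name, and the field names are the ones shown in the
example response above.

```python
def summarise_lowrouter(response: dict) -> dict:
    """Flatten the `lowrouter` block into a flat dict for logging.

    `eco` may be absent, so eco-derived fields default to None
    instead of raising KeyError.
    """
    meta = response.get("lowrouter", {})
    eco = meta.get("eco") or {}
    return {
        "generation_id": meta.get("generation_id"),
        "provider": meta.get("provider"),
        "region": meta.get("region"),
        "energy_wh": eco.get("energy_wh"),
        "carbon_g": eco.get("carbon_g"),
        "accuracy": eco.get("accuracy"),
    }
```

Keeping `None` distinct from `0` matters when you aggregate later: a
missing estimate is not a zero-carbon request.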

## Streaming

The same metadata arrives at end-of-stream as a `lowrouter.summary`
chunk; see [streaming](../api-reference/streaming) for the exact
shape.

## Privacy

The `lowrouter` block contains nothing about prompt or response
content — only the resolved route and the metric estimates. It is safe
to log on the client side; we do.


---

# Sustainable AI


# Sustainable AI

Four pages that document the energy and carbon numbers shown
elsewhere on the platform:

- [Methodology](methodology) — the formula, the coefficients, and
  the confidence bands.
- [Data sources](data-sources) — where the parameter counts and grid
  intensities come from.
- [Limits and what we don't claim](limits) — the explicit
  out-of-scope list.
- [Reduce your footprint](reduce-your-footprint) — concrete things to
  change in your application that move the dashboard's numbers.

These pages are the longest in the docs site on purpose. The numbers
are only useful with the caveats; the caveats need to be readable.


---

# Methodology


# Methodology

LowRouter estimates the carbon footprint of every inference request
using the formula and data sources described on this page. This is
the reference document; the numbers on the dashboard, the model
browser, and the API responses all come from it.

## What we report

Two numbers per request:

- **Energy** in watt-hours (Wh).
- **Carbon** in grams of CO₂ equivalent (gCO₂e).

The carbon number is also normalised to **gCO₂e per 1,000 tokens** so
requests of different sizes are comparable.

## The formula

```
energy_wh   = ((α × P_active) + β) × tokens
carbon_g    = energy_wh × grid_intensity_g_per_kwh / 1000
```

Where:

- **`P_active`** — number of active parameters during inference, in
  billions. For dense models this is the parameter count; for
  Mixture-of-Experts (MoE) models it's the parameters activated per
  token, not the total count.
- **`α`** = 8.91 × 10⁻⁵ Wh per output token per billion active
  parameters.
- **`β`** = 1.43 × 10⁻³ Wh constant overhead per output token.
- **`tokens`** — total tokens for the request (`prompt_tokens +
  completion_tokens`).
- **`grid_intensity_g_per_kwh`** — annual-average carbon intensity of
  the electricity grid in the region serving the request.

The energy formula is the
[EcoLogits v0.4 inference model](https://ecologits.ai/0.4/methodology/llm_inference/).
The grid-intensity values come from the International Energy Agency.
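
As a sketch, the two lines of the formula translate directly into
Python. It assumes the coefficients are in watt-hours per output token
(the unit stated on the data sources page) and that `tokens` is the
request's total token count; the function name `estimate` is
illustrative, not part of any SDK.

```python
ALPHA_WH = 8.91e-5  # Wh per token per billion active parameters
BETA_WH = 1.43e-3   # Wh per token, constant overhead


def estimate(p_active_billion, tokens, grid_g_per_kwh):
    """Return (energy_wh, carbon_g, carbon_per_1k_tokens_g)."""
    energy_wh = (ALPHA_WH * p_active_billion + BETA_WH) * tokens
    # Wh -> kWh (divide by 1000), then multiply by g CO2e per kWh.
    carbon_g = energy_wh * grid_g_per_kwh / 1000
    per_1k = carbon_g * 1000 / tokens if tokens else 0.0
    return energy_wh, carbon_g, per_1k
```

For a Mixture-of-Experts model, pass the active parameter count, not
the total, as `p_active_billion`.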

## Why this formula

The EcoLogits model is published, openly scrutinised even though it
has not been through full formal peer review, and reproducible from
public model parameter counts.
It is not the only credible estimate but it is the one with the
clearest derivation and the most active maintenance. Adopting it lets
us compare numbers across providers using the same yardstick rather
than reconciling each provider's bespoke estimate.

## Confidence bands

Every estimate carries one of three labels:

| Band | When | Expected error |
|------|------|----------------|
| `accurate` | Model size verified by the provider or in the EcoLogits registry; recent grid data. | ±20% |
| `medium` | Model size from a credible third party (research paper, well-supported leak); grid data current. | ±40% |
| `gross` | Model size estimated from the model name or industry rumour; or grid data older than 12 months. | ±60% or more |

These bands are about *uncertainty in the inputs*, not about whether
the formula itself is right. The formula has its own model-class
limits documented on the [limits page](limits).

When the band is `gross`, the dashboard widgets that aggregate carbon
across many requests show a reduced-confidence indicator and link
back to this page.
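
When you display an estimate, the band can be turned into a range
rather than a single figure. The sketch below uses the expected-error
column of the table; `error_range` is a hypothetical helper, and for
`gross` the 60% figure is only a floor, since the table says "or more".

```python
BAND_ERROR = {"accurate": 0.20, "medium": 0.40, "gross": 0.60}


def error_range(value, band):
    """Return (low, high) for an estimate given its confidence band.

    For "gross" the true range may be wider than what is returned.
    """
    err = BAND_ERROR[band]
    return value * (1 - err), value * (1 + err)
```

A `medium` estimate of 0.146 g, for instance, is better presented as
"roughly 0.09 to 0.20 g" than as a three-digit number.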

## Methodology versioning

Every estimate stores the `methodology_version` that produced it
(see [per-request metadata](../models/per-request-metadata)). The
version captures:

- The values of α and β.
- The IEA grid-intensity dataset year.
- The model parameter-count dataset version.

When any of these change, the version is bumped and the change is
noted in the dashboard's footer with the date. Old generations are
*not* retroactively recomputed — their `methodology_version` is the
one in effect when the request was served.

## Worked example

A request:

- Resolved model: `openai/gpt-4o-mini`.
- Active parameters: 8B (this is the value we use; the provider has
  not officially confirmed it, so the band is `medium`).
- Total tokens: 200.
- Provider region: `eu-west`.
- Grid intensity: ~340 gCO₂e/kWh (IEA EU average).

Energy and carbon, with the units kept explicit. The EcoLogits v0.4
coefficients are in **watt-hours per output token**, not kWh, so the
per-token result is already in Wh and is converted to kWh only for the
carbon step:

```
energy_wh_per_token  = (α × P_active) + β
                     = (8.91e-5 × 8) + 1.43e-3   = 0.002143 Wh/token
energy_wh            = 0.002143 × 200            = 0.43 Wh
energy_kwh           = 0.43 / 1000               = 4.3e-4 kWh
carbon_g             = 4.3e-4 × 340              = 0.146 g
carbon_per_1k_tokens = 0.146 × (1000 / 200)      = 0.73 g
```

So a 200-token request on `gpt-4o-mini` served from `eu-west` is
estimated at **0.43 Wh** and **~0.15 gCO₂e** (about 0.73 gCO₂e per
1,000 tokens), with `medium` confidence. These are the numbers your
`eco` block would carry.

If you find a discrepancy between this worked example and what the
gateway returns, the gateway is the source of truth — please file an
issue so we can fix the documentation.

## The full picture

Read the [data sources](data-sources) page next for where each number
in the formula comes from. The
[limits](limits) page lists what we explicitly do not claim.


---

# Data sources


# Data sources

The carbon estimate is only as good as its inputs. This page lists
each input, where it comes from, and how often we update it.

## Energy formula coefficients (α, β)

- **Source**:
  [EcoLogits v0.4 — LLM inference methodology](https://ecologits.ai/0.4/methodology/llm_inference/).
- **Values**: α = 8.91 × 10⁻⁵, β = 1.43 × 10⁻³ (Wh per output token).
- **Updates**: when EcoLogits publishes a new methodology version
  with new coefficients, we evaluate it, bump the
  `methodology_version`, and note the change in the dashboard's
  footer with the effective date.

The coefficients were derived from a regression across published
benchmarks on a fleet of representative GPUs. They are an *average*;
real hardware varies.

## Model active parameters

- **Source priority**:
  1. **EcoLogits registry** — models with verified architecture
     details (`accurate`).
  2. **Provider documentation** — values published by the model
     creator (`accurate` or `medium`, depending on whether the
     statement is unambiguous).
  3. **Research papers and credible leaks** — peer-reviewed
     architecture descriptions, technical reports (`medium`).
  4. **Name-based estimates** — `llama-70b` → 70B (`gross`).
- **Updates**: when a new model lands, we look up its parameter count
  in this priority order and tag the `accuracy` band accordingly.
  Re-evaluation happens monthly and on demand when a model's source
  upgrades.

For Mixture-of-Experts models we use the **active parameter count**
(parameters used per token), not the total parameter count. This
distinction matters: a 600B-parameter MoE that activates 20B per
token has the energy profile of a 20B dense model, not a 600B one.

## Grid carbon intensity

- **Source**:
  [International Energy Agency](https://www.iea.org/data-and-statistics)
  electricity statistics — annual averages by country.
- **Aggregation**: where a region maps to multiple countries (e.g.
  `eu-west` covers FR, DE, NL, IE), we use a population-weighted
  average for the region.
- **Updates**: annually, when the IEA publishes the new dataset.
  Switching dataset versions bumps the `methodology_version`.
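
The aggregation step is a plain weighted mean. A minimal sketch,
assuming each country contributes a `(population, intensity)` pair; the
function name is illustrative and the figures in the usage line are
placeholders, not IEA data.

```python
def weighted_intensity(countries):
    """Population-weighted mean of per-country grid intensities.

    `countries` maps a country code to (population, g_co2e_per_kwh).
    """
    total_pop = sum(pop for pop, _ in countries.values())
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    return sum(pop * g for pop, g in countries.values()) / total_pop
```

With placeholder inputs `{"AA": (10, 100.0), "BB": (30, 300.0)}` the
larger country dominates and the regional value comes out at 250
gCO₂e/kWh.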

We do not use real-time grid carbon intensity (which would require
per-request lookups against a service like ElectricityMap). It's on
the roadmap; the trade-off is that real-time numbers introduce
sampling noise we'd need to explain. Annual averages are coarse but
boring, and "boring" is a feature in a methodology document.

### Sample values

| Region | Approx. gCO₂e/kWh | Notes |
|--------|-------------------|-------|
| `eu-west` | ~280–340 | Population-weighted Western Europe average. |
| `eu-north` | ~50–80 | Mostly hydro/nuclear (Sweden, Norway, Finland). |
| `us-west` | ~250–320 | California heavy renewables, broader West mixed. |
| `us-east` | ~370–450 | Higher fossil share. |
| `india` | ~700–800 | Coal-dominant grid. |

Specific values per region are in the dashboard's settings page; the
table above is for orientation.

## Pricing data

- **Source**: each upstream provider's published price list,
  refreshed daily.
- **Updates**: within one business day of an upstream change going
  live.
- **Storage**: the price applied at the moment of a request is
  stored on the generation record, so historical bills are stable.

Pricing isn't strictly part of the carbon methodology, but it is part
of the per-request decision (`auto-cheap` and tie-break heuristics)
so the source is documented here for completeness.

## What we deliberately don't include

- **Hardware embodied carbon.** Manufacturing emissions for the GPUs
  serving inference are non-zero but we don't have a defensible
  per-token allocation. Until we do, omitting the number is more
  honest than guessing.
- **Provider-specific cooling overhead.** Data-centre cooling adds
  10–30% to the energy used by compute (Power Usage Effectiveness,
  PUE). The EcoLogits formula already incorporates an average
  overhead; what we leave out are provider-specific PUE refinements,
  which are pending more data.
- **Network transport.** Energy used to move bytes between the
  gateway, the upstream, and the user is small relative to inference
  and is not counted.
- **Training emissions.** Documented separately on the
  [limits page](limits).


---

# Limits and what we don't claim


# Limits and what we don't claim

A defensible number needs a clear scope. This page is the scope.

## What the numbers cover

- **Inference compute energy** for the model that served the request,
  using the EcoLogits v0.4 formula and the active-parameter count.
- **Grid carbon intensity** of the region the upstream served from,
  using IEA annual averages.
- **One request at a time.** Each number covers a single request,
  end-to-end.

That's it.

## What the numbers do not cover

### Training emissions

We report inference only. Training a frontier model has a much
larger and harder-to-attribute footprint, and folding a "share of
training" into per-request numbers depends on assumptions (how many
requests will the model serve in its lifetime?) that are
unverifiable. We'd rather under-report openly than make up a number.

### Hardware manufacturing

GPUs have an embodied carbon footprint from manufacturing. There is
not yet a defensible way to allocate it per token. Some
methodologies amortise it across the GPU's expected lifetime; the
amortisation depends on assumptions we don't have.

### Real-time grid mix

We use annual averages by region. Live carbon-aware routing — picking
the region whose grid is currently cleanest — is a roadmap feature,
not a current one.

### Embedding workloads

The EcoLogits formula was derived for decoder-only autoregressive
models. Encoder-only embedding models have a meaningfully different
compute profile. We currently *do not* report eco numbers for
embedding requests; the response carries no `eco` block. Modelling
embeddings properly is on the roadmap.

### Tool-call orchestration

When a single user-facing operation requires multiple LLM calls
(e.g. an agent that thinks-then-acts-then-thinks), each call gets
its own number. The aggregate footprint of a multi-step operation is
the sum of those numbers. We do not do that aggregation
automatically; that is your application's job.
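
On the application side the aggregation is a short loop. The sketch
below assumes you have kept the parsed response of each call in the
operation; `aggregate_eco` is a hypothetical helper, and calls without
an `eco` block are counted separately rather than treated as zero.

```python
def aggregate_eco(responses):
    """Sum energy and carbon across the calls of one operation."""
    total = {"energy_wh": 0.0, "carbon_g": 0.0, "calls": 0, "missing": 0}
    for response in responses:
        total["calls"] += 1
        eco = response.get("lowrouter", {}).get("eco")
        if not eco:
            # No estimate for this call: record the gap, do not add 0.
            total["missing"] += 1
            continue
        total["energy_wh"] += eco["energy_wh"]
        total["carbon_g"] += eco["carbon_g"]
    return total
```

Reporting `missing` alongside the totals keeps the aggregate honest: a
sum over three calls where one had no estimate is a lower bound.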

### Browser, mobile, and on-device inference

Inference that doesn't go through the gateway doesn't appear in the
dashboard. The numbers describe what we measure, not the totality of
your AI footprint.

## What the numbers are *estimates of*

- An estimate, not a measurement. We do not have a wattmeter on the
  upstream provider's GPU.
- A model-class estimate, not a per-request measurement. Two requests
  for the same model with the same token count get the same number.
- An average over hardware, not a number specific to the GPU
  generation that served your request. Newer hardware is generally
  more efficient; the formula does not yet reflect generation.

## What we explicitly will not say

- "X grams CO₂e saved by using LowRouter."
  - We don't know what your counterfactual is. The eco-impact widget
    on the dashboard offers a comparison against a *baseline you
    choose*. That comparison is what it claims to be — a comparison
    against your chosen baseline.
- "Carbon-neutral", "net-zero", or "sustainable" applied to any
  individual request.
  - The numbers we publish exist in service of *better-informed
    decisions*, not certifications. Certifications require an audit
    we are not the right party to perform.
- "Independently verified" beyond what is true.
  - The EcoLogits methodology is published; the IEA data is
    published. Our application of them is auditable from the source
    code. There is no third-party certification of the per-request
    numbers themselves.

## How to read the numbers

- For ranking: comparing requests within a single
  `methodology_version` and `accuracy` band is meaningful.
- For absolute claims: take the band into account. A `gross` number
  is right within a factor of two; quoting it to three significant
  figures is a category error.
- For reporting: the numbers are appropriate for an internal
  dashboard or a "best-effort estimate" line in a sustainability
  report. They are not appropriate as the basis for a public
  emissions disclosure without acknowledging the methodology and its
  uncertainty.

## Reproducibility

Every estimate can be reproduced from public inputs:

1. Get `resolved_model`, `region`, `total_tokens`, and
   `methodology_version` from the generation record.
2. Look up the model's active parameter count in the
   [EcoLogits registry](https://ecologits.ai) for that version.
3. Look up the grid intensity for that region in the IEA dataset
   referenced by the version.
4. Apply the formula on the [methodology page](methodology) with the
   coefficients of that version.
5. The result should match the stored `eco.carbon_g` to within
   floating-point rounding.

If your reproduction diverges, that's a bug — please
[file an issue](https://github.com/carbonifer/lowrouter/issues).
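
A reproduction check can be sketched in a few lines. It assumes the
v0.4 coefficients in watt-hours per token as defaults; the function
name and the relative tolerance are assumptions, and the tolerance may
need loosening if the stored value is rounded.

```python
import math


def reproduces(stored_carbon_g, p_active_billion, total_tokens,
               grid_g_per_kwh, alpha=8.91e-5, beta=1.43e-3, rel_tol=1e-6):
    """True if recomputing carbon_g from public inputs matches the record."""
    energy_wh = (alpha * p_active_billion + beta) * total_tokens
    carbon_g = energy_wh * grid_g_per_kwh / 1000  # Wh -> kWh -> grams
    return math.isclose(carbon_g, stored_carbon_g, rel_tol=rel_tol)
```

Pass the coefficients explicitly when checking a generation recorded
under a different `methodology_version`.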


---

# Reduce your footprint


# Reduce your footprint

The methodology gives you a number. This page is what to actually do
about it. Each section is a lever, with the order of magnitude of its
effect, and the trade-off it carries.

## Pick a smaller model when you can

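The largest single lever. Energy rises linearly with active
parameters on top of a constant per-token overhead (see the
[methodology](methodology) formula), so a 7B-active model is
estimated at roughly 3–4× lower energy per token than a 70B model,
not the 10× the parameter ratio alone would suggest.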

When to use a smaller model:

- Classification, extraction, and structured-output tasks.
- High-volume background work (summarisation, tagging).
- Anything where the output is verified by a downstream system.

When **not** to:

- Tasks that the smaller model fails at and your application has to
  retry on a larger one anyway. Two failed cheap calls plus one big
  call cost more than the one big call alone.

The pseudo-model `lowrouter/auto-cheap` biases toward the smallest
model that can plausibly handle the request. Try it on your traffic;
if quality holds, keep it.

## Cache prompts where the upstream supports it

Several providers offer prompt caching: a long system prompt sent
repeatedly with different user messages is charged at a discount on
the cached portion. Where supported, this cuts both cost and energy
on the cached part.

Practical:

- Place stable instructions, examples, and reference material **first**
  in the messages array.
- Place the variable part (the user's question) **last**.
- Keep the stable prefix above the upstream's caching threshold (for
  example, ≥1024 tokens).

The dashboard's per-transaction view shows `cached_tokens` when an
upstream applied a cache hit.
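
The ordering advice can be sketched as a small message builder. The
helper name and the choice to fold reference material into the system
message are illustrative assumptions; what matters is that everything
before the user's question is byte-identical from one request to the
next.

```python
def build_messages(stable_system_prompt, reference_docs, user_question):
    """Put the stable prefix first and the variable part last."""
    prefix = stable_system_prompt + "\n\n" + "\n\n".join(reference_docs)
    return [
        {"role": "system", "content": prefix},       # identical across requests
        {"role": "user", "content": user_question},  # changes every request
    ]
```

Anything that varies per request (timestamps, user names, request IDs)
belongs in the final message, otherwise it invalidates the cached
prefix.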

## Trim prompts

Energy scales linearly with `total_tokens`. A 50% prompt-length
reduction is a 50% energy reduction for the prompt portion.

- Drop preamble that doesn't change the model's behaviour.
- Drop few-shot examples that the model no longer needs.
- Compress reference material (use IDs instead of full descriptions
  when the model has been trained on them).

This compounds with prompt caching: a shorter cached prefix is
cheaper *and* faster to cache.

## Choose the cleaner region when residency permits

The grid intensity in `eu-north` (mostly hydro/nuclear) is roughly
**5–8× lower** than in coal-heavy regions. If your data residency
allows EU-North, you can pick it explicitly:

```json
{
  "model": "lowrouter/auto",
  "messages": [...],
  "route": {"region": "eu-north", "prefer_low_carbon": true}
}
```

Or, if you'd rather let the router pick whichever region is cleanest
*and* available right now, just set `prefer_low_carbon: true` and
leave `region` unset.
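
Both variants can be produced by one request-body builder. This is a minimal sketch: `chat_body` is a hypothetical helper, and only the `model`, `messages`, and `route` fields shown in the JSON above are used.

```python
import json
from typing import Any, Dict, List, Optional


def chat_body(messages: List[Dict[str, str]],
              region: Optional[str] = None,
              prefer_low_carbon: bool = True,
              model: str = "lowrouter/auto") -> Dict[str, Any]:
    """Build a chat-completions body with an optional region pin."""
    route: Dict[str, Any] = {"prefer_low_carbon": prefer_low_carbon}
    if region is not None:
        # Explicit pin: the request is restricted to this region.
        route["region"] = region
    # With no region set, the router is left to pick the cleanest
    # region that is available.
    return {"model": model, "messages": messages, "route": route}


pinned = chat_body([{"role": "user", "content": "Hello"}], region="eu-north")
unpinned = chat_body([{"role": "user", "content": "Hello"}])
print(json.dumps(pinned, indent=2))
```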

## Bound completion length

`max_tokens` lets you stop generation when "enough is enough". For
classification or extraction, set it to the actual answer length plus
a small margin. The carbon savings are linear with the saved tokens.

Some prompts respond well to "Answer in one sentence." instructions;
others ignore them. Both the instruction and a `max_tokens` cap are
worth trying: the dashboard will show whether the average completion
length actually came down.

## Rate-limit your retries

A retry storm can multiply your footprint to 3–10× that of the
underlying call. Use exponential backoff with jitter, cap the retry
count, and **never** retry on a 4xx other than a 408 timeout.
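
A minimal sketch of that retry policy follows. The names (`call_with_retries`, `send`) and the default delays are assumptions made for illustration; `send` stands for whatever function performs one request and returns a `(status, body)` pair. The jitter variant used here is "full jitter" (a uniformly random delay between zero and the capped exponential bound).

```python
import random
import time
from typing import Callable, Tuple


def should_retry(status: int) -> bool:
    """Retry server errors and 408 timeouts; never any other 4xx."""
    return status >= 500 or status == 408


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * 2 ** attempt)


def call_with_retries(send: Callable[[], Tuple[int, str]],
                      max_retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random
                      ) -> Tuple[int, str]:
    """Call send() up to max_retries + 1 times, backing off in between."""
    for attempt in range(max_retries + 1):
        status, body = send()
        out_of_retries = attempt == max_retries
        if status < 400 or not should_retry(status) or out_of_retries:
            return status, body
        sleep(backoff_delay(attempt, rng=rng))
    raise AssertionError("unreachable: the last attempt always returns")
```

Passing `sleep` and `rng` as parameters keeps the policy testable without real waiting; production code can rely on the defaults.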

## Memoise

If the same user asks the same question twice, an in-application
cache returns the previous answer at zero gateway cost. This is the
cheapest watt: the one not spent.

A few patterns that work:

- Hash the prompt (after normalisation) and cache the response by
  that hash.
- Cache lookup tables generated by the model (taxonomies, slot
  schemas) and refresh them on a schedule, not per-request.
- For chat, cache the last few responses in memory keyed by the full
  conversation; reuse when the user re-asks immediately.
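
The first pattern can be sketched in a few lines. Everything here is illustrative: `ResponseCache` is a hypothetical in-memory cache, and the normalisation (lower-casing and collapsing whitespace) is an assumption that only suits applications where case and spacing do not change the meaning of a prompt.

```python
import hashlib
from typing import Callable, Dict


def normalise(prompt: str) -> str:
    """Collapse whitespace and case so trivial variations share a key."""
    return " ".join(prompt.lower().split())


def prompt_key(model: str, prompt: str) -> str:
    """Hash the model together with the normalised prompt."""
    payload = f"{model}\n{normalise(prompt)}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ResponseCache:
    """In-memory memoisation of model responses, keyed by prompt hash."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get_or_call(self, model: str, prompt: str,
                    call: Callable[[str], str]) -> str:
        key = prompt_key(model, prompt)
        if key not in self._store:
            # Cache miss: this is the only path that reaches the gateway.
            self._store[key] = call(prompt)
        return self._store[key]
```

A real deployment would typically add an expiry time and a size bound; both are left out to keep the sketch short.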

## Aggregate where you can

Many small completions cost more than one larger one with multiple
items. Examples:

- Classify a batch of 20 items in a single request rather than 20
  requests.
- Extract structured fields for a list of inputs in one structured-
  output call.

Watch out for context-window limits and for the cost of re-prompting
when one item in the batch fails — sometimes individual calls are
cheaper net.
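
As an illustration of the batching pattern, the sketch below builds one classification prompt for a chunk of items and validates the reply. The prompt wording, the JSON-array reply format, and the helper names are all assumptions, not a LowRouter feature.

```python
import json
from typing import List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into batches that stay within the context window."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_classification_prompt(items: List[str], labels: List[str]) -> str:
    """One prompt that asks for a label per item, in order."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(items, 1))
    return (
        f"Classify each item as one of: {', '.join(labels)}.\n"
        "Reply with a JSON array of labels, one per item, in order.\n\n"
        + numbered
    )


def parse_labels(reply: str, expected_count: int,
                 labels: List[str]) -> List[str]:
    """Validate the reply; a ValueError means the batch must be redone."""
    parsed = json.loads(reply)
    if not isinstance(parsed, list) or len(parsed) != expected_count:
        raise ValueError("reply does not contain one label per item")
    unknown = [label for label in parsed if label not in labels]
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    return parsed
```

The validation step is where the re-prompting cost mentioned above shows up: a single malformed reply invalidates the whole batch, so smaller batches can be cheaper when failures are frequent.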

## Order of magnitude summary

| Lever | Typical reduction |
|-------|-------------------|
| Smaller model | 5–10× per request |
| Region pinning to clean grid | 3–8× on the carbon term |
| Prompt caching | 30–80% on the cached portion |
| Prompt trimming | linear with the % trimmed |
| Memoisation of repeats | 100% on the cached call |
| `max_tokens` bounding | linear with completion-tokens saved |
| Aggregation / batching | 2–5× on overhead |

These levers are independent, so their effects stack when you apply
several. The first two are where most teams start.
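
To see how the first two levers combine: carbon is energy multiplied by grid intensity, so a reduction factor on each term multiplies through. The figures below are taken from the low end of the table and are only an illustration; `combined_reduction` is a hypothetical helper.

```python
def combined_reduction(*factors: float) -> float:
    """Multiply independent reduction factors (5.0 means "5x lower")."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


# Low end of the table: a smaller model (5x less energy per request)
# served from a cleaner grid (3x less carbon per unit of energy).
carbon_factor = combined_reduction(5.0, 3.0)
print(f"combined carbon reduction: {carbon_factor:.0f}x")
```

Levers that act on only part of a request (prompt caching, trimming) reduce that part alone, so they should not be multiplied in this way without first working out their share of the total.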

## Confirm with the dashboard

After any of these changes, check the eco-impact widget on the
dashboard for the same time window before/after. If the change you
made should have reduced the per-1K-tokens carbon number and didn't,
something is off — the dashboard's transaction-detail page tells you
which model and provider actually served each request.


---

# FAQ


# Frequently Asked Questions

Answers to the questions developers and operators ask most often. Each
section is self-contained — link directly to the slug in your own
docs if it's useful.

## Is LowRouter compatible with the OpenAI SDK?

Yes. Set the `base_url` (Python) or `baseURL` (TypeScript) to
`https://lowrouter.ai/api/v1` and use your LowRouter API key. The
SDK calls work unchanged. Detailed examples are on the
[OpenAI SDK Python](integrations/openai-sdk-python) and
[TypeScript](integrations/openai-sdk-typescript) pages.

## Is LowRouter compatible with the Anthropic SDK?

Yes — point the SDK at `https://lowrouter.ai/api/v1/anthropic` and
use a LowRouter API key. See [Anthropic SDK](integrations/anthropic-sdk).

## How is the CO₂ estimate calculated?

`energy = ((α × P_active) + β) × tokens`, then `carbon = energy × grid
intensity`. The full formula, coefficients, data sources, and
confidence bands are on the
[methodology page](sustainable-ai/methodology). The formula comes
from EcoLogits v0.4; the grid intensities come from the IEA.
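
The two steps can be worked through in a few lines. The coefficients below are placeholders chosen for readability, not the EcoLogits values, and the units (Wh for energy, gCO₂e/kWh for grid intensity, billions of active parameters) are assumptions; the methodology page is the reference for the real ones.

```python
# Hypothetical placeholder coefficients, NOT the published values.
ALPHA = 1e-3  # assumed: Wh per token per billion active parameters
BETA = 1e-3   # assumed: fixed Wh per token


def energy_wh(active_params_billion: float, tokens: int,
              alpha: float = ALPHA, beta: float = BETA) -> float:
    """energy = ((alpha * P_active) + beta) * tokens"""
    return (alpha * active_params_billion + beta) * tokens


def carbon_grams(energy_in_wh: float, grid_g_per_kwh: float) -> float:
    """carbon = energy * grid intensity (Wh converted to kWh first)."""
    return energy_in_wh / 1000.0 * grid_g_per_kwh


small = energy_wh(active_params_billion=7, tokens=1000)
large = energy_wh(active_params_billion=70, tokens=1000)
print(f"70B vs 7B energy ratio: {large / small:.2f}")
```

With any positive fixed term, the ratio between a 70B-active and a 7B-active model comes out below 10, because the fixed per-token term does not shrink with the model.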

## Are the eco numbers real-time?

No. The grid intensity is an annual regional average. Real-time
carbon-aware routing is a roadmap feature, not a current one. See
[limits](sustainable-ai/limits).

## What happens if a provider is down?

The auto-router marks that provider's slot temporarily unavailable
and dispatches to the next eligible upstream **in the same region**.
It does not silently fail over across regions: if you've pinned a
region and the only eligible upstream there is the one that's down,
the request returns 503. See [routing](models/routing).

## How is pricing structured?

Pre-paid credits. The cost of a request is `upstream price ×
tokens + platform fee × tokens`. Both components are visible per
model on the [model browser](/models) and per request on the
dashboard. Failed requests that produced no upstream charge cost
zero credits. Full details: [credits and billing](guides/credits-and-billing).
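
The cost formula can be written out as a short function. This is a sketch under assumptions: prices are taken to be per token and expressed in credits, and `request_cost` is a hypothetical helper. `Decimal` is used so that small per-token prices do not pick up floating-point rounding noise.

```python
from decimal import Decimal


def request_cost(upstream_price_per_token: Decimal,
                 platform_fee_per_token: Decimal,
                 tokens: int,
                 upstream_charged: bool = True) -> Decimal:
    """cost = upstream price * tokens + platform fee * tokens"""
    if not upstream_charged:
        # Failed requests with no upstream charge cost zero credits.
        return Decimal("0")
    return (upstream_price_per_token * tokens
            + platform_fee_per_token * tokens)


cost = request_cost(Decimal("0.000002"), Decimal("0.0000002"), tokens=1500)
print(cost)
```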

## Do you store my prompts?

No. We log token counts, model, provider, region, latency, and the
eco estimate per request. We do not store prompt or response content.
The full per-request schema is in
[usage accounting](guides/usage-accounting).

## How do I keep requests in the EU?

Two options:

- **Per-request**: send `"route": {"region": "eu-west"}` in the body.
- **Per-key**: pin the key to `eu-west` in **Dashboard → Keys**.

The per-key option survives client misconfiguration, so prefer it for
production. Details on the [routing](models/routing) and
[API key management](guides/api-keys) pages.

## What's the rate limit?

Three layers: per-key (the daily/monthly credit limits you set),
per-account (default 600 RPM, 64 concurrent), and per-IP (auth and
anonymous). Higher quotas are available on request. Full picture:
[rate limits](api-reference/rate-limits).

## Can I use LowRouter from the browser?

Not with the API key directly — that exposes the key to every page
visitor. Mint short-lived tokens server-side and proxy requests
through your backend. The pattern is the same as for OpenAI; both
SDKs warn against `dangerouslyAllowBrowser`. See
[OpenAI SDK TypeScript](integrations/openai-sdk-typescript).

## Are there free credits?

No. Trying things out costs the same as production usage. See
[philosophy / principles in practice](philosophy/principles-in-practice)
for why.

## Can I get an invoice with my company details?

Yes. Set the legal name, billing address, and VAT number under
**Dashboard → Settings → Billing**. Invoices issued from that point
onwards carry the company details. Past invoices can be re-issued
via support. See [credits and billing](guides/credits-and-billing).

## How do I look up a request after the fact?

Every response carries a `lowrouter.generation_id`. Pass it to
`GET /api/v1/generation/{id}` to get the full record, or open it on
the dashboard. The full record includes the resolved model, the
provider, the region, the eco numbers, and the routing trace. See
[per-request metadata](models/per-request-metadata) and
[discovery endpoints](api-reference/models-and-providers).
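
A minimal sketch of the lookup using only the Python standard library. Two things are assumptions here: that `lowrouter.generation_id` is a nested field in the JSON response body, and that the endpoint accepts the API key as a bearer token, as OpenAI-compatible endpoints usually do. The request is built but not sent.

```python
import urllib.request
from typing import Any, Dict
from urllib.parse import quote

BASE_URL = "https://lowrouter.ai/api/v1"


def extract_generation_id(response: Dict[str, Any]) -> str:
    """Assumed layout: {"lowrouter": {"generation_id": "..."}}."""
    return response["lowrouter"]["generation_id"]


def generation_request(generation_id: str,
                       api_key: str) -> urllib.request.Request:
    """Build (without sending) GET /api/v1/generation/{id}."""
    url = f"{BASE_URL}/generation/{quote(generation_id, safe='')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed scheme
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the response schema is described on the pages linked above.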

## How do I rotate a leaked key?

Create a new key, deploy it everywhere, then delete the old one. The
deletion takes effect on the next request — no caching delay. Full
guidance: [API key management](guides/api-keys).

## Does LowRouter offer a free trial?

We do not run a free trial. Top up the smallest amount that makes
sense for an evaluation; remaining credit can be refunded within 14
days under EU consumer law (see
[credits and billing](guides/credits-and-billing)).

## Why don't I see an eco estimate on some requests?

When the resolved model's parameter count is unknown or unverified,
we omit the `eco` field rather than fabricate a number. The
[methodology](sustainable-ai/methodology) page explains the
confidence bands; the
[limits](sustainable-ai/limits) page covers the cases where eco is
deliberately absent (embedding requests, agent steps with no
tokens, mid-stream upstream errors).

## Is there an SLA?

Production accounts have an SLA, posted in the dashboard footer.
Default accounts do not; they are served on a best-effort basis.

## Can I get a Data Processing Agreement?

Yes. Contact us via the email on the [legal page](/impressum).

## Is the source code open?

The platform's repository is at
[github.com/carbonifer/lowrouter](https://github.com/carbonifer/lowrouter).
Public components are licensed under the terms in the repository.

## I found a bug — where do I report it?

[github.com/carbonifer/lowrouter/issues](https://github.com/carbonifer/lowrouter/issues),
ideally with a request ID from the `X-Request-ID` header on a
representative request. See [errors](api-reference/errors).


---


# LowRouter API Reference

**LowRouter API** (v1.0.0)

OpenRouter-compatible API gateway for sustainable AI inference.
Routes LLM requests to the most carbon-efficient provider while maintaining
full compatibility with OpenAI and OpenRouter client libraries.

## Servers

- `/api/v1` — API v1 endpoint

## Endpoints

### POST /chat/completions

Create chat completion

Creates a chat completion with automatic routing to the most carbon-efficient provider.
Fully compatible with OpenAI's chat completions API.

**Operation ID**: `createChatCompletion`  ·  **Tags**: Completions

### POST /completions

Create text completion

Creates a text completion (legacy endpoint for compatibility).
Routes to providers supporting text completion format.

**Operation ID**: `createCompletion`  ·  **Tags**: Completions

### POST /embeddings

Create embeddings

Creates an embedding vector representing the input text.
Routes to providers supporting embeddings via Bifrost.
Applies billing (input tokens only) and carbon tracking.

**Operation ID**: `createEmbedding`  ·  **Tags**: Embeddings

### GET /generation/{generation_id}

Get generation statistics

Retrieves detailed statistics for a specific generation including
tokens, cost, carbon metrics, and latency.
(Nice to have; may not be implemented in the MVP.)

**Operation ID**: `getGeneration`  ·  **Tags**: Generations

### GET /metrics/{generation_id}

Get generation metrics

Retrieves carbon and energy metrics for a specific generation.
This endpoint provides historical access to energy consumption and
carbon emissions data for completed requests.

**Operation ID**: `getGenerationMetrics`  ·  **Tags**: Metrics

### GET /models

List available models

Returns a list of all available models with their capabilities,
pricing, and carbon intensity metrics.

**Operation ID**: `listModels`  ·  **Tags**: Models

### GET /models/{model}

Retrieve a model

Returns details for a single model matching the OpenAI retrieve model format.
The model parameter may contain slashes (e.g. nebius/NousResearch/Hermes-4-70B).

**Operation ID**: `getModel`  ·  **Tags**: Models

### GET /providers

List available providers

Returns a list of all configured providers with their status and regions.
(Nice to have; may not be implemented in the MVP.)

**Operation ID**: `listProviders`  ·  **Tags**: Providers