# LowRouter — Full Documentation

This file concatenates every page of LowRouter's public documentation. Source: /docs.

---

# Philosophy

LowRouter exists because running an application on top of large language models forces choices that are usually invisible: which provider serves the request, where their hardware sits, what the request actually costs in energy and carbon, and what happens when one provider goes down.

These four pages set out the position we take on those choices. They are not marketing copy. They describe what we measure, what we choose not to measure, and why those decisions sometimes lead us to slower, narrower, or more expensive defaults than the rest of the market.

- [Why LowRouter exists](why-lowrouter)
- [Sustainability-first](sustainability-first)
- [Sovereignty and transparency](sovereignty-and-transparency)
- [Principles in practice](principles-in-practice)

---

# Why LowRouter exists

LLM inference is now a default building block. Most teams that ship features on top of it end up writing the same gateway twice: once to abstract the provider, again to track usage and bills. That gateway is load-bearing — it sees every prompt and every response — but it is rarely treated as a product. It is glue.

LowRouter is that gateway as a product, with two opinions baked in.

## Opinion one: the footprint of a request is part of its cost

Most billing dashboards show tokens and dollars. LowRouter also reports the energy a request consumed and the grams of CO₂e the inference is estimated to have produced. Both numbers are estimates — see [methodology](../sustainable-ai/methodology) for the formula and its limits — but having them visible changes how the request is thought about. A request with a known carbon number is a request a developer can actually choose differently.

We do not claim that every request is "green" or that the estimate is exact.
We claim it exists, that the formula is documented, and that the inputs are auditable.

## Opinion two: routing should be explicit and sovereign

When you pick a model in most gateways, you pick a *brand*. The brand hides who actually serves the tokens — which provider, which region, which hardware tier. That hiding is convenient until something matters: a region requires data residency, a provider has an outage, a contract requires a specific operator.

LowRouter exposes the route. Every response says which provider served the request and from which region, and the dashboard lets operators choose policies (prefer-region, prefer-low-carbon, prefer-cheapest, fixed-provider) that map to those constraints. The default is `lowrouter/auto`; the override is always one field away.

## What LowRouter is not

It is not an inference engine. The actual work happens at OpenAI, Anthropic, Mistral, and other providers. We forward, we measure, we account.

It is not a benchmarking tool. The dashboard does not rank models on quality. We expose what we can measure faithfully — usage, latency, energy, carbon — and leave subjective judgements to you.

It is not a free service. The credits model is documented in [credits and billing](../guides/credits-and-billing). When the costs of running this kind of infrastructure are made invisible, the sustainability story becomes hollow; we'd rather charge what running it actually costs.

## Who it's for

- **Developers** who want one endpoint and one bill across multiple providers, plus enough metadata to debug and improve their app.
- **Operators** who need data residency, audit trails, and a clear picture of which provider served what.
- **Sustainability and compliance teams** who want a defensible number for the AI footprint of their organisation, not a marketing pledge.

If that is not you, that is fine. The dashboard and these docs are public for a reason — read what we measure and how, and decide whether the trade-offs fit.

---

# Sustainability-first

"Sustainable AI" is doing a lot of marketing work right now. This page describes what the phrase means inside LowRouter — concretely, what we measure, what we report, and what we have decided not to claim.

## What we measure

For every inference request, we estimate two numbers:

- **Energy per output token** (Wh), derived from the model's active parameter count using the [EcoLogits methodology](https://ecologits.ai/0.4/methodology/llm_inference/).
- **Grid carbon intensity** (gCO₂e/kWh) for the region the provider serves the request from, sourced from the International Energy Agency.

The product gives us **gCO₂e per 1,000 tokens** for the request. The exact formula and the confidence we attach to each estimate are in the [methodology page](../sustainable-ai/methodology).

These numbers are exposed:

- On every API response, in an `eco` block.
- On the dashboard, aggregated by day, model, provider, and region.
- On the public model browser, as a comparable estimate per model.

## What we don't measure (yet)

- **Training emissions.** We report inference only. Training is a separate, larger, and harder-to-attribute footprint, and folding it into per-request numbers is misleading.
- **Hardware embodied carbon.** GPU manufacturing has a real footprint; we don't yet have a defensible per-token number for it.
- **Real-time grid mix.** We use annual averages by region. Live carbon-aware routing is a future feature, not a current one.
- **Embedding workloads.** Encoder-only models have a different compute profile and our formula does not yet model them well.

We list these limits because they are part of what an honest number looks like. The sustainable-AI page repeats them next to every chart.

## Constraints that fall out of this position

A few platform decisions follow from taking energy and carbon seriously:

- **The site itself is on a tight transfer budget.** Pages like this one ship under 100 KB total.
The docs site renders server-side with no client framework. Heavy interactive widgets are not added without a reason.
- **The default route is auto-mode**, not "the largest model." If a smaller model can do the job for a fraction of the energy, we'd rather it be the default.
- **Providers and regions with very high grid intensity are deprioritised** in routing unless explicitly pinned by the caller. See [provider routing](../models/routing).
- **We are slower to add features than we'd like.** Every new background job, every dashboard widget, every fancy visualisation is a per-user energy cost; we add them when they earn the cost.

## What this is not

A pledge that any single request is green. A claim that the estimate is exact. An assertion that running an LLM through LowRouter is meaningfully better for the planet than running it directly.

What it is: a measured number, an open formula, and a default that prefers the smaller model and the cleaner grid when other constraints allow.

---

# Sovereignty and transparency

Two of the practical reasons teams move from a single LLM provider to a gateway are sovereignty (where the request goes) and transparency (what's actually happening). Both deserve concrete answers, not adjectives.

## Sovereignty

LowRouter's control plane is operated from the European Union by Carbonifer SAS. The data plane — the providers that actually run the inference — covers multiple regions: EU (Mistral, Anthropic via EU endpoints, Bedrock EU), US (OpenAI, Anthropic via US endpoints, Bedrock US), and a growing set of regional Vertex AI deployments.

When you send a request without a region preference, the router picks based on availability and the carbon-intensity heuristic described in [provider routing](../models/routing). The chosen region is reported in the response.

When you need data residency:

- Pin a region in the request body (`route.region`) or via a virtual key policy (recommended for production).
- The route fails closed: if no provider in the requested region is available, the request returns an error rather than silently falling back to another region.

We do not run inference on user prompts inside our own infrastructure beyond what is required to forward and account for them. Logs are documented in [usage accounting](../guides/usage-accounting).

## Transparency

Every API response includes a `provider` field with the upstream that served it, a `region` field with the region it served from, and a `generation_id` that can be looked up later through [`GET /generation/{id}`](../api-reference/overview). The dashboard shows the same data per request, per day, and aggregated per model.

The carbon and energy numbers shown on the dashboard are produced by the formula in [methodology](../sustainable-ai/methodology). The formula's coefficients, the model parameter counts we use, and the grid-intensity values we apply are documented and dated. When a value changes — for instance, when a new IEA dataset replaces an older one — the change is reflected in the dashboard with the date it took effect.

The platform is operated by a small team. There is no army of unaccounted-for logging or analytics services. The third-party services involved are listed in the [privacy policy](/privacy).

## How to verify

- Read [methodology](../sustainable-ai/methodology) and audit the formula.
- Send a request with `lowrouter/auto`, then re-send it pinned to a specific provider/region. Compare `provider`, `region`, and `eco` fields in the responses.
- Open the corresponding entries on the dashboard and confirm the numbers match.
- Use [`GET /generation/{id}`](../api-reference/overview) to fetch the full record after the fact.

The numbers are stable. If something doesn't reconcile, that's a bug, not a feature — please [file an issue](https://github.com/carbonifer/lowrouter/issues).

---

# Principles in practice

Principles are easy to declare and harder to live with.
This page is the short version of how the previous three translate into product behaviour you can observe.

## Routing

| Decision | What we do |
|----------|------------|
| Default route | `lowrouter/auto` — picks based on availability, latency, and the carbon heuristic. |
| Pinning | Any caller can pin model, provider, and region per request, or per virtual key. |
| Failover | A provider outage routes to the next eligible option in the same region. We never fail across regions silently. |
| Carbon weight | Routing is biased toward lower-carbon regions when other constraints allow. The bias is configurable and the weight is documented in [routing](../models/routing). |

## Pricing

| Decision | What we do |
|----------|------------|
| Pricing model | Pre-paid credits. The price per 1K tokens for each model is shown on the dashboard before you call it. |
| Mark-up | A flat platform fee on top of the upstream provider's price. Documented per model. |
| Free tier | None. Trying things out costs the same as production usage. |
| Refunds | A failed request that produced no upstream charge does not consume credits. |

The full pricing rules are in [credits and billing](../guides/credits-and-billing).

## Data handling

| Decision | What we do |
|----------|------------|
| Prompt logging | We log token counts, model, provider, region, and timing. We do not log prompt or response content. |
| Retention | Token-level usage is retained for 13 months for billing and auditing. Aggregates are kept longer. |
| Export | Usage history can be exported as CSV from the dashboard. |
| Subprocessors | Listed in the [privacy policy](/privacy). |

If you need a Data Processing Agreement, contact us through the channel listed on the [legal page](/impressum).

## Operations

| Decision | What we do |
|----------|------------|
| Status page | Linked from the dashboard footer. |
| Incident communication | Public post-mortems for incidents that affected billing or routing decisions. |
| API stability | Breaking changes are versioned (`/api/v1`). Additive changes are documented in the changelog. |

## What "no" looks like

- We do not auto-upsell to larger models. The defaults aim at the smallest model that produces an acceptable response.
- We do not show comparative quality charts between providers. We are not the right venue to make those judgements; tools that benchmark outputs against reference suites are.
- We do not stockpile features for the sake of feature parity. New endpoints and dashboards are added when there is a defensible reason.

---

# Getting started

Four short pages that take you from a clean machine to a working request and a populated dashboard:

1. [Create an account](account) — sign-up, email verification, and the identity model.
2. [Create your first API key](api-keys) — virtual keys, scoping, and safe storage.
3. [Run your first completion](first-completion) — a single `curl` call, what comes back, and the eco metadata.
4. [Tour the dashboard](dashboard-tour) — credit balance, usage, top models, and the eco impact widget.

If something doesn't work, the [FAQ](../faq) covers the questions we hear most. For everything else, the support email is on the [legal page](/impressum).

---

# Create an account

LowRouter accounts are individual: one email, one identity, one credit balance. Team accounts and shared workspaces are on the roadmap; until they ship, share access via separate API keys rather than shared credentials.

## Sign up

Go to [/register](/register) and fill in:

- **Email** — used for verification, sign-in, and billing receipts.
- **Password** — a strong, unique one. We do not enforce a maximum length; we do enforce a minimum that follows current OWASP guidance.
- **Country** — used to compute the right tax treatment on invoices. You can correct it later from the settings page.

Alternatively, sign in with a federated identity provider listed on the register page.
Federated sign-in does not change anything about how your data is stored — see [the privacy policy](/privacy) for the full picture.

## Verify your email

After registration we send a verification link. The link is valid for 24 hours. Until you verify, you can sign in but you cannot create API keys or top up credits — guarding against typos and disposable-mailbox sign-ups.

If the email doesn't arrive, check spam, then use **Resend verification** on the sign-in page. If it still doesn't arrive, contact support — see the email on the [legal page](/impressum).

## Top up credits

LowRouter is pre-paid. To send a request you need a non-zero credit balance.

1. Go to **Dashboard → Credits**.
2. Click **Add credits**.
3. Pick an amount and complete the Stripe-hosted checkout.

Credits land in your account when Stripe confirms the payment, usually within a few seconds. The amount you top up is exclusive of VAT; the invoice that lands in your inbox afterwards has the VAT breakdown.

The full pricing model is documented in [credits and billing](../guides/credits-and-billing).

## What's stored

After sign-up the platform stores:

- Your email address (sign-in, billing receipts).
- A salted hash of your password (never the plaintext).
- The country and any billing details you provided.
- A unique numeric user ID used internally.

We do not store any prompt or response content. Token counts, model IDs, provider IDs, regions, latencies, and the eco numbers are stored per request — see [usage accounting](../guides/usage-accounting) for the full schema.

## Next

[Create your first API key →](api-keys)

---

# Create your first API key

API keys (also called *virtual keys*) authenticate every request to the gateway. They are bearer tokens — anyone holding the string can spend your credits — so the rest of this page is about creating, scoping, and rotating them safely.

## Create one

1. **Dashboard → Keys**.
2. Click **New key**.
3. Give it a name that describes where it will be used (`prod-server`, `local-dev`, `chatbox-personal`). The name appears in the usage history and helps you find the right key to rotate later.
4. (Optional) Set scoping:
   - **Models** — restrict to a list of model IDs (`openai/gpt-4o`, `anthropic/claude-sonnet-4-5`).
   - **Region** — pin requests through this key to a region.
   - **Daily limit** — cap spend per day to a credit amount.
5. Click **Create**. The full token shows once — copy it now. Tokens look like `lr-sk-...` and are 40+ characters. The dashboard only ever shows the prefix and last four characters again.

## Store it

- **Production** — in your secret manager (Vault, AWS Secrets Manager, GCP Secret Manager, sealed Kubernetes secret, …). Never in source control.
- **Local development** — in a `.env` file that is in `.gitignore`.
- **Personal tools** — in the OS keychain, or in the tool's own encrypted store. Avoid pasting the token into chat applications or notes apps that sync to the cloud.

A leaked key can be revoked from the dashboard at any time — see *Rotate or revoke* below — but it can spend credits in the seconds between the leak and the revocation. Treat keys like passwords.

## Use it

The header is the standard `Authorization: Bearer`:

```bash
curl https://lowrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $LOWROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lowrouter/auto",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

The `Authorization` header value is exactly `Bearer ` followed by the token, with no quotes around the token and no extra whitespace. SDKs accept the token as the constructor's `apiKey`/`api_key` argument; see [integrations](../integrations/openai-sdk).

## Rotate or revoke

- **Rotate** — create a second key, deploy it everywhere, then delete the old one. There is no built-in zero-downtime rotation; the pattern above achieves the same result without it.
- **Revoke** — **Dashboard → Keys → Delete**.
The token stops working on the next request, no caching delay.

Rotate at least every 90 days, and immediately after any of:

- A key was committed to a repository (even briefly).
- A key was sent over an insecure channel.
- A team member with access to the key left the organisation.
- Unexpected usage shows up on the dashboard.

## Next

[Run your first completion →](first-completion)

---

# Run your first completion

This page sends one chat completion through the gateway, walks through what came back, and points at the things you'll come back to.

## Send the request

Set your key in an environment variable so it doesn't end up in shell history:

```bash
export LOWROUTER_API_KEY="lr-sk-..."
```

Then call the gateway:

```bash
curl https://lowrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $LOWROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lowrouter/auto",
    "messages": [
      {"role": "user", "content": "In one sentence, what is a vector database?"}
    ]
  }'
```

`lowrouter/auto` is the default route. You can pick a specific model instead — for example `openai/gpt-4o-mini` or `anthropic/claude-haiku-4-5` — and you can override the route per request with the `route` field. See [routing](../models/routing).

## What comes back

The response is OpenAI-shaped.
The fields you'll use most are:

```json
{
  "id": "chatcmpl-01J9...",
  "object": "chat.completion",
  "created": 1714150000,
  "model": "openai/gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A vector database stores …"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 26,
    "total_tokens": 44
  },
  "lowrouter": {
    "generation_id": "gen_01J9...",
    "provider": "openai",
    "region": "eu-west",
    "eco": {
      "energy_wh": 0.0021,
      "carbon_g": 0.00057,
      "carbon_per_1k_tokens_g": 0.013,
      "accuracy": "accurate"
    }
  }
}
```

The OpenAI-compatible parts (`id`, `choices`, `usage`, …) are documented in [chat completions](../api-reference/chat-completions). The LowRouter-specific block:

- **`generation_id`** — opaque ID for this request. Pass it to [`GET /generation/{id}`](../api-reference/overview) to fetch the full record later, or open it on the dashboard with that ID.
- **`provider`** — which upstream actually served the request.
- **`region`** — the region the upstream served from.
- **`eco`** — energy and carbon estimate for this request, with a confidence label. Read [methodology](../sustainable-ai/methodology) before quoting these numbers anywhere.

## Stream the response

For interactive UIs, set `"stream": true` and read Server-Sent Events:

```bash
curl https://lowrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $LOWROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "lowrouter/auto",
    "stream": true,
    "messages": [{"role": "user", "content": "Count to 5 slowly"}]
  }'
```

The stream format and how to consume it from common SDKs is on the [streaming page](../api-reference/streaming).

## Things that might surprise you

- **The `model` in the response is the resolved model**, not the pseudo-model you sent. If you sent `lowrouter/auto`, the response tells you what was actually picked.
- **`usage` reflects upstream tokens**, which may include caching discounts (for providers that support them).
The credits charged on your dashboard match this `usage`.
- **The `eco` block is missing on requests we couldn't classify**, e.g. a model whose parameter count is unknown. The dashboard shows the same record without an eco number rather than a fabricated one.

## Next

[Tour the dashboard →](dashboard-tour)

---

# Tour the dashboard

The dashboard is at [/dashboard](/dashboard). It's the operator view for your account: balance, usage, eco impact, and shortcuts to the things you'll change most often.

## The landing page

When you first land, four widgets are visible above the fold:

- **Credit balance** — the current credit balance in your account currency, with a button to top up. The big number is what's left; the small number is what's been spent in the current calendar month.
- **Usage today** — total tokens and total cost so far today, broken out by model. Click any model to filter the rest of the page.
- **Top models (last 30 days)** — a horizontal bar chart ranking the models you used most by token count.
- **Eco impact** — the energy and carbon estimate for the same window, with a comparison to a baseline you choose (see *Eco panel* below).

Below the widgets, the **Recent transactions** table shows the last ~50 generations: timestamp, model, provider, region, tokens, cost, and the carbon estimate. Click any row to see the full record on its own page.

## Drill into a transaction

The transaction detail page shows the full request record:

- The generation ID (you can copy it).
- Resolved model, provider, region, latency.
- Token counts (prompt, completion, total).
- Eco numbers with the methodology version that produced them.
- Routing trace: which providers were considered, which one was picked, and why (the cheapest, the lowest-carbon, the closest, …).

Prompt and response *content* are not shown — we don't store it.

## Eco panel

The eco impact widget compares the last 30 days of your usage against a **baseline**.
The baseline is a hypothetical: "what would this same usage have looked like if I'd used model X instead?" It's a back-of-envelope check, not a guarantee.

The number it shows is honest about uncertainty:

- The value is computed from the same energy formula and grid intensities as every other carbon number on the platform.
- The "you saved" framing only appears when the baseline you picked has a higher per-token energy estimate than your actual usage. If your usage was higher-energy than your baseline, the panel says so.
- When confidence is low (small samples, models whose parameters are estimated rather than verified), the number is shown with a reduced-confidence indicator and the underlying caveats are linked inline.

Pick or change the baseline from **Settings → Eco baseline**.

## Where everything else is

- **Keys** — create, scope, rotate, revoke. See [API key management](../guides/api-keys).
- **Credits** — top up, view receipts. See [credits and billing](../guides/credits-and-billing).
- **Invoices** — billing history; same data as the email receipts but downloadable as PDF.
- **Auto-routing** — set defaults for `lowrouter/auto`: prefer-region, prefer-low-carbon, fixed-provider, and per-key overrides. See [routing](../models/routing).
- **Settings** — profile, eco baseline, notification preferences, account deletion.

## Mobile

The dashboard is usable on a phone: the header collapses to a hamburger menu and charts switch to a single-column layout. It's not yet built for heavy operations on mobile (large CSV exports, multi-key bulk edits); those flows are still desktop-first.

## Next

You're set up. The [User guides](../guides/api-keys) section goes deeper on the day-to-day operations. The [API reference](../api-reference/overview) covers everything you can do over HTTP.

---

# Guides

The pages in this section cover the operations you'll come back to after onboarding:

- [API key management](api-keys) — create, scope, rotate, revoke, and the policies that virtual keys can carry.
- [Credits and billing](credits-and-billing) — how credits work, how pricing is structured, and how invoices are produced.
- [Usage accounting](usage-accounting) — the transaction history, exports, and what's recorded per request.
- [Dashboard deep-dive](dashboard-deep-dive) — every chart on the dashboard, how to read it, and how to filter it.

These build on [Getting started](../getting-started/account); pages there cover the first-time setup, pages here cover the ongoing operation.

---

# API key management

[Getting started → API keys](../getting-started/api-keys) walks through creating a key. This page is the operator reference: every option a virtual key can carry, when to use it, and how to retire keys safely.

## Anatomy of a virtual key

A virtual key is a token with metadata attached. The token authenticates the request; the metadata controls what that request is allowed to do.

The metadata you can attach:

| Field | Purpose |
|-------|---------|
| `name` | Human label, shown in usage history. |
| `models` | Allowlist of model IDs. Empty = all models. |
| `region` | Pin requests through this key to a region (e.g. `eu-west`). |
| `daily_credit_limit` | Max credits this key can spend per UTC day. |
| `monthly_credit_limit` | Max credits per UTC calendar month. |
| `expires_at` | Optional auto-expiry timestamp. |
| `prefer_low_carbon` | When set, biases auto-routing on this key toward lower-grid-intensity providers. |
| `enabled` | Toggle without deleting. |

All of these can be edited from **Dashboard → Keys → key name → Edit**. Edits take effect on the next request, no caching delay.
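
To make the metadata table concrete, here is a minimal Python sketch of a client-side mirror of those fields, with a pre-flight check that covers only the three rules the table states outright (`enabled`, `expires_at`, and the `models` allowlist where empty means all models). The class name `VirtualKeyPolicy` and the function `check_request` are hypothetical; the gateway's actual evaluation order and internals are not documented here, so treat this as an illustration rather than its implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class VirtualKeyPolicy:
    """Hypothetical mirror of the metadata fields listed in the table above."""
    name: str
    models: list[str] = field(default_factory=list)  # empty list = all models
    region: Optional[str] = None                     # e.g. "eu-west"
    daily_credit_limit: Optional[float] = None
    monthly_credit_limit: Optional[float] = None
    expires_at: Optional[datetime] = None
    prefer_low_carbon: bool = False
    enabled: bool = True


def check_request(policy: VirtualKeyPolicy, model: str, now: datetime) -> tuple[bool, str]:
    """Illustrative pre-flight check; not the gateway's real logic."""
    if not policy.enabled:
        return False, "key disabled"
    if policy.expires_at is not None and now >= policy.expires_at:
        return False, "key expired"
    if policy.models and model not in policy.models:
        return False, "model not in allowlist"
    return True, "ok"


prod = VirtualKeyPolicy(
    name="prod-api",
    models=["openai/gpt-4o-mini"],
    region="eu-west",
    daily_credit_limit=500,
)
now = datetime(2025, 1, 1, tzinfo=timezone.utc)

print(check_request(prod, "openai/gpt-4o-mini", now))  # (True, 'ok')
print(check_request(prod, "openai/gpt-4o", now))       # (False, 'model not in allowlist')
```

A local mirror like this can catch an obviously disallowed model before a request is sent, but the gateway remains the source of truth for what a key may do.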
## Scoping patterns

A few patterns we see often:

**One key per environment per service.** The most common shape: `prod-api`, `staging-api`, `local-dev`. Each one is allowlisted to the models that environment actually uses.

**One key per third-party integration.** If you give a token to a desktop client (ChatBox, Cline, Claude Code, …), put it in its own key with a daily limit. The blast radius of a leaked key is then "one day's limit" instead of "everything."

**One key per researcher / experiment.** When the same project runs multiple lines of experiments, separate keys make the dashboard's top-models chart instantly readable per experiment.

**One key per region requirement.** When a particular workload must stay in `eu-west`, set the region on the key rather than on every request. The constraint travels with the key and a misconfigured client can't accidentally send the request elsewhere.

## Limits and what happens when they're hit

When a request would push a key over its `daily_credit_limit` or `monthly_credit_limit`, the gateway returns `429 Too Many Requests` with an explanatory body:

```json
{
  "error": {
    "type": "rate_limited",
    "code": "key_daily_limit_exceeded",
    "message": "API key 'prod-api' has reached its daily credit limit (5.00).",
    "param": null
  }
}
```

The response includes `Retry-After` indicating when the limit resets (midnight UTC). Bumping a limit takes effect immediately for subsequent requests.

## Rotation

The rotation pattern that does not require zero-downtime support from the gateway:

1. **Create** a second key with the same scope.
2. **Deploy** it everywhere the old key was used (config, secret manager, CI variables).
3. **Verify** the new key is in use by watching the usage history; the old key's request rate should drop to zero.
4. **Delete** the old key.

Aim to rotate at least every 90 days, and immediately after any of the events listed in [Getting started → API keys](../getting-started/api-keys#rotate-or-revoke).
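
Returning to the key-limit response described under *Limits and what happens when they're hit*: a client may want to recognise that specific error and wait for the reset instead of retrying immediately. The sketch below, in Python, detects the `key_daily_limit_exceeded` code from the example body and computes the seconds remaining until midnight UTC. The function names are hypothetical, and because this page does not say whether `Retry-After` carries seconds or a date, the sketch derives the wait from the clock rather than parsing the header.

```python
import json
from datetime import datetime, timedelta, timezone


def is_daily_limit_error(status: int, body: str) -> bool:
    """True when the response matches the daily key-limit error shown above."""
    if status != 429:
        return False
    try:
        return json.loads(body)["error"]["code"] == "key_daily_limit_exceeded"
    except (json.JSONDecodeError, KeyError, TypeError):
        # Body was not JSON or did not have the expected shape.
        return False


def seconds_until_utc_midnight(now: datetime) -> int:
    """Whole seconds until the next midnight UTC; `now` must be timezone-aware."""
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((next_midnight - now).total_seconds())


body = json.dumps({
    "error": {
        "type": "rate_limited",
        "code": "key_daily_limit_exceeded",
        "message": "API key 'prod-api' has reached its daily credit limit (5.00).",
        "param": None,
    }
})
late_evening = datetime(2025, 1, 1, 23, 30, tzinfo=timezone.utc)

if is_daily_limit_error(429, body):
    print(seconds_until_utc_midnight(late_evening))  # 1800
```

Where the server's `Retry-After` value is available, prefer it over a locally computed wait, since the gateway knows its own reset time.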
## Revocation

A revoked key returns `401 Unauthorized` on the next request. There is no warning, no grace period, no caching delay. Revocation cannot be undone — if you revoke the wrong key, create a new one.

The dashboard preserves the key's usage history after revocation. The token itself is discarded.

## Audit trail

Every key creation, edit, rotation, and revocation produces an entry in **Settings → Audit log**. The log records: who acted, when, on which key, and what changed. Export as CSV for retention in your own audit pipeline.

---

# Credits and billing

LowRouter is pre-paid. You top up a credit balance, the gateway debits that balance per request, and the balance is your single source of truth for spend.

## Credits

A credit is a fractional unit of EUR: 1 credit = €0.01, so €5 of credits adds 500 credits. Credits and balances are always denominated in EUR, and checkout is in EUR.

At checkout, a flat payment-processing and platform fee is added to the credit amount — the same for every payment method and region — and VAT is added on top where applicable. The all-in price is shown before you pay; every euro of credit is yours to spend in full once delivered.

Credits do not expire. Refunds for accidental top-ups are handled case-by-case via the support email on the [legal page](/impressum) within 14 days.

The current balance is shown on the dashboard, on the credits page, and in the response of every request via the `X-Credit-Balance` header.

## What a request costs

The cost of a request is:

```
upstream_provider_price_per_token × tokens
+ platform_fee_per_token × tokens
```

Both components are quoted per 1K tokens, separately for prompt and completion. The prices visible on the model browser and the model pages already include the platform fee. The breakdown is also visible on each transaction's detail page.
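
As a worked instance of the cost formula above, the following Python sketch applies it separately to prompt and completion tokens. All four prices are made-up placeholders, and quoting them in credits per 1K tokens is an assumption; the token counts (18 prompt, 26 completion) are borrowed from the sample response in the getting-started pages. `Decimal` is used so the arithmetic stays exact.

```python
from decimal import Decimal

CREDIT_IN_EUR = Decimal("0.01")  # from the Credits section: 1 credit = EUR 0.01


def request_cost_credits(
    prompt_tokens: int,
    completion_tokens: int,
    upstream_prompt_per_1k: Decimal,
    upstream_completion_per_1k: Decimal,
    fee_prompt_per_1k: Decimal,
    fee_completion_per_1k: Decimal,
) -> Decimal:
    """(upstream price + platform fee) x tokens, for prompt and completion."""
    prompt = (upstream_prompt_per_1k + fee_prompt_per_1k) * prompt_tokens / 1000
    completion = (
        (upstream_completion_per_1k + fee_completion_per_1k) * completion_tokens / 1000
    )
    return prompt + completion


# Hypothetical prices in credits per 1K tokens; not real LowRouter rates.
cost = request_cost_credits(
    prompt_tokens=18,
    completion_tokens=26,
    upstream_prompt_per_1k=Decimal("0.015"),
    upstream_completion_per_1k=Decimal("0.060"),
    fee_prompt_per_1k=Decimal("0.002"),
    fee_completion_per_1k=Decimal("0.002"),
)

print(cost)                  # 0.001918 (credits)
print(cost * CREDIT_IN_EUR)  # 0.00001918 (EUR)
```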

A few details worth knowing:

- **Cached prompt tokens** (when an upstream provider supports prompt caching) are charged at the upstream's cached rate. The platform fee is unchanged.
- **Failed requests** that produced no upstream charge consume zero credits. A 4xx from the upstream that did consume tokens (rare) is passed through to your bill.
- **Streaming responses** are charged on the same usage numbers as a non-streaming response — total tokens, not per-chunk.

## Top up

**Dashboard → Credits → Add credits**, then complete the Stripe-hosted checkout. Card and SEPA Direct Debit (EU accounts) are supported.

The amount you select is exclusive of VAT. The invoice produced after payment shows the net amount, the VAT amount, and the gross total. VAT treatment follows the country and (where applicable) VAT number on your billing profile.

## Invoices

After every successful top-up, an invoice is generated and emailed. The same invoices are downloadable as PDF from **Dashboard → Invoices**.

If you operate on behalf of a company:

1. **Settings → Billing** — set the legal name, billing address, and VAT number.
2. Invoices issued from that point onwards carry the company details.
3. Past invoices can be re-issued with the corrected billing block on request via support.

## Pricing changes

Upstream provider prices change. We update the prices on the model browser and in the routing engine within one business day of an upstream price change going live. The dashboard records the per-request price at the moment of the request, so historical bills are stable even when current prices change.

The platform fee is published per model on the model browser. Material changes to the platform fee are announced at least 30 days in advance to the email on the account.

## What we don't bill for

- Failed authentication, rate-limited requests, or key-limit hits — zero credits.
- Health checks (`HEAD /docs`, `HEAD /api/v1/models`, etc.) — zero credits.
- Dashboard browsing, key management, or any control-plane action — zero credits. ## Refunds Refunds for unspent credit balances are not processed automatically. Contact support if you need to wind down an account; we'll process the refund of the remaining balance to the original payment method, subject to a 14-day cooling-off limit on the most recent top-up under EU consumer law. ## Tax LowRouter is operated by Carbonifer SAS, a French entity. VAT is charged at the rate applicable to your billing country. EU B2B customers with a valid VAT number are subject to reverse charge (no VAT on the invoice). Non-EU customers receive an invoice without VAT. --- # Usage accounting # Usage accounting Every request through the gateway produces a record. This page is the schema and the access patterns. ## Per-request record The fields stored for each request: | Field | Description | |-------|-------------| | `generation_id` | Opaque, globally unique. Returned in the response and used to look up the record later. | | `created_at` | UTC timestamp when the gateway accepted the request. | | `completed_at` | UTC timestamp when the response was fully sent (post-streaming for streamed requests). | | `key_id` | The virtual key used. Names are joined in for display. | | `requested_model` | The string the caller sent (e.g. `lowrouter/auto`). | | `resolved_model` | The model the router actually picked. | | `provider` | Upstream provider that served the request. | | `region` | Region the upstream served from. | | `prompt_tokens` | Token count of the input. | | `completion_tokens` | Token count of the output. | | `total_tokens` | Sum. | | `latency_ms` | First-byte latency for streaming, end-to-end for non-streaming. | | `cost_credits` | Credits debited for this request. | | `eco.energy_wh` | Estimated energy for the inference, in watt-hours. | | `eco.carbon_g` | Estimated CO₂e for the request, in grams. | | `eco.carbon_per_1k_tokens_g` | Same number, normalised per 1K tokens. 
| | `eco.accuracy` | `accurate` / `medium` / `gross` confidence band. | | `eco.methodology_version` | Version of the formula and data inputs that produced these numbers. | | `status` | `ok`, `client_error`, `upstream_error`. | | `routing_trace` | Which providers were considered, which one was picked, why. | Prompt and response *content* are not stored. ## Where to read it - **Dashboard → Recent transactions** — the last ~50 requests, with filtering by date range, model, provider, region, key. - **Transaction detail page** — full record, accessed by clicking a row or by visiting `/dashboard/transactions/{generation_id}`. - **API**: [`GET /api/v1/generation/{id}`](../api-reference/overview) returns the same record as JSON. Useful for programmatic queries, reconciliation, or piping into your own data warehouse. - **Export** — **Dashboard → Recent transactions → Export** produces a CSV of the filtered range, capped at 50,000 rows per export. ## Aggregates The dashboard pre-computes a small set of aggregates and updates them on each request: - Tokens per day, per model. - Cost per day, per model. - Energy and carbon per day, per model. These aggregates power the charts. They are derived from the per-request records and are reproducible from a CSV export. ## Retention | Data | Retention | |------|-----------| | Per-request records | 13 months from `created_at`. | | Daily aggregates | 36 months. | | Audit log entries | 36 months. | | Account profile | For the lifetime of the account. | After retention expires the per-request rows are deleted and replaced by anonymised aggregates. Aggregates are kept for sustainability reporting and platform analytics; they cannot be used to reconstruct individual requests. You can request earlier deletion of all your usage records via the support email; the deletion is irreversible and may affect your ability to dispute past invoices. 
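Because the daily aggregates are derived from the per-request records, they can be recomputed from a CSV export. The sketch below is one way to do that, assuming the export uses the record field names as column headers (`created_at`, `resolved_model`, `total_tokens`, `cost_credits`) and ISO 8601 UTC timestamps; both are assumptions, so check them against a real export before relying on this.

```python
import csv
import io
from collections import defaultdict


def daily_totals(csv_text):
    """Sum tokens and credits per (UTC day, resolved model).

    Column names are assumed to match the per-request record fields.
    """
    totals = defaultdict(lambda: {"total_tokens": 0, "cost_credits": 0.0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumes ISO 8601 timestamps such as 2025-01-31T12:00:00Z,
        # so the first ten characters are the UTC date.
        day = row["created_at"][:10]
        bucket = totals[(day, row["resolved_model"])]
        bucket["total_tokens"] += int(row["total_tokens"])
        bucket["cost_credits"] += float(row["cost_credits"])
    return dict(totals)
```

Comparing the output with the dashboard charts for the same range is a quick way to run the first two reconciliation checks. Expect small floating-point differences on the credit totals.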
## Reconciliation tips

- The sum of `cost_credits` over a day should equal the daily cost on the dashboard within a fraction of a credit (rounding).
- The sum of `total_tokens` over a day grouped by model is what the upstream provider's usage report (if you have one) should show.
- The carbon estimate is reproducible: given the same `resolved_model`, `region`, and `total_tokens`, recomputing with the formula in [methodology](../sustainable-ai/methodology) using the `methodology_version` should yield the same gram count.

If the numbers diverge more than rounding allows, that is a bug: [open an issue](https://github.com/carbonifer/lowrouter/issues).

---

# Dashboard deep-dive

# Dashboard deep-dive

[Dashboard tour](../getting-started/dashboard-tour) is the quick lap around the dashboard. This page is the per-chart reference: what each chart shows, what it's derived from, and the gotchas.

## Filters

A date-range picker, a model picker, and a key picker sit at the top of the dashboard. Setting any filter reloads the widgets that honour it, all with the same filter applied. Filters are reflected in the URL so links are shareable.

The default range is the last 30 days, ending today (UTC). Changing the range:

- Updates the **Eco impact** comparison.
- Restricts the recent-transactions table.

It does **not** change your credit balance, which is always live, or the **Top models** ranking, which always covers the last 30 days.

## Credit balance widget

- **Big number**: current balance.
- **Small number**: spent in the current calendar month.
- The colour of the band underneath is a heuristic: green if your balance would last the full calendar month at the current burn rate, amber if not.

The widget pulls live; clicking **Top up** opens the Stripe checkout.

## Usage today widget

- **Top number**: total tokens today (UTC), all models.
- **Bottom number**: total credits spent today.
- **Bars**: per-model breakdown of today's tokens.

Hover any bar for the per-model token and credit total. Click a bar to filter the rest of the dashboard by that model.
## Top models (last 30 days) - Horizontal bar chart, ranked by total tokens descending. - Each bar shows tokens; the value next to the bar shows credits. - The 30-day window is fixed regardless of the page-level filter so the ranking is stable across page loads. When fewer than five models have non-trivial usage, the chart shows just the ones with data rather than padding with empty bars. ## Eco impact widget The widget has three numbers and one comparison: - **Energy** — total Wh estimated for the filtered range. - **Carbon** — total gCO₂e estimated for the filtered range. - **Per 1K tokens** — the normalised value, useful for comparing across ranges of different sizes. - **Comparison** — the same usage replayed against your chosen baseline model. If your actual usage was lower-energy than the baseline, the widget reports the saving; if higher, it reports the gap. Pick or change the baseline from [Settings → Eco baseline](/dashboard/settings). The widget refuses to display a comparison when the underlying estimates' confidence is too low to be meaningful — the visible number becomes "—" with a note explaining why. ## Recent transactions table - Default sort: newest first. - Sortable columns: timestamp, model, tokens, cost, latency. - Filterable columns: model, provider, region, status. - Clicking a row opens its detail page. ## Per-day usage chart Below the recent-transactions table, a stacked column chart shows tokens per day, stacked by model, for the filtered range. This is the chart to use when explaining usage growth or detecting a spike. The chart respects the page-level model filter — selecting a model up top filters the chart to that one. ## Per-day cost chart Same shape as the per-day usage chart but in credits. Use it for budgeting and burn-rate analysis. The two charts share the same time axis so they can be compared visually. ## Provider distribution A donut showing the share of requests served by each upstream provider in the filtered range. 
It's the fastest way to confirm a routing-policy change actually took effect. When a policy is supposed to keep traffic in a single provider but the donut shows multiple wedges, check the policy and the per-request `routing_trace` for the requests that escaped. ## Mobile dashboard On a phone the dashboard collapses to a single column, the filters move into a slide-up panel, and the per-day charts become scrollable. The recent-transactions table becomes a vertical card list. Heavy operations (large CSV exports, multi-key edits) still want the desktop view. --- # Pricing and currency conversion # Pricing and currency conversion Most providers we route to publish their per-token prices in USD rather than EUR. To keep accounting and balances simple, **every balance and every charge is in EUR**, and we convert non-EUR provider prices to EUR once a day before they reach the catalogue. This page explains how that conversion works, why the displayed price on a USD-billed provider isn't quite the same as the headline USD figure, and what happens when our FX feed is unavailable. ## How the conversion is computed For each non-EUR provider price we ingest, the stored EUR value is: ``` stored_eur_per_1m_tokens = source_per_1m_tokens × max(source_to_eur_rate, 1.0) × (1 + fx_buffer_percent / 100) ``` - **`source_per_1m_tokens`** — the price the provider publishes in their billing currency (USD for most providers). - **`source_to_eur_rate`** — the daily rate that converts the provider's currency into EUR, derived from the reference rates published by the European Central Bank at the [eurofxref-daily.xml](https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml) endpoint. A **rate floor of `1.0`** is applied: when the EUR is stronger than the source currency, the floor pins the conversion at parity so that a strong EUR can't quietly erode our markup. - **`fx_buffer_percent`** — a fixed conversion markup applied on top of the ECB rate, defaulting to **3 %**. 
The markup covers FX-spread drift between the day we fetched the rate and the day we settle with the provider, plus payment-rail conversion fees. For example, a provider publishing `$0.15 per 1M tokens` at an ECB reference rate of `1 EUR = 0.90 USD` (so `1 USD = 1.111 EUR`) and a 3 % markup becomes: ``` 0.15 × 1.111 × 1.03 ≈ 0.1717 EUR per 1M tokens ``` When the EUR is instead stronger than the dollar — say `1 EUR = 1.085 USD`, so `1 USD = 0.92 EUR`, which is below the `1.0` floor — the floor kicks in and the conversion uses `1.0`: ``` 0.15 × 1.0 × 1.03 ≈ 0.1545 EUR per 1M tokens ``` That is what your balance is debited for every 1M tokens you spend on that model. ## Where you see the dual figures Providers that already bill in EUR (e.g. `/providers/scaleway`) render a single figure: their published EUR price, shown as-is with no conversion. Providers that bill in USD (most of them — OpenAI, Anthropic, Bedrock, Vertex, Groq, …) render both numbers: ``` $0.15/M (€0.17 billed) ``` The first figure is the upstream price you'd see on the provider's own pricing page. The second is the EUR value your balance is debited at, computed with the formula above. Hovering the price shows the ECB rate, the markup, and when the conversion was last refreshed. ## Refresh cadence The ECB feed updates on TARGET business days at ~16:00 CET. We pull it once per ingest run, which runs daily. Saturday and Sunday reuse Friday's published rate — that's the standard ECB convention. Your debit at request time uses the most recent stored EUR value. **This means the price you see in the catalogue today may differ slightly from what was billed yesterday for the same number of tokens** — by the size of the FX move plus markup drift. ## When the ECB feed is unavailable If we cannot reach ECB during an ingest run, we **soft-disable** the affected providers (the USD-billed ones that need conversion) until the feed recovers. 
While soft-disabled, those providers are visible in the catalogue but won't accept new requests, and an internal alert (`FX_INGEST_STALE`) notifies the team. We deliberately do not fall back to a hard-coded rate. A guessed rate is worse than visible unavailability — it can either silently under-charge our margin or over-charge customers, and neither is something we want to do quietly. ## What we don't claim - **We don't offer a rate lock.** The price you see today is the price we charge today; tomorrow's rate may differ. If you need a fixed price, the model's EUR figure is the one we honour at the moment of the request, not a quote held in advance. - **We don't pass through every provider price change in real time.** The catalogue reflects the most recent successful ingest, which is daily. A provider price change mid-day will land in the next run. - **We don't bill in any currency other than EUR.** EUR is our single operating currency: balances, charges, the price list, and checkout are all EUR. Providers that publish in USD are converted once a day as described above, which keeps a single ledger and a single set of accounting rules. ## Configuration For self-hosted operators, the conversion markup can be tuned via the `LOWROUTER_FX_BUFFER_PERCENT` environment variable (default `3.0`, clamped to `[0, 20]`). The change takes effect on the next ingest run. --- # Integrations # Integrations LowRouter speaks the OpenAI Chat Completions API. Anything that talks to OpenAI talks to LowRouter — usually by changing the base URL and the API key. The pages in this section are short on purpose. Each one is a working snippet plus the two or three things that surprise people on first use. - [curl](curl) — the bare HTTP request, useful for debugging and as the source of truth. - [OpenAI SDK (Python)](openai-sdk-python) - [OpenAI SDK (TypeScript / JavaScript)](openai-sdk-typescript) - [Anthropic SDK](anthropic-sdk) - [ChatBox](chatbox) — desktop chat client. 
- [OpenCode](opencode) — terminal coding assistant. - [Cline](cline) — VS Code coding agent. - [Goose](goose) — Block's open-source agent. - [Claude Code](claude-code) — Anthropic's CLI agent. - [Generic OpenAI-compatible clients](openai-compatible) — the pattern for anything not on this list. The base URL for every integration is: ``` https://lowrouter.ai/api/v1 ``` The auth header is: ``` Authorization: Bearer $LOWROUTER_API_KEY ``` That's the whole deal. The rest is per-tool configuration. --- # curl # curl The simplest way to talk to the gateway. If something works in `curl` but not in your SDK, the SDK is the thing to debug. ## A non-streaming completion ```bash curl https://lowrouter.ai/api/v1/chat/completions \ -H "Authorization: Bearer $LOWROUTER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "lowrouter/auto", "messages": [ {"role": "user", "content": "In one sentence, what is a vector database?"} ] }' ``` The response shape is documented in [chat completions](../api-reference/chat-completions). ## A streaming completion Use `-N` to disable curl's output buffering, and set `"stream": true` in the body: ```bash curl -N https://lowrouter.ai/api/v1/chat/completions \ -H "Authorization: Bearer $LOWROUTER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "lowrouter/auto", "stream": true, "messages": [{"role": "user", "content": "Count to 5 slowly"}] }' ``` The response is a Server-Sent Events stream; the format is in [streaming](../api-reference/streaming). ## Listing models ```bash curl https://lowrouter.ai/api/v1/models \ -H "Authorization: Bearer $LOWROUTER_API_KEY" ``` Returns the routable models with their per-token prices and basic metadata. Cache the result locally — it does not change between requests within a single user session. 
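The caching advice above can be sketched in a few lines of Python. This is a minimal illustration, not a LowRouter client: `fetch_models` and `ModelCache` are hypothetical names, the cache lives only for the lifetime of the process, and the shape of the JSON returned by the endpoint is not assumed.

```python
import json
import os
import urllib.request

MODELS_URL = "https://lowrouter.ai/api/v1/models"  # endpoint from the curl example above


def fetch_models():
    """Call GET /models once, using the bearer token from the environment."""
    request = urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {os.environ['LOWROUTER_API_KEY']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


class ModelCache:
    """Session-scoped cache: fetch on first use, reuse the result afterwards."""

    def __init__(self, fetch=fetch_models):
        self._fetch = fetch      # injectable so the cache can be tested offline
        self._models = None

    def get(self):
        if self._models is None:
            self._models = self._fetch()
        return self._models
```

Create one `ModelCache` per user session and call `get()` wherever the model list is needed; only the first call reaches the gateway.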
## Pinning a provider and region Override `lowrouter/auto` by sending an explicit model string and a `route` block: ```bash curl https://lowrouter.ai/api/v1/chat/completions \ -H "Authorization: Bearer $LOWROUTER_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}], "route": {"region": "eu-west", "provider": "openai"} }' ``` If no provider in the requested region is available the request returns 503 rather than falling back silently. See [routing](../models/routing) for the full set of route options. ## Looking up a generation later Every response carries a `lowrouter.generation_id`. Pass it to: ```bash curl https://lowrouter.ai/api/v1/generation/$GENERATION_ID \ -H "Authorization: Bearer $LOWROUTER_API_KEY" ``` You get the same record the dashboard shows, including the eco numbers and the routing trace. --- # OpenAI SDK (Python) # OpenAI SDK (Python) The OpenAI SDK is the canonical client. It works with LowRouter unchanged once you set `base_url` and `api_key`. ## Install ```bash pip install openai ``` ## A non-streaming completion ```python import os from openai import OpenAI client = OpenAI( base_url="https://lowrouter.ai/api/v1", api_key=os.environ["LOWROUTER_API_KEY"], ) response = client.chat.completions.create( model="lowrouter/auto", messages=[ {"role": "user", "content": "In one sentence, what is a vector database?"} ], ) print(response.choices[0].message.content) ``` ## A streaming completion ```python stream = client.chat.completions.create( model="lowrouter/auto", messages=[{"role": "user", "content": "Count to 5 slowly"}], stream=True, ) for chunk in stream: delta = chunk.choices[0].delta.content if delta: print(delta, end="", flush=True) ``` ## Reading the eco metadata LowRouter's per-request metadata lives outside the OpenAI schema, so the typed SDK fields don't surface it. 
Read it from the raw response:

```python
response = client.chat.completions.create(
    model="lowrouter/auto",
    messages=[{"role": "user", "content": "hi"}],
)

# Fields outside the OpenAI schema end up in the model's "extra" mapping.
extra = response.model_extra or {}
eco = extra.get("lowrouter", {}).get("eco")
if eco:
    print(f"{eco['carbon_per_1k_tokens_g']:.3f} gCO2e/1k tokens "
          f"({eco['accuracy']})")
```

`response.model_extra` is the canonical Pydantic-v2 escape hatch for non-schema fields. On older SDK versions the attribute is `response.__pydantic_extra__`.

## Pinning a region with extra_body

The OpenAI SDK rejects unknown keyword arguments, but it merges anything passed in `extra_body` into the request JSON:

```python
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "hi"}],
    extra_body={"route": {"region": "eu-west"}},
)
```

If you need this on every request, make it a default by building the client through a small factory:

```python
def make_client(region="eu-west"):
    # Every request sent through this client carries the region header.
    return OpenAI(
        base_url="https://lowrouter.ai/api/v1",
        api_key=os.environ["LOWROUTER_API_KEY"],
        default_headers={"X-LowRouter-Region": region},
    )
```

`X-LowRouter-Region` is honoured the same as `route.region` in the body.

## Async

The async client follows the same pattern:

```python
import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://lowrouter.ai/api/v1",
    api_key=os.environ["LOWROUTER_API_KEY"],
)


async def main():
    r = await client.chat.completions.create(
        model="lowrouter/auto",
        messages=[{"role": "user", "content": "hi"}],
    )
    print(r.choices[0].message.content)


asyncio.run(main())
```

---

# OpenAI SDK (TypeScript)

# OpenAI SDK (TypeScript)

## Install

```bash
npm install openai
```

## A non-streaming completion

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://lowrouter.ai/api/v1",
  apiKey: process.env.LOWROUTER_API_KEY,
});

const response = await client.chat.completions.create({
  model: "lowrouter/auto",
  messages: [
    { role: "user", content: "In one sentence, what is a vector database?"
    },
  ],
});

console.log(response.choices[0].message.content);
```

## A streaming completion

```ts
const stream = await client.chat.completions.create({
  model: "lowrouter/auto",
  stream: true,
  messages: [{ role: "user", content: "Count to 5 slowly" }],
});

for await (const chunk of stream) {
  // Some chunks carry no text (for example the final one), so guard the access.
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
```

## Reading the eco metadata

The TypeScript types do not include LowRouter's extra fields. Cast or narrow when you read them:

```ts
type LowRouterMeta = {
  generation_id: string;
  provider: string;
  region: string;
  eco?: {
    energy_wh: number;
    carbon_g: number;
    carbon_per_1k_tokens_g: number;
    accuracy: "accurate" | "medium" | "gross";
  };
};

const r = await client.chat.completions.create({ /* ... */ });
const meta = (r as unknown as { lowrouter?: LowRouterMeta }).lowrouter;
if (meta?.eco) {
  console.log(
    `${meta.eco.carbon_per_1k_tokens_g.toFixed(3)} gCO2e/1k (${meta.eco.accuracy})`,
  );
}
```

## Pinning a region

The SDK forwards unknown fields:

```ts
const response = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "hi" }],
  // @ts-expect-error: extra fields not in the OpenAI schema
  route: { region: "eu-west" },
});
```

Or, if you prefer not to silence the type error, send the region as a header:

```ts
const client = new OpenAI({
  baseURL: "https://lowrouter.ai/api/v1",
  apiKey: process.env.LOWROUTER_API_KEY,
  defaultHeaders: { "X-LowRouter-Region": "eu-west" },
});
```

## Browser usage

The OpenAI SDK warns against running with an API key in the browser because the key is then exposed to every page visitor. The same applies to LowRouter: keep your `LOWROUTER_API_KEY` server-side and proxy requests from a backend you control. If you need a signed, short-lived token for a browser client, a server-side endpoint that mints one is the right shape.
--- # Anthropic SDK # Anthropic SDK The gateway accepts Anthropic-shaped requests on a separate path so the official `anthropic` SDK works without a custom adapter. Use this when your codebase is already standardised on Anthropic's `messages.create()` style. ## Python ```bash pip install anthropic ``` ```python import os from anthropic import Anthropic client = Anthropic( base_url="https://lowrouter.ai/api/v1/anthropic", api_key=os.environ["LOWROUTER_API_KEY"], ) message = client.messages.create( model="anthropic/claude-sonnet-4-5", max_tokens=512, messages=[ {"role": "user", "content": "In one sentence, what is a vector database?"} ], ) print(message.content[0].text) ``` ## TypeScript ```bash npm install @anthropic-ai/sdk ``` ```ts import Anthropic from "@anthropic-ai/sdk"; const client = new Anthropic({ baseURL: "https://lowrouter.ai/api/v1/anthropic", apiKey: process.env.LOWROUTER_API_KEY, }); const message = await client.messages.create({ model: "anthropic/claude-sonnet-4-5", max_tokens: 512, messages: [{ role: "user", content: "What's a vector database?" }], }); console.log(message.content); ``` ## Notes - The model string is the LowRouter ID (`anthropic/claude-sonnet-4-5`), not Anthropic's bare model name. - `api_key` is your LowRouter key, not an Anthropic API key. - The `anthropic-version` header is set automatically by the SDK; the gateway accepts what the SDK sends. - Streaming works as in the official SDK (`client.messages.stream(...)`). - Eco metadata is appended to the response in a `lowrouter` field at the top level, the same shape as in the OpenAI-compatible path. ## When to prefer this over the OpenAI path - Your codebase already uses Anthropic types end-to-end and you don't want to rewrite call-sites. - You depend on Anthropic-specific request features (system prompt caching, citations, computer-use tool blocks) that aren't in the OpenAI schema. - You want to keep tool definitions in Anthropic's `tools` shape. 
If none of these applies, the OpenAI-compatible path is simpler: one set of types, one base URL, every model on the same endpoint.

---

# ChatBox

# ChatBox

[ChatBox](https://chatboxai.app/) is an open-source desktop chat client. Configure it to use LowRouter as a custom OpenAI provider.

## Configure

1. Open **Settings → Model Provider**.
2. Click **Add custom provider**.
3. Fill in:
   - **Name**: `LowRouter`
   - **API Mode**: `OpenAI API Compatible`
   - **API Host**: `https://lowrouter.ai/api/v1`
   - **API Path**: `/chat/completions`
   - **API Key**: your `lr-sk-...` token
4. Save.

## Pick a model

Under **Model**, choose **Custom model name** and enter a LowRouter model ID such as `lowrouter/auto`, `openai/gpt-4o`, or `anthropic/claude-sonnet-4-5`. The full list is on the [model browser](/models).

## Recommended setup

- **Use a key dedicated to ChatBox.** Keep its `daily_credit_limit` small (e.g. 1 credit/day for a personal account). A leaked key capped at a small daily limit is much less painful than a leaked production key.
- **Disable telemetry on the ChatBox side** if you care about not exposing prompt content to the ChatBox publisher's analytics. ChatBox itself does not see prompts in normal operation, but features like crash reporting can capture context.
- **Turn streaming on.** The desktop UX expects streaming and LowRouter supports it the same way OpenAI does.

## Troubleshooting

- **401 / Unauthorized**: confirm the API key starts with `lr-sk-` and has not been revoked. Test with `curl` from [the curl page](curl) using the same key.
- **404 / Not Found**: the **API Path** must be `/chat/completions`. Some ChatBox versions default to `/v1/chat/completions`, which becomes `https://lowrouter.ai/api/v1/v1/chat/completions`. Drop the leading `/v1`.
- **Model not available**: look up the exact model ID on the [model browser](/models). Auto-complete in ChatBox is not always accurate.
--- # OpenCode # OpenCode [OpenCode](https://opencode.ai/) is a terminal coding assistant. It expects an OpenAI-compatible endpoint, which is what LowRouter exposes. ## Configure OpenCode reads its config from `~/.config/opencode/opencode.json`. Point the OpenAI provider entry at LowRouter: ```json { "providers": { "openai": { "baseURL": "https://lowrouter.ai/api/v1", "apiKey": "lr-sk-..." } }, "defaultModel": "lowrouter/auto" } ``` Restart OpenCode after editing the file. ## Picking a model Inside OpenCode, run `/model` and pick from the list. If a model isn't listed, type the LowRouter model ID directly — any model on the [model browser](/models) is routable. For coding tasks, the auto route (`lowrouter/auto`) generally picks something appropriate. If you want a specific model: - For long contexts: a model with ≥128K context window. The model browser tags context length per model. - For latency-sensitive iteration: an `*-mini` or `*-haiku-*` variant. - For careful reasoning: a top-tier reasoning model. ## Recommended setup - **Dedicated key with a daily limit.** OpenCode is interactive and it's easy to lose track of how many tokens you spent in an afternoon. A daily limit on the key bounds the surprise. - **Disable shell-execution tools by default.** OpenCode supports letting the model run shell commands; turn that off until you've reviewed the prompts the agent sends. Enable it per-session for the workflow that needs it. - **Stream on.** Default in OpenCode; mentioned for completeness. ## Troubleshooting - **Hangs on the first request**: confirm `baseURL` ends with `/v1` (no trailing slash). OpenCode appends `/chat/completions` itself. - **Model "not found"**: the model isn't in OpenCode's autocomplete list, but it is routable. Run `/model lowrouter/auto` to confirm the gateway is reachable, then use the explicit model ID. --- # Cline # Cline [Cline](https://cline.bot/) is a VS Code extension that runs a coding agent against an LLM provider. 
It supports any OpenAI-compatible endpoint. ## Configure 1. Install the **Cline** extension from the VS Code marketplace. 2. Open the Cline sidebar. 3. Click the gear icon, then **Settings**. 4. Under **API Provider**, pick **OpenAI Compatible**. 5. Fill in: - **Base URL**: `https://lowrouter.ai/api/v1` - **API Key**: your `lr-sk-...` token - **Model ID**: any LowRouter model ID, e.g. `anthropic/claude-sonnet-4-5` or `lowrouter/auto`. 6. Save and start a task. ## Recommended setup - **Separate key for Cline**. Cline can burn tokens fast on agentic tasks (read file → think → edit → re-read). A dedicated key with a per-day limit is a cheap insurance policy. - **Pick an explicit model**, not `lowrouter/auto`, for repeatable pricing. Auto-routing changes the underlying model based on availability, which can surprise you when you compare daily costs. - **Read the diff every time.** Cline produces real edits to your workspace. The dashboard's transaction-detail page shows exactly what was sent (token counts, model, cost) but not the prompt or the response — the source of truth is the diff in your editor. ## Troubleshooting - **"Model does not support tools"**: not every model exposes tool use. The model browser tags `tool_use: true` on supported models. Pick one that does, or switch to `lowrouter/auto` which prefers tool-capable models when the prompt calls for tools. - **"Context window exceeded"**: the file or selection you fed the agent is larger than the model's context. Switch to a longer-context model or trim the context. - **401**: confirm the API key. - **Latency feels off**: check the **provider** field on the transaction detail page. Cline doesn't expose it; LowRouter does. --- # Goose # Goose [Goose](https://block.github.io/goose/) is Block's open-source agent. It supports OpenAI-compatible providers via configuration. 
## Configure

Edit `~/.config/goose/config.yaml`:

```yaml
GOOSE_PROVIDER: openai
OPENAI_HOST: https://lowrouter.ai/api/v1
OPENAI_API_KEY: lr-sk-...
GOOSE_MODEL: lowrouter/auto
```

Or set them as environment variables before starting Goose:

```bash
export GOOSE_PROVIDER=openai
export OPENAI_HOST=https://lowrouter.ai/api/v1
export OPENAI_API_KEY=lr-sk-...
export GOOSE_MODEL=lowrouter/auto
goose session
```

## Picking a model

Set `GOOSE_MODEL` to any LowRouter model ID. For agentic tasks (file reading, shell tools, multi-step reasoning), pick a model tagged with `tool_use: true` on the [model browser](/models).

## Recommended setup

- **Dedicated key, daily limit.** Same reasoning as the other agents: agentic loops can run away.
- **Limit the toolset Goose has access to.** Goose's `extensions` config lets you allow only the tools the workflow needs. Fewer enabled tools means fewer surprises.
- **Set a step limit.** Goose has a max-step setting; cap it at a small number for unattended runs.

## Troubleshooting

- **Goose immediately exits with a config error**: `OPENAI_HOST` must not include a trailing slash. Match the value above exactly.
- **Tool calls fail silently**: verify the chosen model actually supports tool use (model browser, `tool_use: true`). Some smaller models don't.

---

# Claude Code

# Claude Code

[Claude Code](https://www.anthropic.com/claude-code) is Anthropic's CLI agent. It expects an Anthropic API endpoint, which LowRouter exposes via the `/api/v1/anthropic` prefix.

## Configure

Set environment variables before starting Claude Code:

```bash
export ANTHROPIC_BASE_URL=https://lowrouter.ai/api/v1/anthropic
export ANTHROPIC_API_KEY=lr-sk-...
claude
```

Or persist them in your shell profile (`~/.zshrc`, `~/.bashrc`).

## Picking a model

Claude Code uses the model defined in its settings.
Edit `~/.claude/settings.json` to pin a LowRouter-routed Claude model: ```json { "model": "anthropic/claude-sonnet-4-5" } ``` The model string is the LowRouter ID (`anthropic/claude-sonnet-4-5`), not Anthropic's short name. The full list of supported Claude models is on the [model browser](/models) — filter by provider `anthropic`. ## Recommended setup - **Dedicated key.** Same daily-limit advice as the other agents. - **EU residency**: pin the key's region to `eu-west` so Claude requests are served from EU endpoints. This survives IDE restarts and machine swaps without per-session config. - **Audit usage on the dashboard.** The transaction detail page shows token counts and cost per generation, which Claude Code itself does not surface in real time. ## Troubleshooting - **Claude Code can't reach the API**: the base URL must end with `/anthropic` (no trailing slash). Claude Code appends `/v1/messages` internally. - **Model "not found"**: confirm the model exists on the [model browser](/models). Anthropic adds and deprecates models faster than the model browser updates; a 404 usually means the model is no longer routable. - **The agent stalls during streaming**: this is sometimes a network-buffering issue between Claude Code and LowRouter. Setting `CLAUDE_CODE_STREAM_BUFFER=0` (if the version you run supports it) disables the client-side buffer. --- # Generic OpenAI-compatible clients # Generic OpenAI-compatible clients If a tool isn't on this list, it almost certainly works as long as it exposes two settings: **base URL** and **API key**. The pattern below is what to fill in. 
## Settings to set

| Setting | Value |
|---------|-------|
| Provider type | OpenAI Compatible (sometimes "Custom OpenAI" or "OpenAI API") |
| Base URL | `https://lowrouter.ai/api/v1` |
| API Key | your `lr-sk-...` token |
| Path / endpoint | `/chat/completions` (most tools handle this automatically) |
| Model | any LowRouter model ID — `lowrouter/auto`, `openai/gpt-4o-mini`, … |

## What does *not* work

- **Tools that hard-code `https://api.openai.com`** without a base-URL setting cannot be redirected. Some have an `OPENAI_API_BASE` environment variable that achieves the same thing.
- **Tools that require a specific model ID format** (e.g. `gpt-4` with no provider prefix) need the model picker reconfigured to accept arbitrary strings — most have a "custom model name" field.
- **Tools that send Anthropic-shaped requests on the OpenAI endpoint** will be rejected. Use the [Anthropic SDK base URL](anthropic-sdk) instead.

## Confirming it works

Before integrating, test from the command line that the tool's settings are right:

```bash
curl https://lowrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"lowrouter/auto","messages":[{"role":"user","content":"hi"}]}'
```

If `curl` returns a completion, the tool will too — once it's configured with the same URL and key.

## Tools we know work without changes

- **LangChain** (`OpenAI` and `ChatOpenAI` classes — set `openai_api_base` and `openai_api_key`).
- **LlamaIndex** (`OpenAI` and `OpenAILike` LLMs).
- **LiteLLM** (proxy and library — set `api_base` and `api_key`).
- **Vercel AI SDK** (`createOpenAICompatible` from `@ai-sdk/openai-compatible`).
- **Continue.dev** (`provider: openai-aiohttp` with `apiBase`).
- **LM Studio** (Server tab → custom backend).

## When to prefer the OpenAI path over a tool's native provider

If a tool has both an "OpenAI" and a "LowRouter / OpenRouter / custom gateway" option, prefer the OpenAI Compatible one.
It exposes the fewest surprises: the tool sends a standard chat-completions request, LowRouter resolves the route, and the response is in the shape the tool already expects.

---

# Models & providers

Three short pages on the routing layer:

- [Available models](available) — what's on the platform, how to read the model browser, and the IDs you'll use in requests.
- [Routing](routing) — what `lowrouter/auto` does, how ties are broken, and what overrides do what.
- [Per-request metadata](per-request-metadata) — the `lowrouter` block on every response, field by field.

The dashboard's [model browser](/models) is the live, searchable view of the same data.

---

# Available models

The full catalogue lives on the [model browser](/models). It's generated from the same data the API exposes at [`GET /models`](../api-reference/models-and-providers), so the two agree by construction.

## How model IDs are formed

```
<provider>/<model>
```

Examples:

- `openai/gpt-4o-mini`
- `anthropic/claude-sonnet-4-5`
- `mistral/mistral-large-latest`

The `<provider>` segment matches the `id` of an entry in [`GET /providers`](../api-reference/models-and-providers); the `<model>` segment is the upstream model name.

## What each model card shows

- **Display name** — the human-readable name, sometimes versioned.
- **Provider** and **owner** — who serves it and who created it (these differ for re-hosted models, e.g. Llama on Mistral).
- **Context window** — max input tokens.
- **Pricing** — prompt, completion, and (where applicable) cached prompt rates per 1K tokens, in your account currency.
- **Capabilities** — `tool_use`, `vision`, `structured_output`, `streaming`. Filter the catalogue by these.
- **Eco data** — active parameter count and the energy estimate per 1K tokens. Both numbers come from the [methodology](../sustainable-ai/methodology). The confidence band (`accurate`, `medium`, `gross`) reflects how well-sourced the parameter count is.
- **Regions** — where the upstream serves it (`eu-west`, `us-east`, …).

## Pseudo-models

Two model strings are not actual models but routing primitives:

- **`lowrouter/auto`** — pick a model based on the request and the current routing policy. See [routing](routing).
- **`lowrouter/auto-cheap`** — auto-route biased toward the cheapest model that can plausibly handle the request. Useful for high-volume, low-importance work (classification, simple summaries).

When a pseudo-model is used, the response's `model` field is the *resolved* model.

## Lifecycle

- **Added.** When an upstream releases a new model and we integrate it, it appears on the model browser. Brand-new models start with a `medium` or `gross` eco confidence band until the parameter count is verified.
- **Deprecated.** When an upstream announces deprecation, the model card flags it with a `deprecated` badge and a sunset date. Routing still uses it until the sunset date.
- **Removed.** After the sunset date, requests for the model return `model_deprecated`. A migration suggestion is included in the error body when we have one.

## Filtering the catalogue

The model browser supports filtering by:

- Provider
- Context window
- Capability flags
- Eco confidence band
- Price range

The same filters are reflected in `GET /models` query parameters; see the [discovery endpoints](../api-reference/models-and-providers).

---

# Routing

Every request goes through the router. For an explicit model the router's job is small (pick a healthy upstream that serves it); for `lowrouter/auto` the router picks the model too. This page describes both.

## The router's inputs

For each request:

- The model string in the request (`lowrouter/auto`, an explicit ID, or one of the auto-* pseudo-models).
- The `route` object, if present (`provider`, `region`, `prefer_low_carbon`, `fallback`).
- The key's policy: any of the per-key fields from [API key management](../guides/api-keys).
- The current health of every upstream (`ok`, `degraded`, `unavailable`).
- The current grid carbon intensity for each (provider, region) pair.

## Decision order

The router applies constraints from the most specific to the least:

1. **Per-request `route`.** A pinned `provider` or `region` removes anything that doesn't match.
2. **Per-key policy.** A key's `region` pin or `models` allowlist is then applied.
3. **Account policy.** Defaults set in [auto-routing settings](/dashboard/auto-routing) — for instance, "prefer EU regions when possible."
4. **Auto-router scoring.** Whatever survives the above is scored on:
   - Capability match (does the model support the request shape — vision input, tool use, structured output?).
   - Provider health.
   - Latency (median over the last 5 minutes per upstream).
   - Carbon (grams per 1K tokens for that provider × region pair, with the bias controlled by `prefer_low_carbon`).
   - Price (matters when the request used `auto-cheap`).
5. **Tie-break.** When two candidates score within 1% of each other, the more recently used one wins (sticky routing within a session when `user` is supplied; otherwise random).

## What happens on failure

If the chosen upstream returns a 5xx or times out:

- The router marks that upstream's slot temporarily unavailable (decaying over a few minutes).
- It tries the next eligible candidate **in the same region** (region is never violated silently).
- If none, it tries other regions **only if** `route.fallback != false` and the per-key/account policy allows it.
- If still none, it returns `503 service_unavailable` with a code describing what's missing.

The full chain of attempts is recorded in the generation's `routing_trace`, visible on the dashboard's transaction-detail page.

## `prefer_low_carbon`

The auto-router's carbon score is a weighted term in its overall score.
Setting `prefer_low_carbon: true` on a request increases that weight, which pushes traffic toward providers serving from lower-grid-intensity regions when capability and latency are comparable.

It does **not** override pinned regions or providers. It does **not** guarantee the lowest-carbon option in absolute terms — only that, all else equal, lower carbon wins.

## Worked example

A request for `lowrouter/auto` with a vision input:

1. Drop models that don't support vision.
2. Drop providers in `degraded` or `unavailable` state.
3. Among the rest, score on (capability fit, latency, carbon).
4. The top-scored option wins.

If the top-scored option later returns 502 mid-request:

1. Mark its `(provider, region)` slot unavailable.
2. Re-score the surviving candidates.
3. Retry with the new top option (capped at two retries per request).
4. If retries are exhausted, return 502 to the caller.

## Pinning recipes

| Goal | Recipe |
|------|--------|
| EU residency | Set `route.region: eu-west` per request, or pin the key's region. |
| Specific provider | `route.provider: anthropic`. Combine with `route.region` for region too. |
| Hard pin (no failover) | `route.provider`, `route.region`, `route.fallback: false`. |
| Lower carbon | `route.prefer_low_carbon: true`. Combine with no region pin to let the router pick the cleanest available region. |
| Cheapest acceptable | Use `lowrouter/auto-cheap`. |

## What the router does not do

- It does not benchmark output quality. The auto router optimises for capability, latency, carbon, and price — not "is the answer good".
- It does not silently swap models mid-conversation. If you've been routed to model A on the first turn, the auto router prefers sticking with model A on the second when you supply a stable `user`.
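
The pinning recipes translate into request bodies fairly mechanically. Below is a minimal Python sketch, assuming the `route` object accepts exactly the four fields this page lists (`provider`, `region`, `prefer_low_carbon`, `fallback`). The helper name `build_request` is hypothetical and only illustrates the body shape; it is not part of any LowRouter SDK and sends nothing over the network.

```python
import json
from typing import Any, Optional


def build_request(
    model: str,
    messages: list[dict[str, str]],
    *,
    provider: Optional[str] = None,
    region: Optional[str] = None,
    prefer_low_carbon: Optional[bool] = None,
    fallback: Optional[bool] = None,
) -> dict[str, Any]:
    """Build a chat-completions body, adding `route` only when a field is set."""
    route = {
        "provider": provider,
        "region": region,
        "prefer_low_carbon": prefer_low_carbon,
        "fallback": fallback,
    }
    # Drop unset fields so key and account policy apply for anything not pinned here.
    route = {key: value for key, value in route.items() if value is not None}
    body: dict[str, Any] = {"model": model, "messages": messages}
    if route:
        body["route"] = route
    return body


messages = [{"role": "user", "content": "hi"}]

# Hard pin: one provider, one region, no cross-region failover.
hard_pin = build_request(
    "anthropic/claude-sonnet-4-5",
    messages,
    provider="anthropic",
    region="eu-west",
    fallback=False,
)

# Lower carbon: no region pin, so the router is free to choose among regions.
low_carbon = build_request("lowrouter/auto", messages, prefer_low_carbon=True)

# Cheapest acceptable: the pseudo-model carries the intent, no `route` needed.
cheapest = build_request("lowrouter/auto-cheap", messages)

print(json.dumps(hard_pin, indent=2))
```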

---

# Per-request metadata

Every successful response from the gateway carries a top-level `lowrouter` field:

```json
{
  "id": "chatcmpl-...",
  "choices": [...],
  "usage": {...},
  "lowrouter": {
    "generation_id": "gen_01J9...",
    "provider": "openai",
    "region": "eu-west",
    "eco": {
      "energy_wh": 0.0021,
      "carbon_g": 0.00057,
      "carbon_per_1k_tokens_g": 0.013,
      "accuracy": "accurate",
      "methodology_version": "v0.4-2026-01"
    }
  }
}
```

The same fields are echoed in HTTP headers (`X-LowRouter-Generation-ID`, `X-LowRouter-Provider`, `X-LowRouter-Region`) for clients that prefer header inspection.

## Field reference

### `generation_id`

Globally unique, opaque ID for this generation. Use it to:

- Look up the full record via [`GET /generation/{id}`](../api-reference/models-and-providers).
- Open the corresponding row on the dashboard's [transactions page](/dashboard).
- Correlate gateway logs with your application logs.

The ID format may evolve; always treat it as opaque.

### `provider`

The upstream that actually served the request. Matches an `id` in [`GET /providers`](../api-reference/models-and-providers).

### `region`

The region the upstream served from. Strings like `eu-west`, `us-east`, `us-west` — the same values used in the `route.region` field on requests.

### `eco`

The energy and carbon estimate. Five fields:

- **`energy_wh`** — total energy estimated for the request, in watt-hours. Computed from the resolved model's active parameter count and the request's `total_tokens`.
- **`carbon_g`** — total CO₂e estimated for the request, in grams. `energy_wh × grid_intensity_for(provider, region) / 1000`.
- **`carbon_per_1k_tokens_g`** — `carbon_g` normalised per 1K total tokens. Comparable across requests of different sizes.
- **`accuracy`** — confidence band: `accurate`, `medium`, or `gross`. Reflects how well-sourced the model's parameter count is. See [methodology](../sustainable-ai/methodology).
- **`methodology_version`** — version string that uniquely identifies the formula coefficients and data inputs used. Stable for as long as the methodology is unchanged.

### When `eco` is absent

The `eco` field can be missing when:

- The resolved model's parameter count is unknown and we'd rather omit the number than fabricate one.
- The request was a non-completion (e.g. a tool-only response with no tokens consumed).
- The upstream returned an error mid-stream that prevented usage accounting.

When it's missing, the dashboard shows the row with a `—` for the carbon column and a note linking to the methodology page.

## Streaming

The same metadata arrives at end-of-stream as a `lowrouter.summary` chunk; see [streaming](../api-reference/streaming) for the exact shape.

## Privacy

The `lowrouter` block contains nothing about prompt or response content — only the resolved route and the metric estimates. It is safe to log on the client side; we do.

---

# Sustainable AI

Four pages that document the energy and carbon numbers shown elsewhere on the platform:

- [Methodology](methodology) — the formula, the coefficients, and the confidence bands.
- [Data sources](data-sources) — where the parameter counts and grid intensities come from.
- [Limits and what we don't claim](limits) — the explicit out-of-scope list.
- [Reduce your footprint](reduce-your-footprint) — concrete things to change in your application that move the dashboard's numbers.

These pages are the longest in the docs site on purpose. The numbers are only useful with the caveats; the caveats need to be readable.

---

# Methodology

LowRouter estimates the carbon footprint of every inference request using the formula and data sources described on this page. This is the reference document; the numbers on the dashboard, the model browser, and the API responses all come from it.

## What we report

Two numbers per request:

- **Energy** in watt-hours (Wh).
- **Carbon** in grams of CO₂ equivalent (gCO₂e).

The carbon number is also normalised to **gCO₂e per 1,000 tokens** so requests of different sizes are comparable.

## The formula

```
energy_wh = ((α × P_active) + β) × tokens
carbon_g  = energy_wh × grid_intensity_g_per_kwh / 1000
```

Where:

- **`P_active`** — number of active parameters during inference, in billions. For dense models this is the parameter count; for Mixture-of-Experts (MoE) models it's the parameters activated per token, not the total count.
- **`α`** = 8.91 × 10⁻⁵ Wh per output token per billion active parameters.
- **`β`** = 1.43 × 10⁻³ Wh constant overhead per output token.
- **`tokens`** — total tokens for the request (`prompt_tokens + completion_tokens`).
- **`grid_intensity_g_per_kwh`** — annual-average carbon intensity of the electricity grid in the region serving the request.

Dividing by 1000 in the carbon line converts watt-hours to kilowatt-hours, the unit grid intensity is expressed in.

The energy formula is the [EcoLogits v0.4 inference model](https://ecologits.ai/0.4/methodology/llm_inference/). The grid-intensity values come from the International Energy Agency.

## Why this formula

The EcoLogits model is published, peer-reviewed in spirit if not fully formally, and reproducible from public model parameter counts. It is not the only credible estimate but it is the one with the clearest derivation and the most active maintenance. Adopting it lets us compare numbers across providers using the same yardstick rather than reconciling each provider's bespoke estimate.

## Confidence bands

Every estimate carries one of three labels:

| Band | When | Expected error |
|------|------|----------------|
| `accurate` | Model size verified by the provider or in the EcoLogits registry; recent grid data. | ±20% |
| `medium` | Model size from a credible third party (research paper, well-supported leak); grid data current. | ±40% |
| `gross` | Model size estimated from the model name or industry rumour; or grid data older than 12 months. | ±60% or more |

These bands are about *uncertainty in the inputs*, not about whether the formula itself is right. The formula has its own model-class limits documented on the [limits page](limits).

When the band is `gross`, the dashboard widgets that aggregate carbon across many requests show a reduced-confidence indicator and link back to this page.

## Methodology versioning

Every estimate stores the `methodology_version` that produced it (see [per-request metadata](../models/per-request-metadata)). The version captures:

- The values of α and β.
- The IEA grid-intensity dataset year.
- The model parameter-count dataset version.

When any of these change, the version is bumped and the change is noted in the dashboard's footer with the date. Old generations are *not* retroactively recomputed — their `methodology_version` is the one in effect when the request was served.

## Worked example

A request:

- Resolved model: `openai/gpt-4o-mini`.
- Active parameters: 8B (this is the value we use; the provider has not officially confirmed it, so the band is `medium`).
- Total tokens: 200.
- Provider region: `eu-west`.
- Grid intensity: ~340 gCO₂e/kWh (IEA EU average).

The EcoLogits coefficients in the published v0.4 are in **watt-hours per output token**, not kWh, so the per-token energy is multiplied by the token count and converted to kWh only for the carbon step. The formula as we apply it is:

```
energy_wh_per_token  = (α × P_active) + β
                     = (8.91e-5 × 8) + 1.43e-3
                     = 0.002143 Wh/token
energy_wh            = 0.002143 × 200 = 0.43 Wh
energy_kwh           = 0.43 / 1000 = 4.3e-4 kWh
carbon_g             = 4.3e-4 × 340 = 0.146 g
carbon_per_1k_tokens = 0.146 × (1000 / 200) = 0.73 g
```

So a 200-token request on `gpt-4o-mini` from `eu-west` is estimated at **0.43 Wh** and **~0.15 gCO₂e**, with `medium` confidence. These are the numbers your `eco` block would carry.

If you find a discrepancy between this worked example and what the gateway returns, the gateway is the source of truth — please file an issue so we can fix the documentation.

## The full picture

Read the [data sources](data-sources) page next for where each number in the formula comes from. The [limits](limits) page lists what we explicitly do not claim.

---

# Data sources

The carbon estimate is only as good as its inputs. This page lists each input, where it comes from, and how often we update it.

## Energy formula coefficients (α, β)

- **Source**: [EcoLogits v0.4 — LLM inference methodology](https://ecologits.ai/0.4/methodology/llm_inference/).
- **Values**: α = 8.91 × 10⁻⁵, β = 1.43 × 10⁻³ (Wh per output token).
- **Updates**: when EcoLogits publishes a new methodology version with new coefficients, we evaluate it, bump the `methodology_version`, and note the change in the dashboard's footer with the effective date.

The coefficients were derived from a regression across published benchmarks on a fleet of representative GPUs. They are an *average*; real hardware varies.

## Model active parameters

- **Source priority**:
  1. **EcoLogits registry** — models with verified architecture details (`accurate`).
  2. **Provider documentation** — values published by the model creator (`accurate` or `medium`, depending on whether the statement is unambiguous).
  3. **Research papers and credible leaks** — peer-reviewed architecture descriptions, technical reports (`medium`).
  4. **Name-based estimates** — `llama-70b` → 70B (`gross`).
- **Updates**: when a new model lands, we look up its parameter count in this priority order and tag the `accuracy` band accordingly. Re-evaluation happens monthly and on demand when a model's source upgrades.

For Mixture-of-Experts models we use the **active parameter count** (parameters used per token), not the total parameter count. This distinction matters: a 600B-parameter MoE that activates 20B per token has the energy profile of a 20B dense model, not a 600B one.

## Grid carbon intensity

- **Source**: [International Energy Agency](https://www.iea.org/data-and-statistics) electricity statistics — annual averages by country.
- **Aggregation**: where a region maps to multiple countries (e.g. `eu-west` covers FR, DE, NL, IE), we use a population-weighted average for the region.
- **Updates**: annually, when the IEA publishes the new dataset. Switching dataset versions bumps the `methodology_version`.

We do not use real-time grid carbon intensity (which would require per-request lookups against a service like ElectricityMap). It's on the roadmap; the trade-off is that real-time numbers introduce sampling noise we'd need to explain. Annual averages are coarse but boring, and "boring" is a feature in a methodology document.

### Sample values

| Region | Approx. gCO₂e/kWh | Notes |
|--------|-------------------|-------|
| `eu-west` | ~280–340 | Population-weighted Western Europe average. |
| `eu-north` | ~50–80 | Mostly hydro/nuclear (Sweden, Norway, Finland). |
| `us-west` | ~250–320 | California heavy renewables, broader West mixed. |
| `us-east` | ~370–450 | Higher fossil share. |
| `india` | ~700–800 | Coal-dominant grid. |

Specific values per region are in the dashboard's settings page; the table above is for orientation.

## Pricing data

- **Source**: each upstream provider's published price list, refreshed daily.
- **Updates**: within one business day of an upstream change going live.
- **Storage**: the price applied at the moment of a request is stored on the generation record, so historical bills are stable.

Pricing isn't strictly part of the carbon methodology, but it is part of the per-request decision (`auto-cheap` and tie-break heuristics) so the source is documented here for completeness.

## What we deliberately don't include

- **Hardware embodied carbon.** Manufacturing emissions for the GPUs serving inference are non-zero but we don't have a defensible per-token allocation. Until we do, omitting the number is more honest than guessing.
- **Cooling overhead.** Data-centre cooling adds 10–30% to the energy used by compute (Power Usage Effectiveness, PUE). The EcoLogits formula incorporates an average overhead; provider-specific PUE refinements are pending more data.
- **Network transport.** Energy used to move bytes between the gateway, the upstream, and the user is small relative to inference and is not counted.
- **Training emissions.** Documented separately on the [limits page](limits).

---

# Limits and what we don't claim

A defensible number needs a clear scope. This page is the scope.

## What the numbers cover

- **Inference compute energy** for the model that served the request, using the EcoLogits v0.4 formula and the active-parameter count.
- **Grid carbon intensity** of the region the upstream served from, using IEA annual averages.
- **One step of the response.** A single request, end-to-end.

That's it.

## What the numbers do not cover

### Training emissions

We report inference only. Training a frontier model has a much larger and harder-to-attribute footprint, and folding a "share of training" into per-request numbers depends on assumptions (how many requests will the model serve in its lifetime?) that are unverifiable. We'd rather under-report, and say so, than make up a number.

### Hardware manufacturing

GPUs have an embodied carbon footprint from manufacturing.
There is not yet a defensible way to allocate it per token. Some methodologies amortise it across the GPU's expected lifetime; the amortisation depends on assumptions we don't have.

### Real-time grid mix

We use annual averages by region. Live carbon-aware routing — picking the region whose grid is currently cleanest — is a roadmap feature, not a current one.

### Embedding workloads

The EcoLogits formula was derived for decoder-only autoregressive models. Encoder-only embedding models have a meaningfully different compute profile. We currently *do not* report eco numbers for embedding requests; the response carries no `eco` block. Modelling embeddings properly is on the roadmap.

### Tool-call orchestration

When a single user-facing operation requires multiple LLM calls (e.g. an agent that thinks-then-acts-then-thinks), each call gets its own number. The aggregate footprint of a multi-step operation is the sum of those numbers. We do not do that aggregation automatically; that is your application's job.

### Browser, mobile, and on-device inference

Inference that doesn't go through the gateway doesn't appear in the dashboard. The numbers describe what we measure, not the totality of your AI footprint.

## What the numbers are *estimates of*

- An estimate, not a measurement. We do not have a wattmeter on the upstream provider's GPU.
- A model-class estimate, not a per-request measurement. Two requests for the same model with the same token count get the same number.
- An average over hardware, not a number specific to the GPU generation that served your request. Newer hardware is generally more efficient; the formula does not yet reflect generation.

## What we explicitly will not say

- "X grams CO₂e saved by using LowRouter."
  - We don't know what your counterfactual is. The eco-impact widget on the dashboard offers a comparison against a *baseline you choose*. That comparison is what it claims to be — a comparison against your chosen baseline.
- "Carbon-neutral", "net-zero", or "sustainable" applied to any individual request.
  - The numbers we publish exist in service of *better-informed decisions*, not certifications. Certifications require an audit we are not the right party to perform.
- "Independently verified" beyond what is true.
  - The EcoLogits methodology is published; the IEA data is published. Our application of them is auditable from the source code. There is no third-party certification of the per-request numbers themselves.

## How to read the numbers

- For ranking: comparing requests within a single `methodology_version` and `accuracy` band is meaningful.
- For absolute claims: take the band into account. A `gross` number is right within a factor of two; quoting it to three significant figures is a category error.
- For reporting: the numbers are appropriate for an internal dashboard or a "best-effort estimate" line in a sustainability report. They are not appropriate as the basis for a public emissions disclosure without acknowledging the methodology and its uncertainty.

## Reproducibility

Every estimate can be reproduced from public inputs:

1. Get `resolved_model`, `region`, `total_tokens`, and `methodology_version` from the generation record.
2. Look up the model's active parameter count in the [EcoLogits registry](https://ecologits.ai) for that version.
3. Apply the formula on the [methodology page](methodology) with the coefficients of that version.
4. Look up the grid intensity for that region in the IEA dataset referenced by the version.
5. The result should match the stored `eco.carbon_g` to within floating-point rounding.

If your reproduction diverges, that's a bug — please [file an issue](https://github.com/carbonifer/lowrouter/issues).

---

# Reduce your footprint

The methodology gives you a number. This page is what to actually do about it. Each section is a lever, with the order of magnitude of its effect, and the trade-off it carries.
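
Before pulling any lever, it helps to be able to recompute the number yourself. The following is a minimal Python sketch of the estimate, assuming the α and β coefficients are in watt-hours per token, as the methodology page's worked example applies them. The function name `estimate` and the constant names are hypothetical; the gateway's own implementation may differ in rounding and in how it selects inputs.

```python
ALPHA_WH = 8.91e-5  # Wh per token per billion active parameters (methodology page)
BETA_WH = 1.43e-3   # Wh of constant overhead per token (methodology page)


def estimate(active_params_b: float, total_tokens: int, grid_g_per_kwh: float) -> dict[str, float]:
    """Return energy (Wh), carbon (g) and carbon per 1K tokens (g) for one request."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    energy_wh = (ALPHA_WH * active_params_b + BETA_WH) * total_tokens
    # Grid intensity is per kWh, so convert Wh to kWh before multiplying.
    carbon_g = energy_wh / 1000 * grid_g_per_kwh
    return {
        "energy_wh": energy_wh,
        "carbon_g": carbon_g,
        "carbon_per_1k_tokens_g": carbon_g * 1000 / total_tokens,
    }


# Inputs from the methodology page's worked example: 8B active, 200 tokens, 340 g/kWh.
result = estimate(active_params_b=8, total_tokens=200, grid_g_per_kwh=340)
print({key: round(value, 3) for key, value in result.items()})
# {'energy_wh': 0.429, 'carbon_g': 0.146, 'carbon_per_1k_tokens_g': 0.729}
```

Running the same function with your own token counts and region intensity gives a quick sanity check on what each lever below should change: model size and token count move `energy_wh`, while the region only moves the carbon terms.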
## Pick a smaller model when you can

The largest single lever. Energy scales roughly linearly with active parameters (see the [methodology](methodology) formula). A 7B-active model is ~10× lower energy per token than a 70B model.

When to use a smaller model:

- Classification, extraction, and structured-output tasks.
- High-volume background work (summarisation, tagging).
- Anything where the output is verified by a downstream system.

When **not** to:

- Tasks that the smaller model fails at and your application has to retry on a larger one anyway. Two failed cheap calls + one big call > one big call.

The pseudo-model `lowrouter/auto-cheap` biases toward the smallest model that can plausibly handle the request. Try it on your traffic; if quality holds, keep it.

## Cache prompts where the upstream supports it

Several providers offer prompt caching: a long system prompt sent repeatedly with different user messages is charged at a discount on the cached portion. Where supported, this cuts both cost and energy on the cached part.

Practical:

- Place stable instructions, examples, and reference material **first** in the messages array.
- Place the variable part (the user's question) **last**.
- Keep the stable prefix above the upstream's caching threshold (for example, ≥1024 tokens).

The dashboard's per-transaction view shows `cached_tokens` when an upstream applied a cache hit.

## Trim prompts

Energy scales linearly with `total_tokens`. A 50% prompt-length reduction is a 50% energy reduction for the prompt portion.

- Drop preamble that doesn't change the model's behaviour.
- Drop few-shot examples that the model no longer needs.
- Compress reference material (use IDs instead of full descriptions when the model has been trained on them).

This compounds with prompt caching: a shorter cached prefix is cheaper *and* faster to cache.
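
The message ordering recommended for prompt caching can be sketched in a few lines of Python. This is an illustration only: `build_messages` and `STABLE_PREFIX` are hypothetical names, and whether a cache hit actually occurs depends on the upstream and its threshold, which this sketch does not check.

```python
def build_messages(stable_prefix: list[str], user_question: str) -> list[dict[str, str]]:
    """Put stable material first and the variable question last."""
    # Join the stable parts deterministically so the prefix is identical on every call.
    system_prompt = "\n\n".join(part.strip() for part in stable_prefix if part.strip())
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_question},
    ]


STABLE_PREFIX = [
    "You are a support assistant for a billing product.",
    "Reference: plan IDs are P1 (starter), P2 (team), P3 (enterprise).",
]

first = build_messages(STABLE_PREFIX, "Which plan includes SSO?")
second = build_messages(STABLE_PREFIX, "How do I change my billing address?")

# The system message is identical across calls; only the final message differs.
print(first[0] == second[0], first[-1] != second[-1])
```

Keeping anything that varies per request (timestamps, user names, request IDs) out of the stable prefix is what makes the prefix reusable at all.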
## Choose the cleaner region when residency permits

The grid intensity in `eu-north` (mostly hydro/nuclear) is roughly **5–8× lower** than in coal-heavy regions. If your data residency allows EU-North, you can pick it explicitly:

```json
{
  "model": "lowrouter/auto",
  "messages": [...],
  "route": {"region": "eu-north", "prefer_low_carbon": true}
}
```

Or, if you'd rather let the router pick whichever region is cleanest *and* available right now, just set `prefer_low_carbon: true` and leave `region` unset.

## Bound completion length

`max_tokens` lets you stop generation when "enough is enough". For classification or extraction, set it to the actual answer length plus a small margin. The carbon savings are linear with the saved tokens.

Some prompts respond well to "Answer in one sentence." instructions; others ignore them. Both are worth trying — the first time you check the dashboard, you'll see if the average completion length actually came down.

## Rate-limit your retries

A retry storm can multiply your footprint to 3–10× that of the underlying call. Use exponential backoff with jitter on retries, cap the retry count, and **never** retry on a 4xx that is not a 408 timeout.

## Memoise

If the same user asks the same question twice, an in-application cache returns the previous answer at zero gateway cost. This is the cheapest watt: the one not spent.

A few patterns that work:

- Hash the prompt (after normalisation) and cache the response by that hash.
- Cache lookup tables generated by the model (taxonomies, slot schemas) and refresh them on a schedule, not per-request.
- For chat, cache the last few responses in memory keyed by the full conversation; reuse when the user re-asks immediately.

## Aggregate where you can

Many small completions cost more than one larger one with multiple items. Examples:

- Classify a batch of 20 items in a single request rather than 20 requests.
- Extract structured fields for a list of inputs in one structured-output call.
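
The first example above could be sketched as follows. This is a hedged illustration in Python, not a prescribed prompt format: `build_batch_request` and `parse_batch_reply` are hypothetical helpers, the instruction wording is an assumption, and a hand-written string stands in for the model's reply so the snippet runs without a network call.

```python
import json


def build_batch_request(items: list[str], labels: list[str], model: str = "lowrouter/auto-cheap") -> dict:
    """Pack many items into one chat-completions body instead of one body per item."""
    numbered = "\n".join(f"{index}. {text}" for index, text in enumerate(items, start=1))
    instructions = (
        "Classify each numbered item as one of: " + ", ".join(labels) + ". "
        "Reply with a JSON array of labels, one per item, in the same order."
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": instructions},
            {"role": "user", "content": numbered},
        ],
    }


def parse_batch_reply(reply_text: str, expected_count: int) -> list[str]:
    """Parse the reply and reject it if the item count does not line up."""
    labels = json.loads(reply_text)
    if not isinstance(labels, list) or len(labels) != expected_count:
        raise ValueError("batch reply does not match the number of items sent")
    return labels


items = ["Refund not received", "Love the new dashboard", "App crashes on login"]
request_body = build_batch_request(items, ["complaint", "praise", "bug"])

# Stand-in for the model's output; a real reply would come from the gateway.
print(parse_batch_reply('["complaint", "praise", "bug"]', len(items)))
```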
Watch out for context-window limits and for the cost of re-prompting when one item in the batch fails — sometimes individual calls are cheaper net.

## Order of magnitude summary

| Lever | Typical reduction |
|-------|-------------------|
| Smaller model | 5–10× per request |
| Region pinning to clean grid | 3–8× on the carbon term |
| Prompt caching | 30–80% on the cached portion |
| Prompt trimming | linear with the % trimmed |
| Memoisation of repeats | 100% on the cached call |
| `max_tokens` bounding | linear with completion-tokens saved |
| Aggregation / batching | 2–5× on overhead |

These levers are independent, so applying several of them stacks the reductions. The first two are where most teams start.

## Confirm with the dashboard

After any of these changes, check the eco-impact widget on the dashboard for the same time window before/after. If the change you made should have reduced the per-1K-tokens carbon number and didn't, something is off — the dashboard's transaction-detail page tells you which model and provider actually served each request.

---

# FAQ

# Frequently Asked Questions

Answers to the questions developers and operators ask most often. Each section is self-contained — link directly to the slug in your own docs if it's useful.

## Is LowRouter compatible with the OpenAI SDK?

Yes. Set the `base_url` (Python) or `baseURL` (TypeScript) to `https://lowrouter.ai/api/v1` and use your LowRouter API key. The SDK calls work unchanged. Detailed examples are on the [OpenAI SDK Python](integrations/openai-sdk-python) and [TypeScript](integrations/openai-sdk-typescript) pages.

## Is LowRouter compatible with the Anthropic SDK?

Yes — point the SDK at `https://lowrouter.ai/api/v1/anthropic` and use a LowRouter API key. See [Anthropic SDK](integrations/anthropic-sdk).

## How is the CO₂ estimate calculated?

`energy = ((α × P_active) + β) × tokens`, then `carbon = energy × grid intensity`.
The full formula, coefficients, data sources, and confidence bands are on the [methodology page](sustainable-ai/methodology). The formula comes from EcoLogits v0.4; the grid intensities come from the IEA.

## Are the eco numbers real-time?

No. The grid intensity is an annual regional average. Real-time carbon-aware routing is a roadmap feature, not a current one. See [limits](sustainable-ai/limits).

## What happens if a provider is down?

The auto-router marks that provider's slot temporarily unavailable and dispatches to the next eligible upstream **in the same region**. It does not silently fail over to another region: if you've pinned a region and the only eligible upstream there is the one that's down, the request returns 503. See [routing](models/routing).

## How is pricing structured?

Pre-paid credits. The cost of a request is `upstream price × tokens + platform fee × tokens`. Both components are visible per model on the [model browser](/models) and per request on the dashboard. Failed requests that produced no upstream charge cost zero credits. Full details: [credits and billing](guides/credits-and-billing).

## Do you store my prompts?

No. We log token counts, model, provider, region, latency, and the eco estimate per request. We do not store prompt or response content. The full per-request schema is in [usage accounting](guides/usage-accounting).

## How do I keep requests in the EU?

Two options:

- **Per-request**: send `"route": {"region": "eu-west"}` in the body.
- **Per-key**: pin the key to `eu-west` in **Dashboard → Keys**.

The per-key option survives client misconfiguration, so prefer it for production. Details on the [routing](models/routing) and [API key management](guides/api-keys) pages.

## What's the rate limit?

Three layers: per-key (the daily/monthly credit limits you set), per-account (default 600 RPM, 64 concurrent), and per-IP (auth and anonymous). Higher quotas are available on request. Full picture: [rate limits](api-reference/rate-limits).
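If you would rather stay under the per-account default than discover it through rejected requests, a client-side limiter can help. The sketch below is a generic sliding-window counter, not a description of how LowRouter enforces its limits server-side; `SlidingWindowLimiter` is a hypothetical name, and the 600-per-60-seconds default simply mirrors the figure quoted above.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds (client-side only)."""

    def __init__(self, limit=600, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self._stamps = deque()

    def try_acquire(self):
        """Record the call and return True if it fits, else return False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False

    def wait_time(self):
        """Seconds until the oldest recorded call leaves the window."""
        if len(self._stamps) < self.limit:
            return 0.0
        return max(0.0, self.window - (self.clock() - self._stamps[0]))
```

A caller would check `try_acquire()` before each request and sleep for `wait_time()` when it returns `False`. The concurrency cap is a separate concern; a `threading.BoundedSemaphore(64)` around the request is one way to approximate it.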
## Can I use LowRouter from the browser?

Not with the API key directly, since that exposes the key to every page visitor. Mint short-lived tokens server-side and proxy requests through your backend. The pattern is the same as for OpenAI; both SDKs warn against `dangerouslyAllowBrowser`. See [OpenAI SDK TypeScript](integrations/openai-sdk-typescript).

## Are there free credits?

No. Trying things out costs the same as production usage. See [philosophy / principles in practice](philosophy/principles-in-practice) for why.

## Can I get an invoice with my company details?

Yes. Set the legal name, billing address, and VAT number under **Dashboard → Settings → Billing**. Invoices issued from that point onwards carry the company details. Past invoices can be re-issued via support. See [credits and billing](guides/credits-and-billing).

## How do I look up a request after the fact?

Every response carries a `lowrouter.generation_id`. Pass it to `GET /api/v1/generation/{id}` to get the full record, or open it on the dashboard. The full record includes the resolved model, the provider, the region, the eco numbers, and the routing trace. See [per-request metadata](models/per-request-metadata) and [discovery endpoints](api-reference/models-and-providers).

## How do I rotate a leaked key?

Create a new key, deploy it everywhere, then delete the old one. The deletion takes effect on the next request, with no caching delay. Full guidance: [API key management](guides/api-keys).

## Does LowRouter offer a free trial?

We do not run a free trial. Top up the smallest amount that makes sense for an evaluation; remaining credit can be refunded within 14 days under EU consumer law (see [credits and billing](guides/credits-and-billing)).

## Why don't I see an eco estimate on some requests?

When the resolved model's parameter count is unknown or unverified, we omit the `eco` field rather than fabricate a number.
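Client code therefore has to treat `eco` as optional. A minimal sketch, assuming the field sits under the `lowrouter` object of the response body next to `generation_id` (that placement is an assumption, as are the helper names `extract_eco` and `summarise`):

```python
def extract_eco(response):
    """Return the eco estimate dict from a response body, or None if omitted."""
    meta = response.get("lowrouter") or {}
    eco = meta.get("eco")
    return eco if isinstance(eco, dict) else None


def summarise(responses):
    """Count how many responses carry an estimate and how many do not."""
    with_eco = sum(1 for r in responses if extract_eco(r) is not None)
    return {"with_eco": with_eco, "without_eco": len(responses) - with_eco}
```

Tracking the `without_eco` count separately keeps missing estimates from being averaged in as zeros when you aggregate a footprint.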
The [methodology](sustainable-ai/methodology) page explains the confidence bands; the [limits](sustainable-ai/limits) page covers the cases where eco is deliberately absent (embedding requests, agent steps with no tokens, mid-stream upstream errors).

## Is there an SLA?

The SLA for production accounts is posted in the dashboard footer. The default account has no SLA and is served best-effort.

## Can I get a Data Processing Agreement?

Yes. Contact us via the email on the [legal page](/impressum).

## Is the source code open?

The platform's repository is at [github.com/carbonifer/lowrouter](https://github.com/carbonifer/lowrouter). Public components are licensed under the terms in the repository.

## I found a bug: where do I report it?

[github.com/carbonifer/lowrouter/issues](https://github.com/carbonifer/lowrouter/issues), ideally with a request ID from the `X-Request-ID` header on a representative request. See [errors](api-reference/errors).

---

# LowRouter API Reference

**LowRouter API** (v1.0.0)

OpenRouter-compatible API gateway for sustainable AI inference. Routes LLM requests to the most carbon-efficient provider while maintaining full compatibility with OpenAI and OpenRouter client libraries.

## Servers

- `/api/v1`: API v1 endpoint

## Endpoints

### POST /chat/completions

Create chat completion

Creates a chat completion with automatic routing to the most carbon-efficient provider. Fully compatible with OpenAI's chat completions API.

**Operation ID**: `createChatCompletion` · **Tags**: Completions

### POST /completions

Create text completion

Creates a text completion (legacy endpoint for compatibility). Routes to providers supporting text completion format.

**Operation ID**: `createCompletion` · **Tags**: Completions

### POST /embeddings

Create embeddings

Creates an embedding vector representing the input text. Routes to providers supporting embeddings via Bifrost. Applies billing (input tokens only) and carbon tracking.
**Operation ID**: `createEmbedding` · **Tags**: Embeddings

### GET /generation/{generation_id}

Get generation statistics

Retrieves detailed statistics for a specific generation including tokens, cost, carbon metrics, and latency. (NICE TO HAVE - may not be implemented in MVP)

**Operation ID**: `getGeneration` · **Tags**: Generations

### GET /metrics/{generation_id}

Get generation metrics

Retrieves carbon and energy metrics for a specific generation. This endpoint provides historical access to energy consumption and carbon emissions data for completed requests.

**Operation ID**: `getGenerationMetrics` · **Tags**: Metrics

### GET /models

List available models

Returns a list of all available models with their capabilities, pricing, and carbon intensity metrics.

**Operation ID**: `listModels` · **Tags**: Models

### GET /models/{model}

Retrieve a model

Returns details for a single model matching the OpenAI retrieve model format. The model parameter may contain slashes (e.g. `nebius/NousResearch/Hermes-4-70B`).

**Operation ID**: `getModel` · **Tags**: Models

### GET /providers

List available providers

Returns a list of all configured providers with their status and regions. (NICE TO HAVE - may not be implemented in MVP)

**Operation ID**: `listProviders` · **Tags**: Providers
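As a quick way to exercise the read-only endpoints listed above, the sketch below builds (but does not send) authenticated `GET` requests with the Python standard library. It rests on several assumptions: the base URL is the one quoted in the FAQ, bearer-token auth is inferred from the OpenAI SDK compatibility, slashes in a model ID are passed through as path separators, and the helper names are hypothetical.

```python
import urllib.request

BASE_URL = "https://lowrouter.ai/api/v1"  # base URL quoted in the FAQ


def build_get(path, api_key):
    """Build (but do not send) an authenticated GET request for a v1 path."""
    url = BASE_URL + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="GET",
    )


def model_request(model, api_key):
    """GET /models/{model}; slashes in the ID are kept as-is (an assumption)."""
    return build_get(f"models/{model}", api_key)


def generation_request(generation_id, api_key):
    """GET /generation/{generation_id}, if the deployment implements it."""
    return build_get(f"generation/{generation_id}", api_key)
```

Passing the result to `urllib.request.urlopen` sends the request. The response schemas are not reproduced here, so parse the JSON according to the published specification rather than this sketch.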