Sustainability-first

“Sustainable AI” is doing a lot of marketing work right now. This page describes what the phrase means inside LowRouter — concretely, what we measure, what we report, and what we have decided not to claim.

What we measure

For every inference request, we estimate two numbers:

Energy per output token (Wh), derived from the model’s active parameter count using the EcoLogits methodology.
Grid carbon intensity (gCO₂e/kWh) for the region the provider serves the request from, sourced from the International Energy Agency.

Multiplying the two gives gCO₂e per 1,000 tokens for the request, since the Wh-to-kWh conversion cancels against the per-1,000-token scaling. The exact formula and the confidence we attach to each estimate are in the methodology page.
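
A minimal sketch of that multiplication, using made-up inputs rather than measured values. The function name and both numbers are assumptions for illustration; the authoritative formula is the one in the methodology page.

```python
def gco2e_per_1k_tokens(wh_per_token: float, grid_gco2e_per_kwh: float) -> float:
    """Estimate gCO2e per 1,000 output tokens from the two per-request inputs."""
    # Scale energy from one token up to 1,000 tokens (still in Wh).
    wh_per_1k_tokens = wh_per_token * 1000
    # Convert Wh to kWh so the unit matches the grid intensity figure.
    kwh_per_1k_tokens = wh_per_1k_tokens / 1000
    # kWh per 1k tokens * gCO2e per kWh = gCO2e per 1k tokens.
    return kwh_per_1k_tokens * grid_gco2e_per_kwh


# Illustrative inputs only: 0.002 Wh per token on a 400 gCO2e/kWh grid.
estimate = gco2e_per_1k_tokens(wh_per_token=0.002, grid_gco2e_per_kwh=400.0)
print(f"{estimate:.2f} gCO2e per 1,000 tokens")
```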

These numbers are exposed:

On every API response, in an eco block.
On the dashboard, aggregated by day, model, provider, and region.
On the public model browser, as a comparable estimate per model.
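
One way the dashboard grouping could be computed from per-request estimates is sketched below. The record shape is hypothetical and is not the schema of the eco block; only the four grouping keys (day, model, provider, region) come from the list above.

```python
from collections import defaultdict
from typing import NamedTuple


class RequestEstimate(NamedTuple):
    # Hypothetical record shape, not LowRouter's actual eco block schema.
    day: str
    model: str
    provider: str
    region: str
    gco2e: float


GroupKey = tuple[str, str, str, str]


def aggregate(records: list[RequestEstimate]) -> dict[GroupKey, float]:
    """Sum estimated gCO2e per (day, model, provider, region) group."""
    totals: dict[GroupKey, float] = defaultdict(float)
    for r in records:
        totals[(r.day, r.model, r.provider, r.region)] += r.gco2e
    return dict(totals)


# Placeholder names and values, chosen only to show the grouping.
records = [
    RequestEstimate("2025-01-01", "model-a", "provider-x", "region-1", 0.5),
    RequestEstimate("2025-01-01", "model-a", "provider-x", "region-1", 0.25),
    RequestEstimate("2025-01-01", "model-b", "provider-y", "region-2", 1.0),
]
totals = aggregate(records)
```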

What we don’t measure (yet)

Training emissions. We report inference only. Training is a separate, larger, and harder-to-attribute footprint, and folding it into per-request numbers is misleading.
Hardware embodied carbon. GPU manufacturing has a real footprint; we don’t yet have a defensible per-token number for it.
Real-time grid mix. We use annual averages by region. Live carbon-aware routing is a future feature, not a current one.
Embedding workloads. Encoder-only models have a different compute profile and our formula does not yet model them well.

We list these limits because they are part of what an honest number looks like. The sustainable-AI page repeats them next to every chart.

Constraints that fall out of this position

A few platform decisions follow from taking energy and carbon seriously:

The site itself is on a tight transfer budget. Pages like this one ship under 100 KB total. The docs site renders server-side with no client framework. Heavy interactive widgets are not added without a reason.
The default route is auto-mode, not “the largest model.” If a smaller model can do the job for a fraction of the energy, we’d rather it be the default.
Providers and regions with very high grid intensity are deprioritised in routing unless explicitly pinned by the caller. See provider routing.
We are slower to add features than we’d like. Every new background job, every dashboard widget, every fancy visualisation is a per-user energy cost; we add them when they earn the cost.
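
The deprioritisation rule for high-intensity grids can be sketched as a sort key. This is an illustration under stated assumptions, not the routing implementation: the threshold value, the `Candidate` fields, and the `pick_route` function are all hypothetical, and real routing would also weigh capability, cost, and latency.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff; the text does not say what counts as "very high".
HIGH_INTENSITY_GCO2E_PER_KWH = 600.0


@dataclass(frozen=True)
class Candidate:
    provider: str
    region: str
    grid_gco2e_per_kwh: float
    wh_per_token: float


def estimated_footprint(c: Candidate) -> float:
    """Numerically equal to gCO2e per 1,000 tokens for this candidate."""
    return c.wh_per_token * c.grid_gco2e_per_kwh


def pick_route(candidates: list[Candidate], pinned_provider: Optional[str] = None) -> Candidate:
    """Pick a candidate, ranking very-high-intensity grids last unless pinned."""
    if pinned_provider is not None:
        pinned = [c for c in candidates if c.provider == pinned_provider]
        if not pinned:
            raise ValueError(f"pinned provider {pinned_provider!r} not available")
        # An explicit pin bypasses the grid-intensity deprioritisation.
        return min(pinned, key=estimated_footprint)
    if not candidates:
        raise ValueError("no candidates")
    # False sorts before True, so grids under the cutoff are always preferred.
    return min(
        candidates,
        key=lambda c: (
            c.grid_gco2e_per_kwh > HIGH_INTENSITY_GCO2E_PER_KWH,
            estimated_footprint(c),
        ),
    )


# Placeholder candidates: "a" has the lower footprint but sits on a dirty grid.
a = Candidate("a", "region-1", grid_gco2e_per_kwh=700.0, wh_per_token=0.001)
b = Candidate("b", "region-2", grid_gco2e_per_kwh=100.0, wh_per_token=0.01)
default_choice = pick_route([a, b])
pinned_choice = pick_route([a, b], pinned_provider="a")
```

In this sketch the default route goes to provider "b" even though "a" has the smaller estimated footprint, because "a" is over the assumed cutoff; pinning "a" restores it.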

What this is not

A pledge that any single request is green. A claim that the estimate is exact. An assertion that running an LLM through LowRouter is meaningfully better for the planet than running it directly.

What it is: a measured number, an open formula, and a default that prefers the smaller model and the cleaner grid when other constraints allow.