LowRouter

Sustainability-first

“Sustainable AI” is doing a lot of marketing work right now. This page describes what the phrase means inside LowRouter — concretely, what we measure, what we report, and what we have decided not to claim.

What we measure

For every inference request, we estimate two numbers:

  • Energy per output token (Wh), derived from the model’s active parameter count using the EcoLogits methodology.
  • Grid carbon intensity (gCO₂e/kWh) for the region the provider serves the request from, sourced from the International Energy Agency.

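Multiplying the two gives gCO₂e per 1,000 output tokens for the request: because 1 kWh is 1,000 Wh, Wh per token times gCO₂e/kWh comes out numerically as gCO₂e per 1,000 tokens. The exact formula and the confidence we attach to each estimate are in the methodology page.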
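
As a rough illustration of the unit arithmetic only, the multiplication can be sketched as below. The function names and the input values are hypothetical and chosen for readability; they are not LowRouter's published formula, which lives in the methodology page, and they are not measured figures for any model or region.

```python
def gco2e_per_1k_tokens(energy_wh_per_token: float, grid_gco2e_per_kwh: float) -> float:
    """Return estimated gCO2e per 1,000 output tokens.

    energy_wh_per_token: estimated energy per output token, in Wh.
    grid_gco2e_per_kwh:  regional grid carbon intensity, in gCO2e/kWh.
    """
    if energy_wh_per_token < 0 or grid_gco2e_per_kwh < 0:
        raise ValueError("inputs must be non-negative")
    kwh_per_token = energy_wh_per_token / 1000.0       # Wh -> kWh
    g_per_token = kwh_per_token * grid_gco2e_per_kwh   # gCO2e per token
    return g_per_token * 1000.0                        # scale to 1,000 tokens


def request_gco2e(energy_wh_per_token: float, grid_gco2e_per_kwh: float,
                  output_tokens: int) -> float:
    """Return estimated total gCO2e for a request with the given output length."""
    per_1k = gco2e_per_1k_tokens(energy_wh_per_token, grid_gco2e_per_kwh)
    return per_1k * output_tokens / 1000.0


# Illustrative inputs only (assumed, not real measurements):
# 0.002 Wh per output token on a 400 gCO2e/kWh grid.
per_1k = gco2e_per_1k_tokens(0.002, 400.0)
total = request_gco2e(0.002, 400.0, output_tokens=500)
print(f"{per_1k:.3f} gCO2e per 1,000 tokens, {total:.3f} gCO2e for 500 tokens")
```

The two scaling steps cancel, which is why the per-1,000-token figure is numerically just the product of the two inputs.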

These numbers are exposed:

  • On every API response, in an eco block.
  • On the dashboard, aggregated by day, model, provider, and region.
  • On the public model browser, as a comparable estimate per model.
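
For readers who want to reproduce the dashboard-style rollup from their own request logs, one possible approach is sketched here. The record fields (`day`, `model`, `provider`, `region`, `gco2e`) are assumptions made for this example; they are not the documented shape of the eco block, and the values are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str, str]  # (day, model, provider, region)


def aggregate_emissions(records: Iterable[dict]) -> Dict[Key, float]:
    """Sum estimated gCO2e per (day, model, provider, region) group."""
    totals: Dict[Key, float] = defaultdict(float)
    for record in records:
        key = (record["day"], record["model"], record["provider"], record["region"])
        totals[key] += record["gco2e"]
    return dict(totals)


# Hypothetical per-request records; names and numbers are illustrative only.
records = [
    {"day": "2025-01-01", "model": "small-a", "provider": "p1", "region": "eu", "gco2e": 0.25},
    {"day": "2025-01-01", "model": "small-a", "provider": "p1", "region": "eu", "gco2e": 0.50},
    {"day": "2025-01-01", "model": "large-b", "provider": "p2", "region": "us", "gco2e": 2.00},
]

for key, grams in sorted(aggregate_emissions(records).items()):
    print(key, f"{grams:.2f} gCO2e")
```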

What we don’t measure (yet)

  • Training emissions. We report inference only. Training is a separate, larger, and harder-to-attribute footprint, and folding it into per-request numbers would be misleading.
  • Hardware embodied carbon. GPU manufacturing has a real footprint; we don’t yet have a defensible per-token number for it.
  • Real-time grid mix. We use annual averages by region. Live carbon-aware routing is a future feature, not a current one.
  • Embedding workloads. Encoder-only models have a different compute profile and our formula does not yet model them well.

We list these limits because they are part of what an honest number looks like. The sustainable-AI page repeats them next to every chart.

Constraints that fall out of this position

A few platform decisions follow from taking energy and carbon seriously:

  • The site itself is on a tight transfer budget. Pages like this one ship under 100 KB total. The docs site renders server-side with no client framework. Heavy interactive widgets are not added without a reason.
  • The default route is auto-mode, not “the largest model.” If a smaller model can do the job for a fraction of the energy, we’d rather it be the default.
  • Providers and regions with very high grid intensity are deprioritised in routing unless explicitly pinned by the caller. See provider routing.
  • We are slower to add features than we’d like. Every new background job, every dashboard widget, every fancy visualisation is a per-user energy cost; we add them when they earn the cost.
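
The routing behaviour described in the list above (high-intensity regions deprioritised unless the caller pins a provider) could be sketched roughly as follows. This is a minimal illustration, not the actual router: the `Candidate` type, the `order_candidates` function, and the 600 gCO₂e/kWh threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cut-off for "very high" grid intensity; the real value is not stated.
HIGH_INTENSITY_G_PER_KWH = 600.0


@dataclass(frozen=True)
class Candidate:
    provider: str
    region: str
    grid_gco2e_per_kwh: float


def order_candidates(candidates: List[Candidate],
                     pinned_provider: Optional[str] = None) -> List[Candidate]:
    """Order routing candidates, moving high-intensity regions to the back.

    If the caller pins a provider, only that provider's candidates are
    returned and no carbon-based reordering is applied.
    """
    if pinned_provider is not None:
        return [c for c in candidates if c.provider == pinned_provider]
    # sorted() is stable, so the existing preference order is kept
    # within the low-intensity and high-intensity groups.
    return sorted(candidates,
                  key=lambda c: c.grid_gco2e_per_kwh > HIGH_INTENSITY_G_PER_KWH)


candidates = [
    Candidate("p1", "region-x", 750.0),
    Candidate("p2", "region-y", 120.0),
    Candidate("p3", "region-z", 300.0),
]
print([c.provider for c in order_candidates(candidates)])
print([c.provider for c in order_candidates(candidates, pinned_provider="p1")])
```

Sorting on a boolean rather than on the raw intensity keeps every other routing preference intact and only moves the very high-intensity options to the end, which matches "deprioritised" more closely than "always pick the cleanest grid".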

What this is not

A pledge that any single request is green. A claim that the estimate is exact. An assertion that running an LLM through LowRouter is meaningfully better for the planet than running it directly.

What it is: a measured number, an open formula, and a default that prefers the smaller model and the cleaner grid when other constraints allow.