# Data sources

The carbon estimate is only as good as its inputs. This page lists
each input, where it comes from, and how often we update it.

## Energy formula coefficients (α, β)

- **Source**:
  [EcoLogits v0.4 — LLM inference methodology](https://ecologits.ai/0.4/methodology/llm_inference/).
- **Values**: α = 8.91 × 10⁻⁵ Wh per output token per billion active
  parameters; β = 1.43 × 10⁻³ Wh per output token.
- **Updates**: when EcoLogits publishes a new methodology version
  with new coefficients, we evaluate it, bump the
  `methodology_version`, and note the change in the dashboard's
  footer with the effective date.

The coefficients come from a linear fit of energy per output token
against active parameter count, across published inference benchmarks
on representative GPU hardware. They are an *average*; real hardware
varies.
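A minimal sketch of how the two coefficients become a per-request estimate. It assumes the linear EcoLogits form, where energy per output token is α × P + β and P is the active parameter count in billions. The function name is hypothetical. The sketch covers only this α/β term and ignores any further overhead factors in the full EcoLogits method.

```python
# EcoLogits v0.4 coefficients, as listed above.
ALPHA_WH_PER_TOKEN_PER_B = 8.91e-5  # Wh per output token per billion active params
BETA_WH_PER_TOKEN = 1.43e-3         # Wh per output token, independent of model size


def inference_energy_wh(active_params_b: float, output_tokens: int) -> float:
    """Estimate inference energy for one request, in Wh (hypothetical helper).

    Assumes the linear per-token model E_token = alpha * P + beta,
    where P is the active parameter count in billions.
    """
    if active_params_b <= 0 or output_tokens < 0:
        raise ValueError("active_params_b must be > 0 and output_tokens >= 0")
    per_token_wh = ALPHA_WH_PER_TOKEN_PER_B * active_params_b + BETA_WH_PER_TOKEN
    return output_tokens * per_token_wh
```

For a model with 70B active parameters generating 500 output tokens, `inference_energy_wh(70, 500)` comes to about 3.83 Wh.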

## Model active parameters

- **Source priority**:
  1. **EcoLogits registry** — models with verified architecture
     details (`accurate`).
  2. **Provider documentation** — values published by the model
     creator (`accurate` or `medium`, depending on whether the
     statement is unambiguous).
  3. **Research papers and credible leaks** — peer-reviewed
     architecture descriptions, technical reports (`medium`).
  4. **Name-based estimates** — `llama-70b` → 70B (`gross`).
- **Updates**: when a new model lands, we look up its parameter count
  in this priority order and tag the `accuracy` band accordingly. We
  re-evaluate monthly, and on demand when a higher-priority source
  becomes available for a model.
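The priority order can be sketched as a fall-through lookup. The table names, the `ParamEstimate` type, and the name-parsing regex are all hypothetical. The empty dictionaries stand in for the real sources listed above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParamEstimate:
    active_params_b: float  # active parameters, in billions
    accuracy: str           # "accurate", "medium", or "gross"
    source: str             # which tier produced the value


# Hypothetical stand-ins for the real sources, highest priority first.
ECOLOGITS_REGISTRY: dict[str, float] = {}
PROVIDER_DOCS: dict[str, tuple[float, bool]] = {}  # (value, statement unambiguous?)
RESEARCH: dict[str, float] = {}

# Matches a size suffix such as "70b" in "llama-70b" (crude by design).
_SIZE_IN_NAME = re.compile(r"(\d+(?:\.\d+)?)b\b", re.IGNORECASE)


def lookup_active_params(model: str) -> Optional[ParamEstimate]:
    """Return the first estimate found, walking the tiers in priority order."""
    if model in ECOLOGITS_REGISTRY:
        return ParamEstimate(ECOLOGITS_REGISTRY[model], "accurate", "ecologits")
    if model in PROVIDER_DOCS:
        value, unambiguous = PROVIDER_DOCS[model]
        band = "accurate" if unambiguous else "medium"
        return ParamEstimate(value, band, "provider")
    if model in RESEARCH:
        return ParamEstimate(RESEARCH[model], "medium", "research")
    match = _SIZE_IN_NAME.search(model)
    if match:
        return ParamEstimate(float(match.group(1)), "gross", "name")
    return None  # no estimate available
```

`lookup_active_params("llama-70b")` falls through to the name rule and returns 70.0 tagged `gross`. The name rule is deliberately crude: for `mixtral-8x7b` it picks up 7, not the active count, which is why name-based hits carry the lowest band.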

For Mixture-of-Experts models we use the **active parameter count**
(parameters used per token), not the total parameter count. This
distinction matters: a 600B-parameter MoE that activates 20B per
token has the energy profile of a 20B dense model, not a 600B one.
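Worked through with the α and β values above, and reading the EcoLogits formula as energy per output token = α × P + β with P in billions of active parameters, the two counts give very different per-token energy:

```math
\begin{aligned}
E_{\text{token}}(20) &= 8.91 \times 10^{-5} \times 20 + 1.43 \times 10^{-3} \approx 3.21 \times 10^{-3}\ \text{Wh} \\
E_{\text{token}}(600) &= 8.91 \times 10^{-5} \times 600 + 1.43 \times 10^{-3} \approx 5.49 \times 10^{-2}\ \text{Wh}
\end{aligned}
```

Using the total count would overstate per-token energy by roughly 17×.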

## Grid carbon intensity

- **Source**:
  [International Energy Agency](https://www.iea.org/data-and-statistics)
  electricity statistics — annual averages by country.
- **Aggregation**: where a region maps to multiple countries (e.g.
  `eu-west` covers FR, DE, NL, IE), we use a population-weighted
  average for the region.
- **Updates**: annually, when the IEA publishes the new dataset.
  Switching dataset versions bumps the `methodology_version`.
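The regional aggregation step is a weighted mean. The following is a minimal sketch; the function name is hypothetical, and the country codes and figures in the usage line are made up for illustration. They are not IEA values.

```python
def population_weighted_intensity(countries: dict[str, tuple[float, float]]) -> float:
    """Return the population-weighted mean grid intensity, in gCO2e/kWh.

    `countries` maps a country code to (intensity_g_per_kwh, population).
    """
    total_pop = sum(pop for _, pop in countries.values())
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(intensity * pop for intensity, pop in countries.values())
    return weighted_sum / total_pop
```

`population_weighted_intensity({"AA": (100.0, 10.0), "BB": (400.0, 30.0)})` returns 325.0: the more populous country pulls the regional figure toward its own intensity.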

We do not use real-time grid carbon intensity (which would require
per-request lookups against a service like Electricity Maps). It's on
the roadmap; the trade-off is that real-time numbers introduce
sampling noise we'd need to explain. Annual averages are coarse but
boring, and "boring" is a feature in a methodology document.

### Sample values

| Region | Approx. gCO₂e/kWh | Notes |
|--------|-------------------|-------|
| `eu-west` | ~280–340 | Population-weighted Western Europe average. |
| `eu-north` | ~50–80 | Mostly hydro/nuclear (Sweden, Norway, Finland). |
| `us-west` | ~250–320 | High renewable share in California; the broader West is more mixed. |
| `us-east` | ~370–450 | Higher fossil share. |
| `india` | ~700–800 | Coal-dominant grid. |

Specific values per region are in the dashboard's settings page; the
table above is for orientation.
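Turning an energy estimate into emissions means applying the grid intensity per kilowatt-hour. The following is a minimal sketch, assuming a plain multiplication with no further adjustment; the function name is hypothetical.

```python
def emissions_g(energy_wh: float, intensity_g_per_kwh: float) -> float:
    """Convert request energy (Wh) to emissions (gCO2e) for a given grid intensity."""
    energy_kwh = energy_wh / 1000.0  # Wh -> kWh, to match the intensity unit
    return energy_kwh * intensity_g_per_kwh
```

For the same 10 Wh request, `emissions_g(10, 65)` gives about 0.65 g on a grid like `eu-north`, while `emissions_g(10, 750)` gives about 7.5 g on one like `india`. That is roughly an order of magnitude apart.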

## Pricing data

- **Source**: each upstream provider's published price list,
  refreshed daily.
- **Updates**: within one business day of an upstream change going
  live.
- **Storage**: the price applied at the moment of a request is
  stored on the generation record, so historical bills are stable.
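This is a minimal sketch of the snapshot idea. The price is copied onto the record when the request is made, so later changes to the price list leave old records untouched. The class and function names and the single per-million-output-token price are hypothetical simplifications.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class GenerationRecord:
    model: str
    output_tokens: int
    price_per_million_tokens: Decimal  # snapshot taken at request time

    def cost(self) -> Decimal:
        # Uses the stored snapshot, never the live price list.
        return self.price_per_million_tokens * self.output_tokens / Decimal(1_000_000)


def record_generation(price_list: dict[str, Decimal], model: str,
                      output_tokens: int) -> GenerationRecord:
    # Copy the current price value onto the record instead of referencing the list.
    return GenerationRecord(model, output_tokens, price_list[model])
```

`Decimal` keeps bill arithmetic exact, which binary floats would not.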

Pricing isn't strictly part of the carbon methodology, but it is part
of the per-request decision (`auto-cheap` and tie-break heuristics),
so the source is documented here for completeness.

## What we deliberately don't include

- **Hardware embodied carbon.** Manufacturing emissions for the GPUs
  serving inference are non-zero but we don't have a defensible
  per-token allocation. Until we do, omitting the number is more
  honest than guessing.
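- **Cooling overhead.** Data-centre cooling and other facility loads
  add roughly 10–30% on top of the energy used by compute. This
  corresponds to a Power Usage Effectiveness (PUE: total facility
  energy divided by IT equipment energy) of about 1.1–1.3. The
  EcoLogits formula incorporates an average overhead;
  provider-specific PUE refinements are pending more data.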
- **Network transport.** Energy used to move bytes between the
  gateway, the upstream, and the user is small relative to inference
  and is not counted.
- **Training emissions.** Documented separately on the
  [limits page](limits).