Lumen API documentation

Tier ladder

Lumen routes across 4 cost/quality tiers, plus an anthropic-only brand cascade for Pro+ customers.

Pareto frontier

| Tier | Model | $/1k tokens | Min quality | Max context |

|--------|------------------|-------------|-------------|-------------|

| nano | gemini-2.5-flash | $0.001 | 0.60 | 30k |

| small | gpt-4o-mini | $0.003 | 0.80 | 128k |

| mid | gemini-2.5-pro | $0.012 | 0.90 | 2M |

| high | claude-opus-4-8 | $0.060 | 0.98 | 200k |

These are the cheapest model at each quality floor — anything cheaper at the same quality is dominated and excluded.

Brand cascade (Pro+ only)

| Tier | Model | $/1k tokens | Min quality | Max context |

|-----------------|--------------------|-------------|-------------|-------------|

| anthropic-fast | claude-haiku-4-5 | $0.005 | 0.78 | 200k |

| anthropic-mid | claude-sonnet-4-6 | $0.015 | 0.92 | 200k |

| (anthropic-high)| claude-opus-4-8 | $0.060 | 0.98 | 200k |

Set brand_pref="anthropic" in your request to route through this cascade exclusively.

How a tier is selected

1. Classify difficulty: length + tool-use + json mode + reasoning/code/creative signals -> a 0..1 score.

2. Compute effective floor: max(difficulty_score, your quality_floor).

3. Filter: keep only tiers whose min_quality >= effective_floor AND max_context >= request size.

4. Pick cheapest: lowest cost_per_1k from the filtered set.

5. Budget check: if max_cost_usd was set and the chosen tier exceeds it, downgrade to the highest-quality tier that fits the budget. If nothing fits, return HTTP 402.

Plan ceilings

Each plan caps which tier you can route to:

  • Free — nano only
  • Starter — through high
  • Pro / Scale — through high + brand_pref enabled
  • Enterprise — through high + custom routing