Pay 30-70% less
without sacrificing quality.
Every LLM call is a coin-flip on cost. You pick a model and hope. Lumen picks the cheapest model that meets your quality floor — per request, automatically. OpenAI-compatible. Hash-chained audit log.
Start free — 100 requests/day, no credit card QuickstartCost classifier per request
Length, tool-use, JSON mode, reasoning/code/creative signals — every request is scored in microseconds before it hits a model. Easy lookups go to a $0.001 tier. Hard prompts cascade to frontier.
Won't degrade below threshold
You set a quality floor (default 0.75). Lumen picks the cheapest tier that clears it — never below. Mid prompt with floor 0.9? You get gemini-2.5-pro, not flash. No surprises.
Hash-chained log every call
Each response includes an audit_hash. The on-disk log is SHA-256 hash-chained with a server secret. Tamper-evident. Look up any past call at /v1/audit/<hash>. SOC2-ready.
How much would you save?
Paste your monthly OpenAI/Anthropic bill and answer two questions. We model the routing for you.
Empire dog-food (we eat our own routing)
Lumen is the routing brain behind these production properties, all running on the public API right now:
- Powers trustdeskai.eliteaiempire.com — DSAR automation routing
- Used by loanpilotai.eliteaiempire.com — personal-loan AI advisor
- Routes mortgagepulse.eliteaiempire.com — mortgage-comparison chatbot
Pricing
Free forever for low-volume. Three paid tiers; upgrade in-app, billed by Stripe.
Free
Test, prototype, hobby projects.
- 100 requests/day
- 3,000 requests/month
- nano tier only (gemini-flash)
- Cache + audit log
Starter
First production app.
- 5,000 requests/day
- 150,000 requests/month
- All 4 tiers (nano → high)
- $50/mo budget cap
- Email support
Pro
Real SaaS workloads. Most popular.
- 50,000 requests/day
- 1.5M requests/month
- Anthropic brand_pref
- $300/mo budget cap
- Priority routing
Scale
Heavy production, 7-figure traffic.
- 500,000 requests/day
- 15M requests/month
- No budget cap
- Highest queue priority
- Slack support
Enterprise? Custom contracts, SOC2 docs, dedicated routing pool, net-30. Contact sales →
How Lumen compares
| Lumen | OpenRouter | Helicone | Portkey | |
|---|---|---|---|---|
| Picks model for you | ✓ auto | marketplace | no | no |
| Quality-floor cascade | ✓ | — | — | — |
| Cryptographic audit log | ✓ hash-chained | — | logs only | logs only |
| OpenAI-compatible | ✓ | ✓ | ✓ | ✓ |
| Multi-provider failover | ✓ 8 vendors | ✓ | single | ✓ |
| Free tier | 100/day | credits | 10k/mo | 10k/mo |
| Geopolitical filter (no CN/RU) | ✓ | — | — | — |
FAQ
Will my code work without changes?
Yes — Lumen speaks OpenAI chat-completions wire format. Change base_url to https://lumen-api.eliteaiempire.com/v1 and use your lumen_sk_… key. model="auto" turns on routing; you can also pin a specific tier.
Which models does Lumen route to?
Gemini 2.5 Flash, GPT-4o-mini, Gemini 2.5 Pro, Claude Opus 4.8 by default. Pro+ unlocks brand_pref="anthropic" which routes through Haiku 4.5, Sonnet 4.6, Opus 4.8. Full list at tier-ladder.
How do you decide which tier?
Each request is scored 0-1 for difficulty (length, tool-use, JSON-mode, reasoning/code/creative signals). The cheapest tier whose min_quality clears max(score, your_floor) wins. Read quality-floor for details.
What if I want to force a specific model?
Pass model="nano" or model="gpt-4o-mini". The router will use exactly that tier. Your plan's tier ceiling still applies.
Do you train on my data?
No. Lumen is a routing layer — we pass requests through to vendor APIs and never retain prompts or responses in training pipelines. Cache TTL is 1 hour, then evicted.
What about streaming?
Streaming works (stream: true) but bypasses the response cache. Audit log entries are still written.
Can I bring my own keys?
Not on Free/Starter. Pro+ customers can request BYO-key routing — we use your vendor accounts instead of ours, and you pay only the Lumen routing fee. Email support to enable.
Audit log — what do I do with it?
Every lumen.audit_hash in a response points to a SHA-256-chained on-disk entry recording tier/model/cost/tokens/timing. Use it for SOC2 evidence, dispute resolution, or "which model answered this critical user question?" debugging.