Cost-budget enforcement
You can hard-cap the cost of a single request with max_cost_usd. Lumen estimates the request cost as (approx_tokens / 1000) * cost_per_1k * 2.0, where the 2.0 multiplier assumes roughly balanced input and output token counts.
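As a rough illustration, the estimate above can be reproduced client-side. This is only a sketch: the function name `estimate_cost_usd` and the sample price of $0.0005 per 1k tokens are hypothetical, and only the formula itself comes from this page.

```python
def estimate_cost_usd(approx_tokens: int, cost_per_1k: float) -> float:
    """Mirror the documented estimate: (tokens / 1000) * cost_per_1k * 2.0."""
    return (approx_tokens / 1000) * cost_per_1k * 2.0


# Hypothetical tier price of $0.0005 per 1k tokens and a 120,000-token prompt.
estimate = estimate_cost_usd(120_000, 0.0005)
print(f"${estimate:.4f}")  # $0.1200
```

Under these assumed numbers the estimate is about 12 cents, so this tier would not fit a max_cost_usd of 0.05.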
Example
resp = client.chat.completions.create(
    model="auto",
    messages=[...],
    extra_body={"max_cost_usd": 0.05},  # never spend more than 5 cents
)
If no tier fits the budget for the given prompt size, the API returns:
HTTP/1.1 402 Payment Required
{"detail": "no tier fits cost budget $0.0500 for ~120000 tokens"}
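A client may want to recognise this specific 402 and recover the budget and token count from it. The sketch below assumes the `detail` string keeps the wording shown in the sample response. That wording is not a documented contract, and the helper name `parse_budget_error` is hypothetical.

```python
import json
import re
from typing import Optional, Tuple

# Pattern inferred from the sample message above; treat it as an assumption.
_BUDGET_ERROR = re.compile(r"no tier fits cost budget \$([\d.]+) for ~(\d+) tokens")


def parse_budget_error(status_code: int, body: str) -> Optional[Tuple[float, int]]:
    """Return (budget_usd, approx_tokens) for a per-request budget 402, else None."""
    if status_code != 402:
        return None
    try:
        detail = json.loads(body).get("detail", "")
    except (json.JSONDecodeError, AttributeError):
        # Body is not JSON, or not a JSON object.
        return None
    match = _BUDGET_ERROR.search(str(detail))
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


sample = '{"detail": "no tier fits cost budget $0.0500 for ~120000 tokens"}'
print(parse_budget_error(402, sample))  # (0.05, 120000)
```

Returning None for anything that does not match lets the caller fall back to generic 402 handling, for example when the monthly cap is the cause.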
Plan-level monthly cap
Each plan also has a budget_cap_usd_monthly:
- Free: $1/mo (effectively cuts off when free quota is exhausted)
- Starter: $50/mo soft cap
- Pro: $300/mo soft cap
- Scale / Enterprise: unlimited
When you hit the monthly cap, requests return 402 until the next billing cycle begins or you upgrade your plan.
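To avoid running into the cap unexpectedly, you could track spend on your side. The sketch below hard-codes the caps listed above. The names `PLAN_CAPS_USD` and `remaining_monthly_budget` are hypothetical, and `spent_usd` is assumed to come from your own accounting, since this page does not describe a usage endpoint.

```python
from typing import Optional

# Monthly caps in USD as listed above; None means unlimited.
PLAN_CAPS_USD = {
    "free": 1.0,
    "starter": 50.0,
    "pro": 300.0,
    "scale": None,
    "enterprise": None,
}


def remaining_monthly_budget(plan: str, spent_usd: float) -> Optional[float]:
    """Return USD left before the cap, or None for unlimited plans."""
    cap = PLAN_CAPS_USD[plan.lower()]
    if cap is None:
        return None
    # Clamp at zero so overspend never shows as a negative budget.
    return max(cap - spent_usd, 0.0)


print(remaining_monthly_budget("starter", 42.5))  # 7.5
print(remaining_monthly_budget("scale", 1000.0))  # None
```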
Estimating ahead of time
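# Cheap pre-flight: fetch the model list and pick by your own logic
import requests

your_token_count = 120_000  # replace with your own prompt token estimate
resp = requests.get("https://lumen-api.eliteaiempire.com/v1/models", timeout=10)
resp.raise_for_status()
models = resp.json()
for m in models["data"]:
    # Raw per-1k price; multiply by 2.0 to mirror Lumen's own estimate
    est = (your_token_count / 1000) * m["cost_per_1k"]
    print(f"{m['id']}: ~${est:.4f}")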
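One possible "own logic" is to pick the cheapest model whose estimate fits a budget. The sketch below applies the same 2.0 multiplier as the documented estimate. The function name and the sample model entries are hypothetical, and this is not necessarily how Lumen itself selects a tier.

```python
from typing import Dict, List, Optional


def cheapest_model_within_budget(
    models: List[Dict], token_count: int, max_cost_usd: float
) -> Optional[str]:
    """Return the id of the cheapest model whose estimate fits, or None."""
    candidates = []
    for m in models:
        # Same 2.0 multiplier as the documented server-side estimate.
        est = (token_count / 1000) * m["cost_per_1k"] * 2.0
        if est <= max_cost_usd:
            candidates.append((est, m["id"]))
    return min(candidates)[1] if candidates else None


# Hypothetical entries shaped like the items in models["data"].
sample_models = [
    {"id": "tier-small", "cost_per_1k": 0.0002},
    {"id": "tier-large", "cost_per_1k": 0.0030},
]
print(cheapest_model_within_budget(sample_models, 10_000, 0.05))  # tier-small
```

A None result corresponds to the case where the API would answer with the 402 "no tier fits cost budget" error.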