Cost-budget enforcement

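You can hard-cap the estimated cost of each request with max_cost_usd. Lumen estimates a request's cost as (approx_tokens / 1000) * cost_per_1k * 2.0, where approx_tokens is the approximate prompt size and the 2.0 factor assumes the output is about as long as the input.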
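To make the arithmetic concrete, here is a minimal Python sketch of that estimate and its inverse (the largest prompt a given max_cost_usd allows). The function names and the $0.002 per 1k price used below are hypothetical. Only the formula comes from this page, and Lumen's own token count may differ from the approx_tokens you supply.

```python
# Hypothetical helpers mirroring the documented estimate:
#   cost ~= (approx_tokens / 1000) * cost_per_1k * 2.0
OUTPUT_FACTOR = 2.0  # assumes the output is about as long as the input


def estimate_cost_usd(approx_tokens: int, cost_per_1k: float) -> float:
    """Estimated cost of one request, counting input plus an equal-sized output."""
    return (approx_tokens / 1000) * cost_per_1k * OUTPUT_FACTOR


def max_tokens_for_budget(max_cost_usd: float, cost_per_1k: float) -> int:
    """Largest prompt size (in tokens) whose estimate stays within max_cost_usd."""
    return int(max_cost_usd / (cost_per_1k * OUTPUT_FACTOR) * 1000)


print(estimate_cost_usd(10_000, 0.002))
print(max_tokens_for_budget(0.05, 0.002))
```

At the hypothetical $0.002 per 1k tokens, a 10,000-token prompt is estimated at about $0.04, and a $0.05 cap allows prompts of roughly 12,500 tokens.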

Example

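# client: an OpenAI-compatible client pointed at https://lumen-api.eliteaiempire.com/v1 (assumed setup)
resp = client.chat.completions.create(
    model="auto",
    messages=[...],
    extra_body={"max_cost_usd": 0.05}  # only tiers estimated at <= $0.05 are eligible
)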

If no tier fits the budget for the given prompt size, the API returns:

HTTP/1.1 402 Payment Required
{"detail": "no tier fits cost budget $0.0500 for ~120000 tokens"}
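If you want to react to this error programmatically, for example to decide whether raising max_cost_usd or trimming the prompt might help, one minimal approach is to pull the budget and token count out of the detail string. This sketch assumes the wording shown above stays stable, which is not a documented contract. parse_budget_error is a hypothetical helper, and any 402 it does not recognize (such as a monthly-cap rejection, whose message is not shown here) yields None.

```python
import re
from typing import Optional, Tuple

# Assumption: the detail text keeps the exact wording shown in the example,
# e.g. "no tier fits cost budget $0.0500 for ~120000 tokens".
_BUDGET_DETAIL = re.compile(
    r"no tier fits cost budget \$(\d+(?:\.\d+)?) for ~(\d+) tokens"
)


def parse_budget_error(detail: str) -> Optional[Tuple[float, int]]:
    """Return (budget_usd, approx_tokens) for a per-request budget 402, else None."""
    match = _BUDGET_DETAIL.search(detail)
    if match is None:
        # Some other 402 (possibly the monthly plan cap); not parsed here.
        return None
    return float(match.group(1)), int(match.group(2))


print(parse_budget_error("no tier fits cost budget $0.0500 for ~120000 tokens"))
# (0.05, 120000)
```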

Plan-level monthly cap

Each plan also has a monthly spending cap, budget_cap_usd_monthly:

Free: $1/mo (effectively cuts off when free quota is exhausted)
Starter: $50/mo soft cap
Pro: $300/mo soft cap
Scale / Enterprise: unlimited

When you reach the monthly cap, requests return 402 until the next billing cycle begins or you upgrade your plan.
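Because both the per-request budget and the monthly cap can produce a 402, a client that tracks its own month-to-date spend could shrink max_cost_usd as it approaches the plan cap. The sketch below is hypothetical: the lowercase plan keys, the spent_this_month value (which you would have to track yourself, since no usage endpoint is described here), and the clamping policy are assumptions. Only the dollar figures come from the list above, and whether Lumen counts estimated or actual spend against the cap is not stated, so treat this as a rough guard.

```python
from typing import Dict, Optional

# Monthly caps in USD from the plan list; None means unlimited.
# The lowercase plan keys are hypothetical identifiers.
PLAN_CAPS_USD: Dict[str, Optional[float]] = {
    "free": 1.0,
    "starter": 50.0,
    "pro": 300.0,
    "scale": None,
    "enterprise": None,
}


def effective_max_cost(plan: str, spent_this_month: float, max_cost_usd: float) -> float:
    """Clamp a per-request budget to what is left of the plan's monthly cap."""
    cap = PLAN_CAPS_USD[plan.lower()]
    if cap is None:
        return max_cost_usd  # unlimited plans: the per-request cap is the only limit
    remaining = max(cap - spent_this_month, 0.0)
    return min(max_cost_usd, remaining)


print(effective_max_cost("pro", 299.98, 0.05))
```

A result of 0.0 means the monthly cap has already been reached, so the request would be expected to return 402.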

Estimating ahead of time

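# Cheap pre-flight: fetch the model list, estimate each model's cost, and pick by your own logic
import requests

your_token_count = 120_000  # replace with your prompt's approximate token count

resp = requests.get("https://lumen-api.eliteaiempire.com/v1/models", timeout=10)
resp.raise_for_status()
models = resp.json()
for m in models["data"]:
    # Same estimate max_cost_usd is checked against: 2.0 assumes output ~ input length
    est = (your_token_count / 1000) * m["cost_per_1k"] * 2.0
    print(f"{m['id']}: ~${est:.4f}")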
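One possible way to pick by your own logic is to take the cheapest model whose estimate fits your budget. This sketch works on the parsed /v1/models payload and assumes each entry carries id and cost_per_1k, as in the loop above. The model ids in the sample payload are made up, and "cheapest that fits" is just one policy; you might prefer the most capable model that fits instead.

```python
from typing import Any, Dict, Optional

OUTPUT_FACTOR = 2.0  # documented assumption: output about as long as input


def cheapest_model_within_budget(
    models: Dict[str, Any], approx_tokens: int, max_cost_usd: float
) -> Optional[str]:
    """Return the id of the cheapest model whose estimate fits max_cost_usd, or None."""
    best_id: Optional[str] = None
    best_cost = float("inf")
    for m in models["data"]:
        est = (approx_tokens / 1000) * m["cost_per_1k"] * OUTPUT_FACTOR
        if est <= max_cost_usd and est < best_cost:
            best_id, best_cost = m["id"], est
    return best_id


# Made-up payload shaped like the /v1/models response.
payload = {"data": [
    {"id": "model-large", "cost_per_1k": 0.01},
    {"id": "model-small", "cost_per_1k": 0.0005},
]}
print(cheapest_model_within_budget(payload, 10_000, 0.05))
```

A None result roughly corresponds to the case where the API would answer 402, though Lumen's own token count may differ from yours.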