AI Gateway
A unified proxy for all major AI providers. Route requests, track costs, set spend limits, and fall back across providers — all with a single API key.
Single endpoint
Use one base URL and one API key regardless of which AI provider you call.
Cost tracking
Token usage and spend per model, per project, per user — available in the analytics dashboard.
Rate limiting
Set token-per-minute and request-per-minute limits per organization or API key.
Provider fallback
Automatically retry on another provider if the primary returns a 5xx or rate limit error.
Request caching
Cache identical prompts with configurable TTL to reduce costs on repeated queries.
Audit log
Every request is logged with model, tokens, latency, and cost. Exportable via the REST API.
Generate an AI Gateway key in Project → AI Gateway → API Keys. Then replace your provider's base URL with the Deploxa gateway endpoint and set your provider keys in the gateway settings (not in your app).
Gateway base URL
| Provider | X-AI-Provider header | Available models | Status |
|---|---|---|---|
| OpenAI | openai | gpt-4o, gpt-4-turbo, gpt-3.5-turbo, o1-mini | GA |
| Anthropic | anthropic | claude-opus-4, claude-sonnet-4, claude-haiku-4 | GA |
| Google (Gemini) | gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash | GA | |
| Mistral | mistral | mistral-large, mistral-medium, codestral | Beta |
| Cohere | cohere | command-r-plus, command-r | Beta |
Pass X-AI-Provider to select which provider handles the request. The request body stays in OpenAI format — the gateway translates it for you.
curl
Node.js (OpenAI SDK)
Python
Add X-Cache-TTL: 3600 (seconds) to cache the response for identical prompts. Cache keys are derived from the full request body. Cache hits return the stored response instantly and do not count against your token quota.
curl
Configure per-key limits in Project → AI Gateway → Settings:
The AI Gateway is available on Pro and Team plans. Free plan accounts can use the gateway in development with a 1,000 request/month limit.