AI Gateway

A unified proxy for all major AI providers. Route requests, track costs, set spend limits, and fall back across providers — all with a single API key.

What the AI Gateway does

Single endpoint

Use one base URL and one API key regardless of which AI provider you call.

Cost tracking

Token usage and spend are broken down per model, per project, and per user in the analytics dashboard.

Rate limiting

Set token-per-minute and request-per-minute limits per organization or API key.

Provider fallback

Automatically retry on another provider if the primary returns a 5xx or rate-limit error.

Request caching

Cache identical prompts with configurable TTL to reduce costs on repeated queries.

Audit log

Every request is logged with model, tokens, latency, and cost. Exportable via the REST API.
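The provider fallback behaviour listed above can be pictured with a small sketch. This is not the gateway's implementation: the function names are hypothetical, and it assumes that the "rate limit error" is an HTTP 429 and that providers are tried in a fixed order.

```python
from typing import Callable, Sequence, Tuple


def should_fall_back(status: int) -> bool:
    # Assumption: a rate-limit error is HTTP 429; any 5xx is a provider failure.
    return status == 429 or 500 <= status <= 599


def call_with_fallback(
    providers: Sequence[str], send: Callable[[str], int]
) -> Tuple[str, int]:
    """Try each provider in order and return (provider, status) for the
    first response that does not trigger a fallback. If every provider
    fails, the last (provider, status) pair is returned."""
    if not providers:
        raise ValueError("providers must not be empty")
    result = (providers[0], 0)
    for provider in providers:
        status = send(provider)  # hypothetical callable returning an HTTP status
        result = (provider, status)
        if not should_fall_back(status):
            break
    return result
```

Because the gateway performs this retry itself, application code normally does not need a loop like this; the sketch only illustrates which responses move a request on to the next provider.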

Getting started

Generate an AI Gateway key in Project → AI Gateway → API Keys. Then replace your provider's base URL with the Deploxa gateway endpoint and set your provider keys in the gateway settings (not in your app).

Gateway base URL

https://gateway.deploxa.app/v1

Supported providers

| Provider | X-AI-Provider header | Available models | Status |
|---|---|---|---|
| OpenAI | openai | gpt-4o, gpt-4-turbo, gpt-3.5-turbo, o1-mini | GA |
| Anthropic | anthropic | claude-opus-4, claude-sonnet-4, claude-haiku-4 | GA |
| Google (Gemini) | google | gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash | GA |
| Mistral | mistral | mistral-large, mistral-medium, codestral | Beta |
| Cohere | cohere | command-r-plus, command-r | Beta |

Usage examples

Pass X-AI-Provider to select which provider handles the request. The request body stays in OpenAI format — the gateway translates it for you.

curl

    curl https://gateway.deploxa.app/v1/chat/completions \
      -H "Authorization: Bearer $DEPLOXA_GATEWAY_KEY" \
      -H "X-AI-Provider: anthropic" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "claude-sonnet-4",
        "messages": [{ "role": "user", "content": "Hello!" }],
        "max_tokens": 256
      }'

Node.js (OpenAI SDK)

    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.DEPLOXA_GATEWAY_KEY,
      baseURL: "https://gateway.deploxa.app/v1",
      defaultHeaders: {
        "X-AI-Provider": "openai", // or "anthropic", "google", etc.
      },
    });

    const response = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Hello!" }],
    });

Python

    import os

    from openai import OpenAI

    # The gateway key replaces the provider key; provider keys live in the gateway settings.
    client = OpenAI(
        api_key=os.environ["DEPLOXA_GATEWAY_KEY"],
        base_url="https://gateway.deploxa.app/v1",
        default_headers={"X-AI-Provider": "google"},
    )

    response = client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[{"role": "user", "content": "Hello!"}],
    )
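If you would rather not depend on an SDK, the same request can be assembled with the Python standard library. The sketch below is illustrative only: `MODELS_BY_PROVIDER`, `provider_for`, and `build_chat_request` are hypothetical helpers, and the model lists are copied by hand from the supported-providers table, so they need updating whenever that table changes.

```python
import json
import urllib.request

GATEWAY_BASE_URL = "https://gateway.deploxa.app/v1"

# Copied from the supported-providers table (hypothetical helper data).
MODELS_BY_PROVIDER = {
    "openai": ["gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo", "o1-mini"],
    "anthropic": ["claude-opus-4", "claude-sonnet-4", "claude-haiku-4"],
    "google": ["gemini-2.0-flash", "gemini-1.5-pro", "gemini-1.5-flash"],
    "mistral": ["mistral-large", "mistral-medium", "codestral"],
    "cohere": ["command-r-plus", "command-r"],
}


def provider_for(model: str) -> str:
    """Return the X-AI-Provider value for a model listed in the table."""
    for provider, models in MODELS_BY_PROVIDER.items():
        if model in models:
            return provider
    raise ValueError(f"model {model!r} is not in the supported-providers table")


def build_chat_request(model: str, prompt: str, gateway_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat request for the gateway."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{GATEWAY_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {gateway_key}",
            "X-AI-Provider": provider_for(model),
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`. Deriving the header from the model name keeps the two from drifting apart, at the cost of maintaining the lookup table by hand.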

Request caching

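Add the X-Cache-TTL header with a value in seconds (for example, X-Cache-TTL: 3600 for one hour) to cache the response for identical prompts. Cache keys are derived from the full request body, so a request that differs in any field, such as max_tokens, is cached separately. Cache hits return the stored response directly and do not count against your token quota.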

curl

    # Cache the response for 1 hour (3600 seconds)
    curl https://gateway.deploxa.app/v1/chat/completions \
      -H "Authorization: Bearer $DEPLOXA_GATEWAY_KEY" \
      -H "X-AI-Provider: openai" \
      -H "X-Cache-TTL: 3600" \
      -H "Content-Type: application/json" \
      -d '{ "model": "gpt-4o", "messages": [...] }'
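To see why two requests do or do not share a cache entry, it can help to model the key derivation. The exact scheme the gateway uses is not documented here, so the following is only a sketch under an assumption: the key is a SHA-256 hash of the JSON body with keys sorted. `cache_key` is a hypothetical name.

```python
import hashlib
import json


def cache_key(request_body: dict) -> str:
    """Hypothetical cache key: SHA-256 of the canonicalised JSON body."""
    # Sorting keys and fixing separators makes the key independent of
    # field order and whitespace (an assumption of this sketch).
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}
with_limit = dict(base, max_tokens=256)

# Same prompt, different body: under this model the two requests get different keys.
print(cache_key(base) == cache_key(with_limit))
```

Whether the real gateway normalises field order or whitespace is not stated, so the safest way to get cache hits is to send byte-identical bodies.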

Rate limits and spend caps

Configure per-key limits in Project → AI Gateway → Settings:

Requests per minute (RPM) — hard limit, returns 429 when exceeded
Tokens per minute (TPM) — tracks prompt + completion tokens
Monthly spend cap — gateway rejects requests once the limit is hit
Per-model cost allocation — break down spend by model in analytics
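Since the RPM limit returns 429 when exceeded, clients usually want to retry after a pause. The following is a minimal sketch of exponential backoff, not an official client: `send_with_backoff` and its parameters are hypothetical, and `send` stands in for whatever function issues the request and returns the HTTP status code.

```python
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns something other than 429 or the
    attempts run out. Returns the last status code seen."""
    status = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Wait base_delay, then 2x, then 4x, ... before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status
```

Only 429 is retried here. How a monthly spend cap rejection is signalled is not specified above, and retrying it would not help until the cap is raised or the month rolls over.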

The AI Gateway is available on Pro and Team plans. Free plan accounts can use the gateway in development, limited to 1,000 requests per month.
