Deploxa Documentation

AI Gateway

A unified proxy for all major AI providers. Route requests, track costs, set spend limits, and fall back across providers — all with a single API key.

What the AI Gateway does

Single endpoint

Use one base URL and one API key regardless of which AI provider you call.

Cost tracking

Token usage and spend per model, per project, per user — available in the analytics dashboard.

Rate limiting

Set token-per-minute and request-per-minute limits per organization or API key.

Provider fallback

Automatically retry on another provider if the primary returns a 5xx or rate-limit error.

Request caching

Cache identical prompts with configurable TTL to reduce costs on repeated queries.

Audit log

Every request is logged with model, tokens, latency, and cost. Exportable via the REST API.
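The provider fallback rule above can be pictured with a small sketch. This is an illustration only, not the gateway's implementation: it assumes "rate limit error" means HTTP 429, and `should_fall_back`, `call_with_fallback`, and the `send` callback are hypothetical names introduced here.

```python
from typing import Callable, Sequence, Tuple


def should_fall_back(status: int) -> bool:
    """True for the error classes named above: any 5xx, or 429 (assumed rate-limit code)."""
    return status == 429 or 500 <= status <= 599


def call_with_fallback(
    providers: Sequence[str],
    send: Callable[[str], Tuple[int, str]],
) -> Tuple[str, int, str]:
    """Try providers in order and stop at the first response that is not a fallback case.

    `send` is a hypothetical callback that takes a provider name and returns
    (status_code, body). If every provider fails, the last result is returned.
    """
    if not providers:
        raise ValueError("at least one provider is required")
    for provider in providers:
        status, body = send(provider)
        if not should_fall_back(status):
            break
    return provider, status, body
```

A 4xx other than 429 is returned as-is, since retrying an invalid request on another provider would not be expected to help.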

Getting started

Generate an AI Gateway key in Project → AI Gateway → API Keys. Then replace your provider's base URL with the Deploxa gateway endpoint and set your provider keys in the gateway settings (not in your app).

Gateway base URL

https://gateway.deploxa.app/v1

Supported providers

Provider	X-AI-Provider header	Available models	Status
OpenAI	openai	gpt-4o, gpt-4-turbo, gpt-3.5-turbo, o1-mini	GA
Anthropic	anthropic	claude-opus-4, claude-sonnet-4, claude-haiku-4	GA
Google (Gemini)	google	gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash	GA
Mistral	mistral	mistral-large, mistral-medium, codestral	Beta
Cohere	cohere	command-r-plus, command-r	Beta
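For client code that wants to catch a mismatched provider and model before sending a request, the table above can be mirrored in a small lookup. This is a sketch: the data is copied from the table, while `SUPPORTED_MODELS`, `BETA_PROVIDERS`, and `check_model` are hypothetical names, not part of any Deploxa SDK.

```python
# Provider header value -> model names, copied from the table above.
SUPPORTED_MODELS = {
    "openai": {"gpt-4o", "gpt-4-turbo", "gpt-3.5-turbo", "o1-mini"},
    "anthropic": {"claude-opus-4", "claude-sonnet-4", "claude-haiku-4"},
    "google": {"gemini-2.0-flash", "gemini-1.5-pro", "gemini-1.5-flash"},
    "mistral": {"mistral-large", "mistral-medium", "codestral"},
    "cohere": {"command-r-plus", "command-r"},
}

# Providers the table marks as Beta; all others are GA.
BETA_PROVIDERS = {"mistral", "cohere"}


def check_model(provider: str, model: str) -> str:
    """Return "GA" or "Beta" for a listed provider/model pair, else raise ValueError."""
    models = SUPPORTED_MODELS.get(provider)
    if models is None:
        raise ValueError(f"unknown X-AI-Provider value: {provider!r}")
    if model not in models:
        raise ValueError(f"model {model!r} is not listed for provider {provider!r}")
    return "Beta" if provider in BETA_PROVIDERS else "GA"
```

A hard-coded copy like this goes stale when the supported list changes, so it is best treated as a development-time guard.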

Usage examples

Pass X-AI-Provider to select which provider handles the request. The request body stays in OpenAI format — the gateway translates it for you.

curl

curl https://gateway.deploxa.app/v1/chat/completions \
  -H "Authorization: Bearer $DEPLOXA_GATEWAY_KEY" \
  -H "X-AI-Provider: anthropic" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{ "role": "user", "content": "Hello!" }],
    "max_tokens": 256
  }'

Node.js (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEPLOXA_GATEWAY_KEY,
  baseURL: "https://gateway.deploxa.app/v1",
  defaultHeaders: {
    "X-AI-Provider": "openai", // or "anthropic", "google", etc.
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

Python

import os

from openai import OpenAI

# Point the OpenAI SDK at the gateway and select the provider via a default header.
client = OpenAI(
    api_key=os.environ["DEPLOXA_GATEWAY_KEY"],
    base_url="https://gateway.deploxa.app/v1",
    default_headers={"X-AI-Provider": "google"},
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)
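If you prefer not to depend on the OpenAI SDK, the same request can be assembled with the Python standard library. The sketch below only builds the request object. The URL, headers, and body fields come from the examples above; `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

GATEWAY_BASE_URL = "https://gateway.deploxa.app/v1"


def build_chat_request(
    provider: str,
    model: str,
    prompt: str,
    api_key: str,
    max_tokens: int = 256,
) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for the gateway."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{GATEWAY_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-AI-Provider": provider,  # selects the upstream provider
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` and decode the JSON response; reading the key from the `DEPLOXA_GATEWAY_KEY` environment variable keeps it out of source code.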

Request caching

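Add the X-Cache-TTL header with a lifetime in seconds (for example, X-Cache-TTL: 3600 for one hour) to cache the response for identical requests. Cache keys are derived from the full request body, so a change to any field, not only the prompt, results in a cache miss. Cache hits return the stored response instantly and do not count against your token quota.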

curl

# Cache the response for 1 hour
curl https://gateway.deploxa.app/v1/chat/completions \
  -H "Authorization: Bearer $DEPLOXA_GATEWAY_KEY" \
  -H "X-AI-Provider: openai" \
  -H "X-Cache-TTL: 3600" \
  -H "Content-Type: application/json" \
  -d '{ "model": "gpt-4o", "messages": [...] }'
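To build intuition for what "derived from the full request body" implies, here is one way such a key could be computed. This is purely a hypothetical sketch: the gateway's real key derivation is not documented here, and `cache_key` along with the choice of SHA-256 over canonical JSON are assumptions made for illustration.

```python
import hashlib
import json


def cache_key(body: dict) -> str:
    """Hash a canonical JSON encoding of the whole request body (illustrative only)."""
    # Sorting keys and fixing separators makes the encoding independent of
    # dict ordering and whitespace. Whether the real gateway normalizes
    # bodies this way is an assumption.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this model, two requests with the same messages but a different `max_tokens` hash to different keys and are cached separately, so keep every body field stable if you want repeated queries to hit the cache.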

Rate limits and spend caps

Configure per-key limits in Project → AI Gateway → Settings:

Requests per minute (RPM) — hard limit, returns 429 when exceeded

Tokens per minute (TPM) — tracks prompt + completion tokens

Monthly spend cap — gateway rejects requests once the limit is hit

Per-model cost allocation — break down spend by model in analytics
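Because the RPM limit returns 429 when exceeded, clients usually want to wait and retry, not fail immediately. The sketch below shows one common approach, exponential backoff. `send_with_backoff` and its parameters are hypothetical, the `send` callback stands in for whatever HTTP call your application makes, and the delays are arbitrary defaults, not values recommended by the gateway.

```python
import time
from typing import Callable, Tuple


def send_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a status other than 429 or attempts run out.

    `send` is a hypothetical zero-argument callback returning (status_code, body).
    Waits base_delay, then 2x, 4x, ... between attempts.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Still rate limited after the final attempt: hand the 429 back to the caller.
    return status, body
```

Retrying only makes sense for per-minute limits. Once a monthly spend cap is reached, waiting a few seconds will not help, and the status code returned in that case is not specified here.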

The AI Gateway is available on Pro and Team plans. Free plan accounts can use the gateway in development with a 1,000 request/month limit.
