Chat Completions API (/chat/completions)

TrueFoundry AI Gateway provides a universal API for all supported models via the standard OpenAI /chat/completions endpoint. This unified interface allows you to seamlessly work with models from different providers through a consistent API. API Reference: POST /chat/completions

Provider capabilities

The gateway maps the OpenAI Chat Completions contract to each provider. The table below summarizes feature support by provider for this endpoint.

Legend:

✅ Supported by Provider and Truefoundry
Provided by provider, but not by Truefoundry
Provider does not support this feature

Provider	Stream	Non Stream	Tools	JSON Mode	Schema Mode	Prompt Caching	Reasoning	Structured Output
OpenAI	✅	✅	✅	✅	✅	✅	✅
Azure OpenAI	✅	✅	✅	✅	✅	✅	✅
Anthropic	✅	✅	✅		✅	✅	✅
Bedrock	✅	✅	✅		✅	✅	✅
Vertex	✅	✅	✅		✅	✅	✅
Cohere	✅	✅	✅	✅	✅		✅
Gemini	✅	✅	✅	✅	✅	✅	✅
Groq	✅	✅	✅	✅	✅	✅	✅
AI21	✅	✅		✅
Cerebras	✅	✅		✅			✅
Wafer	✅	✅	✅
SambaNova	✅	✅		✅			✅
Perplexity-AI	✅	✅		✅			✅	✅
Together-AI	✅	✅	✅	✅		✅	✅	✅
xAI	✅	✅	✅	✅	✅	✅	✅	✅
DeepInfra	✅	✅	✅	✅		✅	✅
Nebius	✅	✅	✅	✅			✅

Section	Description
Overview	Provider capabilities, getting started, and input controls
Multimodal	Images, audio, video, PDFs, and vision models
Tools & signatures	Function calling and thought signatures
Structured outputs	JSON mode, JSON schema, and Pydantic
Caching & reasoning	Prompt caching and reasoning models
Extended thinking	Thinking blocks, multi-turn reasoning, grounding

Getting Started

You can use the standard OpenAI client to send requests to the gateway:

from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="{GATEWAY_BASE_URL}"
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini", # this is the truefoundry model id
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)

print(response.choices[0].message.content)

Configuration

You will need to configure the following:

base_url: The base URL of the TrueFoundry AI Gateway
api_key: API key generated from Personal Access Tokens
model: TrueFoundry model ID in the format provider_account/model_name (available in the LLM playground UI)

See Integrate with code for instructions on obtaining these values. For using native provider SDKs (OpenAI, Google Gen AI, Anthropic, boto3), see Native SDK Support.

Input Controls

System Prompts

System prompts set the behavior and context for the model by defining the assistant’s role, tone, and constraints:

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that specializes in Python programming."},
        {"role": "user", "content": "How do I write a function to calculate factorial?"}
    ]
)

Request Parameters

Fine-tune model behavior with these common parameters:

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,       # Controls randomness (0.0 to 1.0)
    max_tokens=100,        # Maximum tokens to generate
    verbosity="high",      # Constrains verbosity: low, medium, high
    top_p=0.9,             # Nucleus sampling parameter
    frequency_penalty=0.0, # Reduces repetition
    presence_penalty=0.0,  # Encourages new topics
    stop=["\n", "Human:"]  # Stop sequences
)

Some models don’t support all parameters. For example, temperature is not supported by o series models like o3-mini.

Chat Completions: Multimodal

Send images, audio, video, and PDFs with the Chat Completions API, plus vision model support

Chat Completions: Tools & Signatures

Function calling, tool workflows, and thought signatures for Chat Completions

Chat Completions: Structured Outputs

JSON mode, JSON schema, Pydantic integration, and provider support for structured responses

Chat Completions: Caching & Reasoning

Prompt caching and reasoning models for the Chat Completions API

Chat Completions: Extended Thinking

Thinking blocks, signatures, multi-turn reasoning, and Google Search grounding

Self-Hosted Models Overview

⌘I

Get Started

LLM Gateway

MCP Registry and Gateway

Skills Registry

Prompt Registry

Guardrails and Security

Observability

Deployment

Admin Guide

Chat

Messages

Embeddings

Rerank

Responses

Image

Audio

Batch

Files

Fine-tuning

Moderations

Models

Chat Completions API (/chat/completions)

Provider capabilities

Contents

Getting Started

Configuration

Input Controls

System Prompts

Request Parameters

​Provider capabilities

​Contents

​Getting Started

​Configuration

​Input Controls

​System Prompts

​Request Parameters

Provider capabilities

Contents

Getting Started

Configuration

Input Controls

System Prompts

Request Parameters