TrueFoundry AI Gateway exposes every supported model through the standard OpenAI /chat/completions endpoint, so you can call models from different providers with the same request and response format.
API Reference: POST /chat/completions

Provider capabilities

The gateway maps the OpenAI Chat Completions contract to each provider. The table below summarizes feature support by provider for this endpoint.
Legend:
  • Supported by provider and TrueFoundry
  • Supported by provider, but not by TrueFoundry
  • Not supported by provider
Provider | Stream | Non Stream | Tools | JSON Mode | Schema Mode | Prompt Caching | Reasoning | Structured Output
OpenAI
Azure OpenAI
Anthropic
Bedrock
Vertex
Cohere
Gemini
Groq
AI21
Cerebras
SambaNova
Perplexity-AI
Together-AI
xAI
DeepInfra

Contents

Section | Description
Overview | Provider capabilities, getting started, and input controls
Multimodal | Images, audio, video, PDFs, and vision models
Tools & signatures | Function calling and thought signatures
Structured outputs | JSON mode, JSON schema, and Pydantic
Caching & reasoning | Prompt caching and reasoning models
Extended thinking | Thinking blocks, multi-turn reasoning, grounding

Getting Started

You can use the standard OpenAI client to send requests to the gateway:
from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="{GATEWAY_BASE_URL}"
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",  # TrueFoundry model ID (provider_account/model_name)
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)

print(response.choices[0].message.content)

Configuration

You will need to configure the following:
  1. base_url: The base URL of the TrueFoundry AI Gateway
  2. api_key: API key generated from Personal Access Tokens
  3. model: TrueFoundry model ID in the format provider_account/model_name (available in the LLM playground UI)
See Integrate with code for instructions on obtaining these values. To use native provider SDKs (OpenAI, Google Gen AI, Anthropic, boto3) instead, see Native SDK Support.
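
The three values above can be kept out of source code by reading them from the environment. The following is a minimal sketch, not an official TrueFoundry helper: the variable names TFY_GATEWAY_BASE_URL and TFY_API_KEY and the functions split_model_id, load_gateway_config, and make_client are all hypothetical names chosen for illustration.

```python
import os


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider_account/model_name' at the first slash."""
    provider_account, sep, model_name = model_id.partition("/")
    if not sep or not provider_account or not model_name:
        raise ValueError(f"expected 'provider_account/model_name', got {model_id!r}")
    return provider_account, model_name


def load_gateway_config(env=None) -> dict:
    """Read base_url and api_key from an environment-style mapping."""
    env = os.environ if env is None else env
    # Hypothetical variable names; they are not defined by TrueFoundry.
    names = {"base_url": "TFY_GATEWAY_BASE_URL", "api_key": "TFY_API_KEY"}
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing configuration: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


def make_client(env=None):
    """Build an OpenAI client pointed at the gateway."""
    from openai import OpenAI  # lazy import: the helpers above need only the stdlib
    return OpenAI(**load_gateway_config(env))
```

Calling make_client() with both variables set returns a client that can be used exactly like the one in Getting Started. Checking the model ID with split_model_id before sending a request catches a missing provider_account prefix locally rather than through an API error.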

Input Controls

System Prompts

System prompts set the behavior and context for the model by defining the assistant’s role, tone, and constraints:
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that specializes in Python programming."},
        {"role": "user", "content": "How do I write a function to calculate factorial?"}
    ]
)
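
When the same system prompt is reused across requests or conversation turns, it can help to assemble the messages list in one place. The helper below is a small sketch under that assumption; build_messages is a hypothetical name, not part of the OpenAI SDK or the gateway.

```python
def build_messages(system_prompt, user_prompt, history=None):
    """Return a messages list with the system prompt first.

    history is an optional list of earlier {"role": ..., "content": ...}
    turns, placed between the system prompt and the new user message.
    """
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

The result can be passed directly as the messages argument of client.chat.completions.create.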

Request Parameters

Fine-tune model behavior with these common parameters:
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,       # Controls randomness (0.0 to 2.0 for OpenAI models; range varies by provider)
    max_tokens=100,        # Maximum tokens to generate
    verbosity="high",      # Response verbosity: low, medium, high (only on models that support it)
    top_p=0.9,             # Nucleus sampling parameter
    frequency_penalty=0.0, # Reduces repetition
    presence_penalty=0.0,  # Encourages new topics
    stop=["\n", "Human:"]  # Stop sequences
)
Not every model supports every parameter. For example, OpenAI's o-series reasoning models, such as o3-mini, do not support temperature.
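
One way to cope with per-model differences is to drop known-unsupported parameters before sending the request. The sketch below assumes a hand-maintained lookup table; UNSUPPORTED_PARAMS and filter_params are hypothetical names, the table holds only the o3-mini case mentioned above, and the model ID openai-main/o3-mini is an illustrative example.

```python
# Hypothetical, hand-maintained table: model name -> parameters to drop.
# Only the o3-mini/temperature entry comes from this page; verify any
# additions against the provider's documentation.
UNSUPPORTED_PARAMS = {
    "o3-mini": {"temperature"},
}


def filter_params(model_id, params):
    """Return a copy of params without entries the model is known to reject."""
    # The model name is whatever follows the first slash of the TrueFoundry ID.
    model_name = model_id.partition("/")[2] or model_id
    blocked = UNSUPPORTED_PARAMS.get(model_name, set())
    return {key: value for key, value in params.items() if key not in blocked}
```

The filtered dictionary can then be unpacked into the request, for example client.chat.completions.create(model=model_id, messages=messages, **filter_params(model_id, params)).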