Chat Completions: Multimodal

Images

Supported Providers: OpenAI, Bedrock, Anthropic, Google Vertex, Google Gemini

Send images as part of your chat completion requests, either as URLs or as base64-encoded data:

Using Image URLs

from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="{GATEWAY_BASE_URL}"
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ]
)
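Because `content` is a list, one user message can carry several image parts alongside the text. The sketch below assumes the gateway forwards multiple `image_url` parts unchanged, as the OpenAI chat format allows; `build_image_message` is a hypothetical helper, not part of the OpenAI SDK or the gateway.

```python
from typing import Any


def build_image_message(prompt: str, image_urls: list[str]) -> dict[str, Any]:
    """Build one user message: a text part followed by one image_url part per URL."""
    content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


message = build_image_message(
    "Compare these two images.",
    ["https://example.com/a.jpg", "https://example.com/b.jpg"],
)
# Pass [message] as the messages argument of client.chat.completions.create(...)
```

Whether a given model accepts several images per request, and how many, depends on the provider.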

Using Base64 Encoded Images

import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encode_image('image.jpeg')}"
                    }
                }
            ]
        }
    ]
)
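The example above hardcodes `image/jpeg` in the data URL. If you send PNG or other formats, the MIME type should match the file. A minimal sketch, assuming the extension reliably reflects the format; the helper names and the mapping table are hypothetical and cover only common extensions.

```python
import base64
from pathlib import Path

# Assumed mapping of common image extensions to MIME types; extend as needed.
IMAGE_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_data_url_from_bytes(data: bytes, filename: str) -> str:
    """Return a base64 data URL whose MIME type follows the filename's extension."""
    suffix = Path(filename).suffix.lower()
    mime_type = IMAGE_MIME_TYPES.get(suffix)
    if mime_type is None:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def image_data_url(image_path: str) -> str:
    """Read a local image file and encode it as a data URL."""
    path = Path(image_path)
    return image_data_url_from_bytes(path.read_bytes(), path.name)
```

Usage: `{"type": "image_url", "image_url": {"url": image_data_url("diagram.png")}}`.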

Media Resolution

Supported Providers: OpenAI, Azure OpenAI, Google Gemini, Google Vertex AI, xAI

The detail parameter in the image_url object controls the resolution at which images are processed, letting you balance response quality against latency and cost.

Supported Values: low, high, auto

Example Usage

import base64

from openai import OpenAI

API_KEY = "your_truefoundry_api_key"
BASE_URL = "{GATEWAY_BASE_URL}"

# Read and encode the image as base64
with open("test-img.png", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode('utf-8')

client = OpenAI(
    api_key=API_KEY,
    base_url=BASE_URL
)

response = client.chat.completions.create(
    model="test-123/gemini-3-pro-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}",
                        "detail": "low"  # Options: "low", "high", "auto"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message)

For Google Gemini and Vertex AI providers, the detail parameter is automatically translated to the mediaResolution parameter:

"low" → MEDIA_RESOLUTION_LOW (64 tokens)
"high" → MEDIA_RESOLUTION_HIGH (256+ tokens with scaling)
"auto" or omitted → No explicit media resolution (model decides)

Audio

Supported Models: Google Gemini models (Gemini 2.0 Flash, etc.)

Send audio files in supported formats (MP3, WAV, etc.) as an image_url content part. When passing a URL, set mime_type as well; Gemini models require it:

Using Audio URLs

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/audio.wav",
                        "mime_type": "audio/wav" # required for gemini models
                    }
                }
            ]
        }
    ]
)
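Since Gemini needs `mime_type` next to the URL, it can be derived from the file extension. A minimal sketch, assuming the URL path ends in a recognizable extension; the mapping mirrors the audio MIME types listed in Google's Gemini documentation at the time of writing (note `audio/mp3`, not `audio/mpeg`), so verify it against the model you use. `audio_url_part` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Any
from urllib.parse import urlparse

# Assumed extension-to-MIME mapping for audio formats Gemini documents.
AUDIO_MIME_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mp3",
    ".aiff": "audio/aiff",
    ".aac": "audio/aac",
    ".ogg": "audio/ogg",
    ".flac": "audio/flac",
}


def audio_url_part(url: str) -> dict[str, Any]:
    """Build an image_url content part for remote audio, with mime_type inferred from the path."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()  # ignores ?query and #fragment
    mime_type = AUDIO_MIME_TYPES.get(suffix)
    if mime_type is None:
        raise ValueError(f"Cannot infer an audio MIME type from {url!r}")
    return {"type": "image_url", "image_url": {"url": url, "mime_type": mime_type}}
```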

Using Base64 Encoded Audio

import base64

def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:audio/wav;base64,{encode_audio('audio.wav')}"
                    }
                }
            ]
        }
    ]
)
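Base64 turns every 3 input bytes into 4 output characters (padding the last group), so an inline audio or video payload is about a third larger than the file on disk. A quick way to estimate the encoded size before sending; `base64_length` is a hypothetical helper, and the data URL prefix adds a few more characters on top.

```python
import base64


def base64_length(num_bytes: int) -> int:
    """Length of the padded base64 encoding: 4 characters per started 3-byte group."""
    return 4 * ((num_bytes + 2) // 3)


# A 3,000,000-byte WAV file becomes 4,000,000 base64 characters.
print(base64_length(3_000_000))  # 4000000

# Cross-check against the real encoder on a small input.
assert base64_length(10) == len(base64.b64encode(b"\x00" * 10))
```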

Video

Supported Models: Google Gemini models (Gemini 2.0 Flash, etc.)

Video processing is natively supported for Google Gemini models:

Using Video URLs

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's happening in this video"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://www.youtube.com/watch?v=example",
                        "mime_type": "video/mp4" # required for gemini models
                    }
                }
            ]
        }
    ]
)

Using Base64 Encoded Video

import base64

def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's happening in this video"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:video/mp4;base64,{encode_video('video.mp4')}",
                        "mime_type": "video/mp4" # required for gemini models
                    }
                }
            ]
        }
    ]
)
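The base64 video example sets the MIME type twice, once in the data URL and once in mime_type, and both must agree. A minimal sketch that derives both from the file extension, assuming only the few formats in its (hypothetical) mapping; check which video formats your Gemini model accepts before extending it. `video_part_from_bytes` and `video_part` are illustrative names.

```python
import base64
from pathlib import Path
from typing import Any

# Assumed extension-to-MIME mapping; extend it for other formats your model accepts.
VIDEO_MIME_TYPES = {".mp4": "video/mp4", ".mpeg": "video/mpeg", ".webm": "video/webm"}


def video_part_from_bytes(data: bytes, filename: str) -> dict[str, Any]:
    """Build an image_url part carrying the video inline, with matching data URL and mime_type."""
    suffix = Path(filename).suffix.lower()
    mime_type = VIDEO_MIME_TYPES.get(suffix)
    if mime_type is None:
        raise ValueError(f"Unsupported video extension: {suffix!r}")
    encoded = base64.b64encode(data).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}", "mime_type": mime_type},
    }


def video_part(video_path: str) -> dict[str, Any]:
    """Read a local video file and build its content part."""
    path = Path(video_path)
    return video_part_from_bytes(path.read_bytes(), path.name)
```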

PDF Documents

Supported Providers: OpenAI, Bedrock, Anthropic, Google Vertex, Google Gemini

PDF document processing allows models to analyze and extract information from PDF files:

Using Base64 Encoded PDF

import base64

from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="{GATEWAY_BASE_URL}"
)

# Read the PDF and encode it as base64
with open("sample.pdf", "rb") as pdf_file:
    base64_pdf = base64.b64encode(pdf_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="tfy-ai-anthropic/claude-4-sonnet-20250514",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's the data in this file?"},
                {
                    "type": "file",
                    "file": {
                        "filename": "sample.pdf",
                        "file_data": f"data:application/pdf;base64,{base64_pdf}",
                    }
                },
            ]
        }
    ]
)

print(response.choices[0].message.content)
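The example above builds the `file` part inline. The sketch below wraps the same structure in hypothetical helpers and adds a cheap local sanity check: PDF files begin with the bytes `%PDF-`, so a mislabeled or corrupt file fails before the request is sent rather than at the provider.

```python
import base64
from pathlib import Path
from typing import Any


def pdf_file_part_from_bytes(data: bytes, filename: str) -> dict[str, Any]:
    """Build a "file" content part for a PDF, rejecting data without the %PDF- header."""
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"{filename!r} does not look like a PDF (missing %PDF- header)")
    encoded = base64.b64encode(data).decode("utf-8")
    return {
        "type": "file",
        "file": {"filename": filename, "file_data": f"data:application/pdf;base64,{encoded}"},
    }


def pdf_file_part(pdf_path: str) -> dict[str, Any]:
    """Read a local PDF and build its content part, using the file name as filename."""
    path = Path(pdf_path)
    return pdf_file_part_from_bytes(path.read_bytes(), path.name)
```

Usage: put `pdf_file_part("sample.pdf")` in the `content` list next to the text part.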

Vision

Using Vision Models with OpenAI SDK

Provider	Vision-capable models
OpenAI	`gpt-4-vision-preview, gpt-4o, gpt-4o-mini`
Anthropic	`claude-3-sonnet, claude-3-haiku, claude-3-opus, claude-3.5-sonnet, claude-3.5-haiku, claude-4-opus, claude-4-sonnet, claude-3-7-sonnet`
Gemini	`gemini-1.0-pro-vision, gemini-1.5-flash, gemini-1.5-flash-8b, gemini-1.5-pro, gemini-2.5-pro, gemini-2.5-flash`
AWS Bedrock	`anthropic.claude-3-5-sonnet, anthropic.claude-3-5-haiku, anthropic.claude-3-5-sonnet-20240620-v1:0`
Azure OpenAI	`gpt-4-vision-preview, gpt-4o, gpt-4o-mini`
xAI	`grok-2-vision-1212`
