Working with Multi Modal
The API supports various media types including images, audio, video and pdf.Images
Images
Media Resolution
Media Resolution
Supported Providers:
OpenAI, Azure OpenAI, Google Gemini, Google Vertex AI, xAIThe detail parameter in the image_url object allows you to control the resolution at which images are processed. This helps balance between response quality, latency, and cost.Supported Values: low, high, autoExample Usage
For Google Gemini and Vertex AI providers, the
detail parameter is automatically translated to the mediaResolution parameter:"low"→MEDIA_RESOLUTION_LOW(64 tokens)"high"→MEDIA_RESOLUTION_HIGH(256+ tokens with scaling)"auto"or omitted → No explicit media resolution (model decides)
Audio
Audio
Video
Video
PDF Documents
PDF Documents
Supported Providers: OpenAI, Bedrock, Anthropic, Google Vertex, Google GeminiPDF document processing allows models to analyze and extract information from PDF files:
Using Base64 Encoded PDF
Vision
TrueFoundry supports vision models from all integrated providers as they become available. These models can analyze and interpret images alongside text, enabling multimodal AI applications.| Provider | Models |
|---|---|
| OpenAI | gpt-4-vision-preview, gpt-4o, gpt-4o-mini |
| Anthropic | claude-3-sonnet, claude-3-haiku, claude-3-opus, claude-3.5-sonnet, claude-3.5-haiku, claude-4-oppus, claude-4-sonnet, claude-3-7-sonnet |
| Gemini | gemini-1.0-pro-vision, gemini-1.5-flash, gemini-1.5-flash-8b, gemini-1.5-pro, gemini-2.5-pro, gemini-2.5-flash |
| AWS Bedrock | anthropic.claude-3-5-sonnet, anthropic.claude-3-5-haiku, anthropic.claude-3-5-sonnet-20240620-v1:0 |
| Azure OpenAI | gpt-4-vision-preview, gpt-4o, gpt-4o-mini |
| xAI | grok-2-vision-1212 |