API Access to Model Metrics

The Gateway Model Metrics Query API lets you query Gateway model and virtual-model metrics for usage, performance, cost, and user activity. Results come back either as a distribution (aggregated) or as a timeseries, and both query types support filtering and grouping.

This page covers datasource: "modelMetrics". For other datasources, see the sibling pages for MCP, Guardrail, Cache, Routing, and Agent metrics.

Access control

Tenant admins: Can query metrics for the entire organization (tenant-wide).
Users: Can query their own data and their teams’ data.
Virtual accounts: Can query their own data and their teams’ data; with tenant-admin permissions, they can access tenant-wide data.

The server applies RBAC automatically; callers don’t pass any RBAC fields.

Contents

Section	Description
Overview	Authentication, quick start, and API reference
Filtering	Filter operators, fields, and combinations
Distribution examples	Aggregated (distribution) query examples
Timeseries examples	Time-bucketed (timeseries) query examples
Response format	Response JSON structure and error responses

Authentication

You need to authenticate with your TrueFoundry API key. You can use either a Personal Access Token (PAT) or Virtual Account Token (VAT).

Get your API key

To generate an API key:

Personal Access Token (PAT): Go to Access → Personal Access Tokens in your TrueFoundry dashboard
Virtual Account Token (VAT): Go to Access → Virtual Account Tokens (requires admin permissions)

For detailed authentication setup, see our Authentication guide.
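As a minimal sketch of the setup above, the helper below builds the request headers from an API key held in an environment variable. The variable name TFY_API_KEY and the function name are assumptions made for illustration, not names defined by TrueFoundry. A PAT and a VAT are passed the same way.

```python
import os


def build_auth_headers(env_var: str = "TFY_API_KEY") -> dict:
    """Return headers for the metrics query endpoint.

    The environment variable name is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} to a PAT or VAT before querying")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; the returned dictionary can be passed as the headers argument of requests.post.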

Quick Start

By default, the API returns metrics for both models and virtual models. To restrict results to models only, add the filter {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}. For virtual models only, use the same filter with "value": false.

The virtual-model column has two aliases. In groupBy and aggregations[].column use virtualModel. In filters[].fieldName and in response keys, the name is virtualModelName. They refer to the same underlying database column.
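To make the two aliases concrete, here is a sketch of a payload that groups by virtual model while restricting results to virtual-model traffic. It uses only the fields and the IS_NULL operator shown on this page; the time window is an arbitrary example.

```python
import json

# groupBy and aggregations[].column use the alias "virtualModel";
# filters[].fieldName uses "virtualModelName" for the same column.
payload = {
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [{"type": "sum", "column": "costInUSD"}],
    "groupBy": ["virtualModel"],
    "filters": [
        # IS_NULL with value False keeps only virtual-model requests
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": False}
    ],
}

# Python's False is serialized as JSON false in the request body
print(json.dumps(payload, indent=2))
```

In the response, the grouped value is keyed as virtualModelName rather than virtualModel.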

Distribution query

Aggregated model metrics including request counts, token totals, p99 latency, and cost grouped by model:

import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2026-04-21T00:00:00.000Z",
        "endTs": "2026-04-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "sum", "column": "outputTokens"},
            {"type": "p99", "column": "latencyMs"},
            {"type": "sum", "column": "costInUSD"}
        ],
        "groupBy": ["modelName"],
        "filters": [
            # Model-only metrics; Python's True is sent as JSON true
            {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": True}
        ]
    }
)

print(response.json())

Timeseries query

The same shape bucketed hourly:

import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2026-04-21T00:00:00.000Z",
        "endTs": "2026-04-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "interval": "1 hour",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "p99", "column": "latencyMs"}
        ],
        "groupBy": ["modelName"],
        "filters": [
            # Model-only metrics; Python's True is sent as JSON true
            {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": True}
        ]
    }
)

print(response.json())

API reference

Endpoint

POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query

Post JSON to this endpoint with Authorization: Bearer <your_api_key> and Content-Type: application/json.

Request parameters

startTs (string, required)

ISO 8601 timestamp marking the inclusive lower bound of the query window (e.g. "2026-04-21T00:00:00.000Z").

endTs (string, required)

ISO 8601 timestamp marking the exclusive upper bound of the query window (e.g. "2026-04-22T00:00:00.000Z").
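The window is half-open: startTs is included and endTs is excluded, so consecutive windows can share a boundary without counting a request twice. The sketch below builds a one-day window with the standard library; the helper names are illustrative and not part of the API.

```python
from datetime import datetime, timedelta, timezone


def to_iso_millis(ts: datetime) -> str:
    """Format an aware datetime as UTC ISO 8601 with milliseconds and a Z suffix."""
    ts = ts.astimezone(timezone.utc)
    return ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"


def day_window(day: datetime) -> tuple:
    """Return (startTs, endTs) covering one UTC day as [start, end)."""
    start = day.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return to_iso_millis(start), to_iso_millis(start + timedelta(days=1))


start_ts, end_ts = day_window(datetime(2026, 4, 21, 15, 30, tzinfo=timezone.utc))
print(start_ts, end_ts)  # 2026-04-21T00:00:00.000Z 2026-04-22T00:00:00.000Z
```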

datasource (string, required)

The data source to query. Use "modelMetrics" for Gateway model metrics.

type (string, required)

The type of query to execute:

"distribution": returns aggregated rows (one row per groupBy combination).
"timeseries": returns time-bucketed rows (one row per bucket per groupBy combination). Requires interval.
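A client-side pre-check can catch a missing interval before the request is sent. The function below is a hypothetical helper, not part of the API; the server remains the authority on what is accepted, and the sketch ignores the deprecated seconds-based alias.

```python
def check_query_type(payload: dict) -> None:
    """Raise ValueError if the query type and interval are inconsistent."""
    query_type = payload.get("type")
    if query_type not in ("distribution", "timeseries"):
        raise ValueError(f"unsupported type: {query_type!r}")
    # Timeseries rows are bucketed, so a bucket size must be present
    if query_type == "timeseries" and not payload.get("interval"):
        raise ValueError('"timeseries" queries require "interval"')
```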

aggregations (array)

Array of { type, column } objects describing the aggregations to compute. When omitted, only the implicit total = COUNT(*) is returned.

"aggregations": [
    {"type": "count", "column": "modelName"},
    {"type": "sum", "column": "inputTokens"},
    {"type": "p99", "column": "latencyMs"}
]

Supported aggregation types

Type	Description
`sum`	Sum of values
`count`	Non-null count of the column
`countDistinct`	Distinct count
`min`	Minimum value
`max`	Maximum value
`avg`	Average
`p5`, `p10`, `p25`, `p50`, `p75`, `p90`, `p95`, `p99`, `p999`	Percentiles (approximate)
`rateSum`	`sum` normalised by the interval in seconds (timeseries only)
`rateAvg`	`avg` normalised by the interval in seconds (timeseries only)
`rateMin`	`min` normalised by the interval in seconds (timeseries only)
`rateMax`	`max` normalised by the interval in seconds (timeseries only)
`ratePerMinute`	Value divided by the interval in minutes (timeseries only)
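The rate types can be illustrated with simple arithmetic. The sketch below assumes that "normalised by" means "divided by" the bucket length; it mirrors the table's descriptions and is not the server's implementation.

```python
def rate_sum(bucket_sum: float, interval_seconds: int) -> float:
    """Per-second rate: the bucket's sum divided by the bucket length in seconds."""
    return bucket_sum / interval_seconds


def rate_per_minute(bucket_value: float, interval_seconds: int) -> float:
    """Per-minute rate: the bucket's value divided by the bucket length in minutes."""
    return bucket_value / (interval_seconds / 60)


# Example: 7,200 input tokens summed in one "1 hour" bucket (3,600 seconds)
print(rate_sum(7200, 3600))         # 2.0 tokens per second
print(rate_per_minute(7200, 3600))  # 120.0 tokens per minute
```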

Supported aggregation columns

Column	Notes
`costInUSD`	Cost incurred (USD)
`inputTokens`	Number of input tokens
`outputTokens`	Number of output tokens
`latencyMs`	Total request latency (ms)
`timeToFirstTokenMs`	Time to the first generated token (ms)
`interTokenLatencyMs`	Latency between consecutive generated tokens (ms)
`timePerOutputTokenLatencyMs`	Latency per output token (ms)

All scalar and percentile aggregation types apply to every column above.

groupBy (array)

Array of field names to group results by. Custom metadata keys are supported with a metadata. prefix (e.g. "metadata.environment").

"groupBy": ["modelName", "team", "metadata.environment"]

Available group-by fields

Field	Notes
`modelName`	The underlying model name
`virtualModel`	The virtual-model name (when the request was routed through one)
`requestType`	Type of request, e.g. `ChatCompletion`, `Embedding`
`providerModelName`	Underlying provider model name
`providerAccountType`	Account type of the provider (e.g. `model`, `mcp-server`, `guardrail-config`)
`errorCode`	HTTP error code returned, when applicable
`userEmail`	Group by user (response key: `createdBySubjectSlug`)
`virtualaccount`	Group by virtual account (response key: `createdBySubjectSlug`)
`team`	Unnests the `Teams` array
`createdBySubjectType`	Distinguishes `user` vs `virtualaccount`
`metadata.<key>`	Group by a custom metadata key

When groupBy contains userEmail (without virtualaccount), the server auto-injects WHERE CreatedBySubjectType = 'user'. When it contains virtualaccount (without userEmail), the server auto-injects CreatedBySubjectType = 'virtualaccount'. When both appear, scope the results yourself with createdBySubjectType if needed.
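The injection rule can be restated as a small function. This is a client-side sketch of the behaviour described above, useful for predicting which rows a query returns; the function name is hypothetical and the real logic runs on the server.

```python
from typing import Optional


def implied_subject_type(group_by: list) -> Optional[str]:
    """Return the CreatedBySubjectType value the server is described as injecting."""
    has_user = "userEmail" in group_by
    has_va = "virtualaccount" in group_by
    if has_user and not has_va:
        return "user"
    if has_va and not has_user:
        return "virtualaccount"
    # Both or neither: scope with createdBySubjectType yourself if needed
    return None
```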

filters (array)

Array of filter objects, AND-combined. See Filtering for the full operator reference and the per-field allow-list.

interval (string)

Required for timeseries queries. Bucket size as <positive integer> <unit>, where <unit> is one of second, minute, hour, day, week, month, year (with or without a trailing s). Examples: "30 second", "5 minute", "1 hour", "1 day". Compound expressions like "1 hour 30 minute" are rejected.
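The format rule can be checked locally with a regular expression. The sketch below assumes exactly one space between the number and the unit and rejects leading zeros; neither detail is stated above, so treat the pattern as an approximation of the server's validation.

```python
import re

_UNITS = "second|minute|hour|day|week|month|year"
# One positive integer, a single space, one unit, optional trailing "s"
_INTERVAL_RE = re.compile(rf"([1-9]\d*) ({_UNITS})s?")


def is_valid_interval(interval: str) -> bool:
    """Return True if the whole string is a single <integer> <unit> pair."""
    return _INTERVAL_RE.fullmatch(interval) is not None


print(is_valid_interval("5 minutes"))         # True
print(is_valid_interval("1 hour 30 minute"))  # False: compound expression
```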

number

deprecated

Deprecated alias for interval. Accepts a positive integer number of seconds (e.g. 3600 for hourly). Prefer interval in new code. If both are provided, interval wins.
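When migrating older callers, the precedence rule can be applied on the client so that only interval is sent. The helper below is a hypothetical sketch: it takes the legacy seconds value as a plain argument and expresses it in the interval format.

```python
from typing import Optional


def resolve_interval(interval: Optional[str], legacy_seconds: Optional[int]) -> Optional[str]:
    """Prefer the interval string; otherwise express legacy seconds as an interval."""
    if interval:
        return interval  # interval wins when both are given
    if legacy_seconds is not None:
        if legacy_seconds <= 0:
            raise ValueError("seconds must be a positive integer")
        return f"{legacy_seconds} second"
    return None


print(resolve_interval("1 hour", 3600))  # 1 hour
print(resolve_interval(None, 3600))      # 3600 second
```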
