- Performance Metrics: Track key latency metrics like Request Latency, Time to First Token (TTFT), Inter-Token Latency (ITL), and Time Per Output Token (TPOT) with P50, P75, P90, and P99 percentiles.
- Cost and Token Usage: Gain visibility into your application’s costs with detailed breakdowns of input/output tokens and the associated expenses for each model.
- Usage Patterns: Understand how your application is being used with detailed analytics on user activity, model distribution, virtual account allocation, and team-based usage.
- MCP Observability: Monitor Model Context Protocol servers and tools with dedicated metrics for request rates, latency, failure rates, and method-level breakdowns.
- Guardrail Effectiveness: See how often your content safety guardrails are blocking, mutating, or flagging requests — for both model inputs and outputs.
- Routing and Policy Impact: Evaluate the effectiveness of your routing rules, rate limits, and budget limits by monitoring how often they are triggered and how traffic is distributed.
- Cache Performance: Measure the ROI of your semantic cache with hit rates, cost savings, and latency overhead.
- Error Analysis: Quickly identify and diagnose issues with a view of error rates and error code breakdowns for both LLM and MCP traffic.
Dashboard Tabs
The dashboard is organized into tabs, each offering a specific perspective on your data. Every tab supports time-range selection, filters (by model, user, virtual account, team, and more), and a Refresh button for live monitoring.
Overview
High-level gateway health — total cost, LLM and MCP call volumes, error breakdowns, top models, providers, users, and tools at a glance.
Model Metrics
Deep dive into LLM performance — latency percentiles, throughput, failure rates, cost, and token usage. Pivot by model, user, team, virtual account, or custom metadata.
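To make the latency metrics concrete, here is a minimal Python sketch of how TTFT, ITL, TPOT, and request latency can be derived from per-token arrival timestamps, and how a percentile such as P90 is taken over many requests. The `StreamTiming` class, its field names, and the nearest-rank percentile method are assumptions for illustration; the dashboard may compute these values differently (for example, with an interpolated percentile).

```python
import math
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Timestamps in seconds for one streamed response (hypothetical shape)."""
    request_sent: float
    token_times: list[float]  # arrival time of each output token, in order


def ttft(t: StreamTiming) -> float:
    """Time to First Token: delay between sending the request and the first token."""
    return t.token_times[0] - t.request_sent


def request_latency(t: StreamTiming) -> float:
    """End-to-end latency: delay between sending the request and the last token."""
    return t.token_times[-1] - t.request_sent


def inter_token_latencies(t: StreamTiming) -> list[float]:
    """Inter-Token Latency: gap between each pair of consecutive tokens."""
    return [b - a for a, b in zip(t.token_times, t.token_times[1:])]


def tpot(t: StreamTiming) -> float:
    """Time Per Output Token, using one common definition:
    generation time after the first token divided by the remaining token count."""
    n = len(t.token_times)
    if n < 2:
        return 0.0
    return (t.token_times[-1] - t.token_times[0]) / (n - 1)


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


timing = StreamTiming(request_sent=0.0, token_times=[0.5, 0.6, 0.8, 1.1])
sample_latencies = [float(x) for x in range(1, 11)]  # ten illustrative request latencies
p90_latency = percentile(sample_latencies, 90)
```

Percentiles matter here because averages hide tail behavior: a P99 well above the P50 suggests a small share of requests is much slower than the typical one.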
MCP Metrics
Monitor MCP servers and tools — request rates, latency, failure breakdowns, and method-level call distributions.
Guardrail Metrics
Track guardrail evaluations — blocked, mutated, and flagged request rates, per-guardrail results for inputs and outputs, and latency overhead.
Routing Metrics
Understand routing rules, rate limits, and budget limits — how often they trigger, how traffic is distributed, and where limits are being hit.
Cache Metrics
Measure semantic cache effectiveness — hit rates, cost savings, cache errors, and lookup latency.
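As a rough illustration of what the hit-rate and cost-savings figures represent, the sketch below computes both from raw counters. The `CacheStats` class and its field names are hypothetical, and the savings estimate assumes each cache hit avoids one model call at the average uncached cost; the dashboard's own calculation may differ.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Counters for a time window (hypothetical names, not a product schema)."""
    hits: int
    misses: int
    avg_model_cost_usd: float  # assumed average cost of one uncached model call


def hit_rate(stats: CacheStats) -> float:
    """Fraction of cache lookups that were served from the cache."""
    lookups = stats.hits + stats.misses
    return stats.hits / lookups if lookups else 0.0


def estimated_savings_usd(stats: CacheStats) -> float:
    """Estimated spend avoided, assuming every hit replaced one average-cost call."""
    return stats.hits * stats.avg_model_cost_usd


window = CacheStats(hits=300, misses=700, avg_model_cost_usd=0.002)
window_hit_rate = hit_rate(window)
window_savings = estimated_savings_usd(window)
```

Lookup latency is the other side of the trade-off: every request pays it, including misses, so a low hit rate can add latency without saving much cost.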
Filtering and Drill-Down
Use the dashboard filters to narrow your analysis to specific models, users, virtual accounts, teams, MCP servers, tools, or custom metadata fields. Filters persist across tabs, making it easy to investigate a specific user or model across all dimensions.
Exporting Data
You can download aggregated metrics data in CSV format by clicking the export icon on supported tabs. Choose which dimensions to group the data by and optionally include custom metadata keys. You can also fetch the data via API for programmatic access.
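Once exported, the CSV can be processed with ordinary tooling. The following is a minimal sketch that sums a numeric column per grouping dimension using Python's standard `csv` module. The column names (`model`, `team`, `request_count`, `cost_usd`) and the sample rows are assumptions for illustration only; check the header row of your actual export for the real names.

```python
import csv
import io
from collections import defaultdict


def sum_by_dimension(csv_text: str, dimension: str, value_column: str) -> dict[str, float]:
    """Sum a numeric column for each distinct value of a grouping column."""
    totals: defaultdict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[dimension]] += float(row[value_column])
    return dict(totals)


# Illustrative data only; the real export's columns and values will differ.
SAMPLE_EXPORT = """model,team,request_count,cost_usd
model-a,search,120,1.50
model-a,support,80,1.00
model-b,search,40,2.25
"""

cost_by_model = sum_by_dimension(SAMPLE_EXPORT, "model", "cost_usd")
requests_by_team = sum_by_dimension(SAMPLE_EXPORT, "team", "request_count")
```

To use this with a downloaded file, read its contents with `open(path, newline="").read()` and pass the resulting string as `csv_text`.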