
FAQs

You can add multiple gateway planes to the control plane by following the steps below:
1

Create Kubernetes Secrets for the License Key and Image Pull Secret

We will create two secrets in this step:
  1. Store the License Key
  2. Store the Image Pull Secret
We need to create a Kubernetes secret containing the license key.
The same license key is used for all gateway planes and for the control plane.
truefoundry-creds.yaml
apiVersion: v1
kind: Secret
metadata:
  name: truefoundry-creds
type: Opaque
stringData:
  TFY_API_KEY: <TFY_API_KEY>
Apply the secret to the Kubernetes cluster (Assuming you are installing the control plane in the truefoundry namespace)
kubectl apply -f truefoundry-creds.yaml -n truefoundry
We need to create an image pull secret to enable pulling the TrueFoundry images from the private registry.
The same image pull secret is used for all gateway planes and for the control plane. Use your own credentials if you are pulling TrueFoundry images from your own registry.
truefoundry-image-pull-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: truefoundry-image-pull-secret
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <IMAGE_PULL_SECRET> # Provided by TrueFoundry team
Apply the secret to the Kubernetes cluster (Assuming you are installing the control plane in the truefoundry namespace)
kubectl apply -f truefoundry-image-pull-secret.yaml -n truefoundry
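If you pull images from your own registry, the .dockerconfigjson value is a base64-encoded Docker config JSON. The sketch below shows one way to build that value in shell. make_dockerconfigjson is a hypothetical helper, and the registry, username, and password shown are placeholder assumptions, not TrueFoundry defaults.

```shell
#!/usr/bin/env bash
# Sketch: build the base64 value for the `.dockerconfigjson` field.
# The registry and credentials below are hypothetical placeholders.

# Base64-encode stdin on a single line (strips the line wrapping some
# base64 implementations add).
b64() { base64 | tr -d '\n'; }

# Usage: make_dockerconfigjson REGISTRY USERNAME PASSWORD
# Note: values are inserted as-is, so this assumes credentials that
# contain no double quotes or backslashes.
make_dockerconfigjson() {
  local registry="$1" username="$2" password="$3"
  local auth
  auth="$(printf '%s:%s' "$username" "$password" | b64)"
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}' \
    "$registry" "$username" "$password" "$auth" | b64
}

make_dockerconfigjson "registry.example.com" "demo-user" "demo-pass"
echo
```

The printed string can be pasted in place of <IMAGE_PULL_SECRET> in truefoundry-image-pull-secret.yaml.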
2

Create Helm chart Values file for gateway plane

Create a values file as given below and replace the following values:
  • CONTROL_PLANE_URL: URL that you will map to the control plane dashboard.
  • TENANT_NAME: Tenant name provided by TrueFoundry team.
  • GATEWAY_ENDPOINT_HOST: The domain where you will expose the gateway endpoint (e.g., gateway.example.com)
truefoundry-gateway-values.yaml
global:
  # This is the reference to the secrets we created in the previous step
  imagePullSecrets:
    - name: "truefoundry-image-pull-secret"

  # Choose the resource tier as per your needs
  resourceTier: medium # or small or large
  controlPlaneURL: <CONTROL_PLANE_URL> # eg. https://example-company.truefoundry.cloud
  tenantName: <TENANT_NAME>

ingress:
  enabled: true
  annotations: {}
  ingressClassName: nginx
  tls: []
  hosts:
    - <GATEWAY_ENDPOINT_HOST>

# Optional: Istio configuration (if using Istio instead of standard ingress)
# istio:
#   virtualservice:
#     hosts:
#       - <GATEWAY_ENDPOINT_HOST>
#     enabled: true
#     retries:
#       enabled: true
#       retryOn: gateway-error
#     gateways:
#       - istio-system/tfy-wildcard
#     annotations: {}
3

Install Helm chart for gateway plane

helm upgrade --install tfy-llm-gateway oci://tfy.jfrog.io/tfy-helm/tfy-llm-gateway -n truefoundry --create-namespace -f truefoundry-gateway-values.yaml
Yes. You can configure your Artifactory to mirror our registry.
Credentials for accessing the TrueFoundry private registry are required and will be provided during onboarding.
1. Registry Configuration
  • URL: https://tfy.jfrog.io/
2. Update Helm values
global:
  image:
    registry: <YOUR_REGISTRY> # Replace with your registry
postgresql:
  image:
    registry: <YOUR_REGISTRY> # Replace with your registry, use this if `devMode` is enabled
Yes. We provide a script that uses the truefoundry Helm Chart to identify and copy required images to your private registry.
Credentials for accessing the TrueFoundry private registry are required and will be provided during onboarding.
1. Install required dependencies
  • Skopeo
    • Used to perform the image copy operation.
  • Helm
    • Used to get the list of images from the TrueFoundry Helm Chart.
2. Add TrueFoundry Helm Chart repository
helm repo add truefoundry https://truefoundry.github.io/infra-charts
helm repo update
3. Authenticate to the TrueFoundry source registry
skopeo login -u <USERNAME> -p <PASSWORD> https://tfy.jfrog.io/
Replace <USERNAME> with the TrueFoundry registry username.
Replace <PASSWORD> with the TrueFoundry registry password.
4. Authenticate to your destination registry
skopeo login -u <USERNAME> -p <PASSWORD> <YOUR_REGISTRY>
Replace <USERNAME> with your registry username.
Replace <PASSWORD> with your registry password.
Replace <YOUR_REGISTRY> with the URL of your registry.
Skopeo will use authentication details for a registry that was previously authenticated with docker login. Alternatively, you can use the --dest-user and --dest-password flags to provide the username and password for the destination registry.
5. Run Clone Image Script
export TRUEFOUNDRY_HELM_CHART_VERSION=<TRUEFOUNDRY_HELM_CHART_VERSION>
export TRUEFOUNDRY_HELM_VALUES_FILE=<TRUEFOUNDRY_HELM_VALUES_FILE>
export DEST_REGISTRY=<YOUR_DESTINATION_REGISTRY>

# Dry-run example
curl -s https://raw.githubusercontent.com/truefoundry/infra-charts/main/scripts/clone_images_to_your_registry.sh | bash -s -- --helm-chart truefoundry --helm-version $TRUEFOUNDRY_HELM_CHART_VERSION --helm-values $TRUEFOUNDRY_HELM_VALUES_FILE --dest-registry $DEST_REGISTRY --dry-run

# Live example
curl -s https://raw.githubusercontent.com/truefoundry/infra-charts/main/scripts/clone_images_to_your_registry.sh | bash -s -- --helm-chart truefoundry --helm-version $TRUEFOUNDRY_HELM_CHART_VERSION --helm-values $TRUEFOUNDRY_HELM_VALUES_FILE --dest-registry $DEST_REGISTRY
Replace <TRUEFOUNDRY_HELM_CHART_VERSION> with the version of the TrueFoundry Helm chart you want to use. You can find the latest version in the changelog.
Replace <TRUEFOUNDRY_HELM_VALUES_FILE> with the path to the values file you created in the Installation Instructions.
Replace <YOUR_DESTINATION_REGISTRY> with the URL of your registry.
6. Update the Helm values file to use your registry
global:
  image:
    registry: <YOUR_REGISTRY> # Replace with your registry
postgresql:
  image:
    registry: <YOUR_REGISTRY> # Replace with your registry, use this if `devMode` is enabled
An air-gapped environment is isolated from the internet. Since the control plane and gateway plane ship as a single helm chart (truefoundry), you only need to make the container images available in your private registry and update the helm values to point to it.
  1. Copy images to your private registry — set up a registry mirror or copy images directly using the steps described in the FAQs above
  2. Update helm values to point to your private registry (see the helm value overrides in the same FAQs above)
  3. Continue with the standard installation on the overview and choose your cloud install guide (AWS, GCP, Azure, or on-prem)
You can integrate with AWS bedrock models from a different AWS account by following the steps below:
  1. Add the following IAM policy to the control plane IAM role so that it can assume the IAM role of the AWS account that has the bedrock models:
{
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Resource": "*"
    }
  ],
  "Version": "2012-10-17"
}
  2. In the IAM role in the destination AWS account (which has bedrock access), add the following trust policy to allow the control plane IAM role to assume it:
{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<CONTROL_PLANE_IAM_ROLE_ARN>"
      },

      "Action": "sts:AssumeRole"
    }
  ],
  "Version": "2012-10-17"
}
  3. Now you can use the IAM role of the destination AWS account while integrating AWS bedrock models in the TrueFoundry AI gateway.
No. Only block storage is needed to install and run TrueFoundry. It should be supported via a CSI driver, and only ReadWriteOnce access is required.
We log access information to standard output in one of the following formats:
  1. logfmt
  2. json
You can switch between them with an environment variable on the AI Gateway installation (default: logfmt).

Log format

Standard log format structure:
time="%START_TIME%" level=%LEVEL% ip=%IP_ADDRESS% tenant=%TENANT_NAME% user=%SUBJECT_TYPE%:%SUBJECT_SLUG% model=%MODEL_ID% method=%METHOD% path=%PATH% status=%STATUS_CODE% time_taken=%DURATION%ms trace_id=%TRACE_ID%
Log operator   Details
START_TIME     ISO timestamp for request start. e.g. 2025-08-12 13:34:50
LEVEL          info|warn|error
IP_ADDRESS     IP address of the caller. e.g. ::ffff:10.99.55.142
TENANT_NAME    Name of the tenant. e.g. truefoundry
SUBJECT_TYPE   user|virtualaccount
SUBJECT_SLUG   Email or virtual account name. e.g. tfy-user@truefoundry.com|demo-virtualaccount
MODEL_ID       Model ID. e.g. openai-default/gpt-5
METHOD         GET|POST|PUT
PATH           Path of the request. e.g. /api/inference/openai/chat/completions
STATUS_CODE    200|400|401|403|429|500
DURATION       Duration of the request in milliseconds. e.g. 12
TRACE_ID       Trace ID of the request
Examples:
time="2025-08-12 13:34:50" level=info ip=::ffff:10.99.55.142 tenant=truefoundry user=virtualaccount:demo-virtualaccount model=openai-default/gpt-5 method=POST path=/api/inference/openai/chat/completions status=200 time_taken=53ms trace_id=587b2a946c13f62f9160674a8c983ce3
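When the logfmt format is active, individual fields can be pulled out of an access-log line with standard shell tools. The following is a minimal sketch, not an official parser: logfmt_get is a hypothetical helper, and it assumes quoted values never contain a space followed by another key name and an equals sign.

```shell
#!/usr/bin/env bash
# Sketch: extract one field from a logfmt access-log line.
# logfmt_get is a hypothetical helper, not part of the gateway.

# Usage: logfmt_get KEY LINE
# Handles both key=value and key="quoted value" pairs.
logfmt_get() {
  local key="$1" line="$2"
  # A leading space is prepended so that every key, including the first
  # one on the line, is preceded by a space.
  printf ' %s\n' "$line" |
    sed -n -E "s/.* ${key}=(\"([^\"]*)\"|([^\" ][^ ]*)).*/\2\3/p"
}

line='time="2025-08-12 13:34:50" level=info ip=::ffff:10.99.55.142 tenant=truefoundry user=virtualaccount:demo-virtualaccount model=openai-default/gpt-5 method=POST path=/api/inference/openai/chat/completions status=200 time_taken=53ms trace_id=587b2a946c13f62f9160674a8c983ce3'

logfmt_get status "$line"
logfmt_get time_taken "$line"
```

For anything beyond quick inspection, switching the gateway to the json format and using a JSON-aware tool is likely the more robust choice.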
By default, the control plane uses the TrueFoundry Auth Server for user authentication. However, you can configure it to use your own external identity provider instead. We support both OIDC and SAML-compliant identity providers.
If your LLM requests are timing out after a certain duration, the first thing to check is the traces in the TrueFoundry dashboard. Look at the request duration: if you see requests consistently timing out at exactly 60 seconds, the issue is almost certainly the load balancer, not the TrueFoundry AI Gateway. The TrueFoundry gateway does not impose any request timeout.
[Image: Traces showing requests timing out at 60 seconds]
This commonly happens when an Application Load Balancer (ALB) is placed in front of the gateway to expose it. The default Connection idle timeout on AWS ALBs is 60 seconds, which is too short for long-running LLM inference requests (especially streaming responses or large prompts).
Solution: Increase the idle timeout on your AWS ALB to a higher value (e.g., 300 seconds or more).
You can find this setting in the AWS Console under EC2 → Load Balancers → Select your ALB → Attributes tab → Connection idle timeout.
[Image: AWS ALB Connection idle timeout setting]
You can also update it via the AWS CLI:
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn <YOUR_ALB_ARN> \
  --attributes Key=idle_timeout.timeout_seconds,Value=300
If you are using an ingress controller (e.g., NGINX Ingress) in addition to the ALB, also verify that the ingress controller’s proxy timeout settings are configured appropriately.
Yes. TrueFoundry supports exporting metrics to Victoria Metrics as an alternative to Prometheus. To enable this, add the following to your truefoundry-values.yaml file and upgrade the Helm release:
This only installs the VMServiceScrape and related custom resources for scraping TrueFoundry metrics. It does not deploy Victoria Metrics itself — you are responsible for installing and managing your own Victoria Metrics instance.
truefoundry-values.yaml
victoriaMetricsMonitoring:
  enabled: true
Then upgrade the Helm release to apply the changes:
helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry -n truefoundry --create-namespace -f truefoundry-values.yaml
The TrueFoundry control plane supports SSL connections to PostgreSQL. You can configure SSL by setting the DB_SSL_MODE environment variable in your truefoundry-values.yaml.
Supported DB_SSL_MODE values:
Mode         Encryption  Certificate Validation  Use Case
disable      No          No                      Local development or trusted networks
no-verify    Yes         No                      Managed databases with self-signed or unverified certs
require      Yes         Yes (system CA store)   When you have a valid CA certificate and want full verification
verify-ca    Yes         Yes (custom CA)         Same as require but explicitly checks CA
verify-full  Yes         Yes (CA + hostname)     Strictest mode, validates CA and hostname
SSL certificate environment variables:
Variable          Purpose                                         Required
DB_SSL_CA_PATH    Path to the server CA certificate file          For require, verify-ca, or verify-full modes
DB_SSL_CERT_PATH  Path to the client certificate file (for mTLS)  Only for mTLS (GCP Cloud SQL, Azure Database for PostgreSQL)
DB_SSL_KEY_PATH   Path to the client private key file (for mTLS)  Only for mTLS (GCP Cloud SQL, Azure Database for PostgreSQL)
The certificate requirements vary by cloud provider. AWS RDS only needs the server CA bundle (DB_SSL_CA_PATH), while GCP Cloud SQL and Azure Database for PostgreSQL may require all three certificate paths when client certificate authentication (mTLS) is enabled. Refer to the cloud-specific control plane documentation for detailed examples.
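A small pre-flight check can catch a mistyped mode or a missing CA path before you run helm upgrade. The sketch below follows the "Required" column of the table above; check_db_ssl is a hypothetical helper and is not part of TrueFoundry tooling.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check DB SSL settings before a Helm upgrade.
# check_db_ssl is a hypothetical helper for illustration only.

# Usage: check_db_ssl MODE [CA_PATH]
# Returns 0 when the combination looks consistent, 1 otherwise.
check_db_ssl() {
  local mode="$1" ca_path="${2:-}"
  case "$mode" in
    disable|no-verify)
      # These modes skip server certificate validation.
      echo "ok: $mode does not validate the server certificate"
      ;;
    require|verify-ca|verify-full)
      # These modes list DB_SSL_CA_PATH as required.
      if [ -z "$ca_path" ]; then
        echo "error: $mode needs DB_SSL_CA_PATH" >&2
        return 1
      fi
      echo "ok: $mode validates against $ca_path"
      ;;
    *)
      echo "error: unsupported DB_SSL_MODE '$mode'" >&2
      return 1
      ;;
  esac
}

check_db_ssl no-verify
check_db_ssl require /etc/ssl/custom/ca-certificate.crt
```

The check only looks at the values you pass in. It does not confirm that the file exists inside the pod or that the database accepts the certificate.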
Scenario 1: Encrypted connection without certificate validation (no-verify)
This is the simplest option for managed databases. It encrypts the connection but skips server certificate validation.
truefoundry-values.yaml
servicefoundryServer:
  env:
    DB_SSL_MODE: "no-verify"
mlfoundryServer:
  env:
    DB_SSL_MODE: "no-verify"
Scenario 2: Encrypted connection with certificate validation (require)
This mode encrypts the connection and validates the server certificate. You must provide the appropriate certificate files for your database provider. The example below shows the full configuration with all three certificate paths (for GCP/Azure mTLS). For AWS RDS, only DB_SSL_CA_PATH is needed.
Create a Kubernetes Secret containing your certificate files:
# AWS RDS (CA bundle only)
kubectl create secret generic db-ssl-certs \
  --from-file=ca-certificate.crt=/path/to/your/ca-certificate.crt \
  -n truefoundry

# GCP Cloud SQL / Azure (full mTLS)
kubectl create secret generic db-ssl-certs \
  --from-file=ca-certificate.crt=/path/to/server-ca.pem \
  --from-file=client-cert.pem=/path/to/client-cert.pem \
  --from-file=client-key.pem=/path/to/client-key.pem \
  -n truefoundry
Then configure truefoundry-values.yaml to mount the certificates and set the SSL paths:
truefoundry-values.yaml
servicefoundryServer:
  env:
    DB_SSL_MODE: "require"
    DB_SSL_CA_PATH: "/etc/ssl/custom/ca-certificate.crt"
    # Only needed for mTLS (GCP Cloud SQL, Azure Database for PostgreSQL)
    DB_SSL_CERT_PATH: "/etc/ssl/custom/client-cert.pem"
    DB_SSL_KEY_PATH: "/etc/ssl/custom/client-key.pem"
  extraVolumes:
    - name: db-ssl-certs
      secret:
        secretName: db-ssl-certs
  extraVolumeMounts:
    - name: db-ssl-certs
      mountPath: /etc/ssl/custom
      readOnly: true
mlfoundryServer:
  env:
    DB_SSL_MODE: "require"
    DB_SSL_CA_PATH: "/etc/ssl/custom/ca-certificate.crt"
    # Only needed for mTLS (GCP Cloud SQL, Azure Database for PostgreSQL)
    DB_SSL_CERT_PATH: "/etc/ssl/custom/client-cert.pem"
    DB_SSL_KEY_PATH: "/etc/ssl/custom/client-key.pem"
  extraVolumes:
    - name: db-ssl-certs
      secret:
        secretName: db-ssl-certs
  extraVolumeMounts:
    - name: db-ssl-certs
      mountPath: /etc/ssl/custom
      readOnly: true
Upgrade the Helm release to apply the changes:
helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry -n truefoundry --create-namespace -f truefoundry-values.yaml
If your TrueFoundry deployment needs to trust custom Certificate Authorities (e.g., for internal services, private registries, or corporate proxies), you can configure custom CA certificates in the Helm chart.
There are two methods to provide custom CA certificates:

Method 1: Pass customCA as a multiline string

You can directly provide the CA certificate content as a multiline string in your values.yaml:
truefoundry-values.yaml
global:
  customCA:
    enabled: true
    certificate: |
      -----BEGIN CERTIFICATE-----
      MIIDXTCCAkWgAwIBAgIJAKZ7VqHEqvmKMA0GCSqGSIb3DQEBCwUAMEUxCzAJBgNV
      BAYTAkFVMRMwEQYDVQQIDApTb21lLVN0YXRlMSEwHwYDVQQKDBhJbnRlcm5ldCBX
      ... (rest of your certificate) ...
      -----END CERTIFICATE-----
This method is suitable when you have one or a few CA certificates to add.

Method 2: Use an existing ConfigMap containing CA certificate(s)

If you already have your custom CA certificates in a Kubernetes ConfigMap, you can reference it directly. An initContainer will merge the custom CA with the system CAs.
1

Create a ConfigMap with your custom CA certificate(s)

Create a Kubernetes ConfigMap containing your custom CA certificate(s):
kubectl create configmap custom-ca-certificates \
  --from-file=ca-certificates.crt=custom-ca.crt \
  -n truefoundry
Alternatively, if you want to create it from a YAML file:
custom-ca-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-ca-certificates
  namespace: truefoundry
data:
  ca-certificates.crt: |
    -----BEGIN CERTIFICATE-----
    ... (your custom CA certificate content) ...
    -----END CERTIFICATE-----
Apply the ConfigMap:
kubectl apply -f custom-ca-configmap.yaml
2

Reference the ConfigMap in your Helm values

Update your truefoundry-values.yaml to reference the ConfigMap:
truefoundry-values.yaml
global:
  customCA:
    enabled: true
    existingConfigMap:
      name: custom-ca-certificates
3

Upgrade the Helm installation

Apply the changes by upgrading your Helm release:
helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
  -n truefoundry --create-namespace -f truefoundry-values.yaml

Method 2b: Use an existing ConfigMap with overrideCAList

If you want the ConfigMap to replace the system CA bundle entirely instead of merging, set overrideCAList to true. In this mode, the ConfigMap is mounted directly at /etc/ssl/certs/ (no initContainer is used), so the ConfigMap must contain the full CA bundle (system + custom CAs).
1

Prepare your CA certificate file

Add your custom CA certificate(s) to your system’s CA bundle. On a Linux system with the certificate file saved as custom-ca.crt:
# Copy the certificate to the CA directory
sudo cp custom-ca.crt /usr/local/share/ca-certificates/

# Update the CA certificates bundle
sudo update-ca-certificates
This will generate or update /etc/ssl/certs/ca-certificates.crt with your custom CA included (system CAs + your custom CA).
2

Create a ConfigMap from the complete ca-certificates.crt file

Create a Kubernetes ConfigMap containing the complete CA bundle:
kubectl create configmap custom-ca-certificates \
  --from-file=ca-certificates.crt=/etc/ssl/certs/ca-certificates.crt \
  -n truefoundry
Alternatively, if you want to create it from a YAML file:
custom-ca-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-ca-certificates
  namespace: truefoundry
data:
  ca-certificates.crt: |
    -----BEGIN CERTIFICATE-----
    ... (your complete ca-certificates.crt content including system + custom CAs) ...
    -----END CERTIFICATE-----
Apply the ConfigMap:
kubectl apply -f custom-ca-configmap.yaml
3

Reference the ConfigMap in your Helm values with overrideCAList

Update your truefoundry-values.yaml to reference the ConfigMap with overrideCAList enabled:
truefoundry-values.yaml
global:
  customCA:
    enabled: true
    existingConfigMap:
      name: custom-ca-certificates
      overrideCAList: true
4

Upgrade the Helm installation

Apply the changes by upgrading your Helm release:
helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
  -n truefoundry --create-namespace -f truefoundry-values.yaml
When overrideCAList is set to true, the ConfigMap is mounted directly replacing the system CA bundle. Your ConfigMap must contain the complete CA bundle (system CAs + your custom CAs). If you only include your custom CAs, all standard public CA trust will be lost and outbound HTTPS connections to public services will fail.
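Before creating a ConfigMap for use with overrideCAList, it can help to confirm that the bundle file really contains more than your own certificate. The sketch below counts PEM certificates in a file. count_certs and check_full_bundle are hypothetical helpers, and the threshold of 10 certificates is an arbitrary assumption chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: check that a CA bundle looks like a full bundle (system + custom CAs).
# Both helpers are hypothetical; the threshold below is an assumption.

# Print the number of PEM certificates in the given file.
count_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Usage: check_full_bundle BUNDLE_FILE
# Returns 0 when the bundle has at least 10 certificates, 1 when it has
# fewer, and 2 when the file cannot be read.
check_full_bundle() {
  local bundle="$1" n
  if [ ! -r "$bundle" ]; then
    echo "error: cannot read $bundle" >&2
    return 2
  fi
  # grep -c exits non-zero on zero matches, so ignore its exit status.
  n="$(count_certs "$bundle" || true)"
  if [ "$n" -lt 10 ]; then
    echo "warning: only $n certificate(s) in $bundle; system CAs may be missing" >&2
    return 1
  fi
  echo "ok: $n certificates in $bundle"
}
```

For example, running check_full_bundle /etc/ssl/certs/ca-certificates.crt before the kubectl create configmap step gives a quick signal that the system CAs are still present.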
The custom CA certificates will be mounted into all TrueFoundry pods and added to the system’s trust store. This ensures that all outgoing HTTPS connections from TrueFoundry services will trust your custom CAs.
After adding custom CA certificates, verify that your TrueFoundry pods have restarted and are running correctly. You may need to restart existing pods for the changes to take effect.
By default, TLS is terminated at your ingress controller or load balancer, and traffic reaches the TrueFoundry proxy (Caddy) over plain HTTP inside the cluster.
In-pod TLS termination moves that step into the proxy container: Caddy terminates HTTPS using a certificate you provide, then forwards to the application over loopback HTTP. This is useful when you want the same certificate inside the pod.
Plane          Helm chart       Values path       Caddy listener
Control plane  truefoundry      global.proxy.tls  :8080 on tfy-proxy
Gateway        tfy-llm-gateway  proxy.tls         :8081 on the gateway proxy sidecar (app stays on :8787)
Do not terminate TLS at the ingress and inside the pod for the same hostname. Pick one layer:
  • In-pod termination (this guide): ingress must pass through encrypted traffic (for example, NGINX ssl-passthrough). Do not attach a TLS certificate on the Ingress resource for that host.
  • Ingress termination (default): leave global.proxy.tls.enabled / proxy.tls.enabled as false and configure TLS on the Ingress or Gateway API parent instead.

Traffic flow (in-pod termination)

Client ──HTTPS──► Ingress (TLS passthrough) ──HTTPS──► Caddy in pod ──HTTP──► App (servicefoundry / llm-gateway)

Prerequisites

  1. A Kubernetes TLS Secret in the release namespace with PEM certificate and private key (standard keys tls.crt and tls.key).
  2. An ingress controller that can forward TLS without terminating it when using Ingress (see below).
  3. For self-signed or private CAs: also configure custom CA certificates so Node.js services trust outbound HTTPS, or use in-cluster HTTP URLs for internal API calls (recommended).

Control plane (truefoundry chart)

1

Create the TLS Secret

Create a kubernetes.io/tls secret in the truefoundry namespace. Replace the paths with your certificate and key files:
kubectl create secret tls tfy-proxy-cp-tls \
  --cert=/path/to/tls.crt \
  --key=/path/to/tls.key \
  -n truefoundry
For a wildcard host such as *.primary.example.com, issue a cert that covers your control-plane hostname (for example cp.primary.example.com).
2

Enable proxy TLS in Helm values

Add the following to truefoundry-values.yaml:
truefoundry-values.yaml
global:
  proxy:
    tls:
      enabled: true
      secretName: tfy-proxy-cp-tls
      # Optional: if your Secret uses non-standard keys
      # secretKeys:
      #   cert: tls.crt
      #   key: tls.key
Upgrade the release:
helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
  -n truefoundry --create-namespace -f truefoundry-values.yaml
3

Configure ingress for TLS passthrough

When global.proxy.tls.enabled is true, Caddy expects HTTPS on the service port. Your ingress must forward the TLS connection without terminating it.
ingress-nginx: enable passthrough on the controller (once per cluster) and annotate the control-plane Ingress:
truefoundry-values.yaml
global:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - cp.example.com
    # Do not set global.ingress.tls when using in-pod termination — TLS is handled inside tfy-proxy.
    annotations:
      nginx.ingress.kubernetes.io/ssl-passthrough: "true"
  proxy:
    tls:
      enabled: true
      secretName: tfy-proxy-cp-tls
The ingress-nginx controller must be installed with controller.extraArgs.enable-ssl-passthrough: "true".
Istio / Gateway API: configure TLS mode PASSTHROUGH on the Gateway listener that fronts the control plane. TLS is not configured on the HTTPRoute itself.
4

Verify the control plane

kubectl -n truefoundry rollout status deploy/truefoundry-tfy-proxy
curl -vk https://cp.example.com/health
You should get a successful response over HTTPS. Check that the certificate presented to the client is the one from your Secret (not only the ingress default certificate).
5

Update gateway `CONTROL_PLANE_URL` when `tags.llmGateway` is enabled and control-plane proxy TLS is on

When global.proxy.tls.enabled is true, truefoundry-tfy-proxy listens with TLS on port 8080. In-cluster HTTP calls such as http://<release>-tfy-proxy:8080 will fail (for example ECONNRESET or certificate errors).
If you deploy the gateway with the truefoundry chart (tags.llmGateway: true), override tfy-llm-gateway.env.CONTROL_PLANE_URL to your HTTPS control-plane URL (global.controlPlaneURL), not the internal http://...-tfy-proxy:8080 address:
truefoundry-values.yaml
global:
  controlPlaneURL: https://cp.example.com
  proxy:
    tls:
      enabled: true
      secretName: tfy-proxy-cp-tls
  customCA:
    enabled: true
    existingConfigMap:
      name: custom-ca-certificates

tfy-llm-gateway:
  env:
    CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}"
    PUBLIC_CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}"
The standalone tfy-llm-gateway chart already sets CONTROL_PLANE_URL from global.controlPlaneURL by default. The override above is required when tags.llmGateway is true on the truefoundry chart, because the parent chart default uses http://{{ .Release.Name }}-tfy-proxy:8080.

Gateway plane (tfy-llm-gateway chart)

Use this when deploying the gateway as its own Helm release (gateway plane only) or when overriding the tfy-llm-gateway subchart under the parent truefoundry chart.
1

Create the TLS Secret

kubectl create secret tls tfy-proxy-gateway-tls \
  --cert=/path/to/tls.crt \
  --key=/path/to/tls.key \
  -n truefoundry
2

Enable proxy TLS and ingress passthrough

Standalone gateway release (truefoundry-values.yaml for tfy-llm-gateway):
truefoundry-values.yaml
global:
  # Must be https:// when the control-plane tfy-proxy has global.proxy.tls.enabled
  controlPlaneURL: https://cp.example.com

proxy:
  tls:
    enabled: true
    secretName: tfy-proxy-gateway-tls

# env.CONTROL_PLANE_URL defaults to global.controlPlaneURL in this chart.
# Override explicitly if a parent release set it to http://...-tfy-proxy:8080:
# env:
#   CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}"

ingress:
  enabled: true
  ingressClassName: nginx
  hosts:
    - gateway.example.com
  annotations:
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
  # Do not set ingress.tls — TLS terminates inside the pod.
Gateway bundled with truefoundry (tags.llmGateway: true): nest the values under tfy-llm-gateway:
truefoundry-values.yaml
tfy-llm-gateway:
  proxy:
    tls:
      enabled: true
      secretName: tfy-proxy-gateway-tls
  ingress:
    enabled: true
    annotations:
      nginx.ingress.kubernetes.io/ssl-passthrough: "true"
3

Configure environment variables for startup

The gateway loads configuration at startup over HTTP(S). Set env based on whether the control-plane proxy has in-pod TLS enabled.
When global.proxy.tls.enabled is true on the control plane (same cluster), set CONTROL_PLANE_URL to the public control-plane URL. Do not use http://<release>-tfy-proxy:8080, because that port expects HTTPS:
truefoundry-values.yaml
global:
  controlPlaneURL: https://cp.example.com

tfy-llm-gateway:
  env:
    CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}"
    PUBLIC_CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}"
Add custom CA certificates if controlPlaneURL uses a private or mkcert-signed certificate.
When control-plane proxy TLS is disabled (default), you can use the internal proxy URL for CONTROL_PLANE_URL if the gateway and control plane share a release:
truefoundry-values.yaml
tfy-llm-gateway:
  env:
    CONTROL_PLANE_URL: http://truefoundry-tfy-proxy:8080
    PUBLIC_CONTROL_PLANE_URL: https://cp.example.com
    SERVICEFOUNDRY_SERVER_URL: http://truefoundry-servicefoundry-server:3000
    CONTROL_PLANE_NATS_URL: http://truefoundry-tfy-nats:4222
Replace truefoundry with your Helm release name if different. SERVICEFOUNDRY_SERVER_URL is used to fetch NATS credentials (/v1/x/llm-gateway/nats-creds); pointing it at servicefoundry-server avoids TLS issues on the proxy port.
4

Verify the gateway

kubectl -n truefoundry rollout status deploy/tfy-llm-gateway
kubectl -n truefoundry get pods -l app.kubernetes.io/name=tfy-llm-gateway
# Expect 2/2 Ready when proxy.tls is enabled (gateway + proxy containers)
curl -vk https://gateway.example.com/health
If pods crash with unable to verify the first certificate when fetching NATS credentials, see the custom CA section or the internal HTTP env overrides above.
East-west vs north-south TLS: proxy.tls on the gateway sidecar secures traffic into the gateway pod from clients. On the control plane, global.proxy.tls makes port 8080 HTTPS on tfy-proxy. Gateway pods must use CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}" (plus global.customCA for private CAs), or call servicefoundry-server / tfy-nats directly over HTTP — not http://...-tfy-proxy:8080.
TrueFoundry ships with a built-in monitoring stack that includes Grafana dashboards for the control plane. To enable it, add the following to your truefoundry-values.yaml:
truefoundry-values.yaml
truefoundryMonitoring:
  enabled: true
  grafana:
    grafana.ini:
      auth.jwt:
        jwk_set_url: >-
          https://<your-truefoundry-control-plane-url>/api/svc/v1/keys/<tenant-name>/jwks
Then upgrade the Helm release to apply the changes:
helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
  -n truefoundry --create-namespace \
  -f truefoundry-values.yaml
Once enabled, platform admins can access the Grafana dashboard at:
https://<your-truefoundry-control-plane-url>/admin/grafana/
  • Replace <your-truefoundry-control-plane-url> with your actual control plane domain (e.g., app.example.com) and <tenant-name> with your TrueFoundry tenant name provided during onboarding.
  • Only users with the admin role can access this endpoint.
  • Make sure to include the trailing / at the end of the URL.
  • If you already have Prometheus or VictoriaLogs in your cluster, you can point the monitoring stack to them using externalServices instead of installing new instances.
For the full configuration reference, see the Control Plane Monitoring guide.
You can attach default metadata to every request that passes through the AI Gateway by setting the DEFAULT_GATEWAY_METADATA environment variable on the gateway. The value should be a JSON string of key-value pairs.
Add the following to your gateway configuration in the values file of the gateway plane:
tfy-llm-gateway:
  env:
    DEFAULT_GATEWAY_METADATA: '{"org":"internal"}'
The metadata key-value pairs will be automatically included in every request routed through the gateway. You can use this to tag requests with organizational identifiers, environment labels, or any other metadata your downstream systems need.
By default, the AI Gateway exposes a fixed set of Prometheus labels on its metrics. If you want to slice and aggregate gateway metrics by your own metadata fields (e.g. customer_id, request_type, environment), set the LLM_GATEWAY_METADATA_LOGGING_KEYS environment variable on the gateway. The value is a JSON-encoded array of metadata keys.
Each key listed here is exposed as a Prometheus label prefixed with ai_gateway_metadata_*. For example, customer_id becomes the label ai_gateway_metadata_customer_id. You can then use these labels for granular filtering and aggregation in Grafana.
Add the following to your gateway configuration in the values file of the gateway plane:
tfy-llm-gateway:
  env:
    LLM_GATEWAY_METADATA_LOGGING_KEYS: '["customer_id", "request_type"]'
Once the gateway is restarted, requests that include these metadata keys (either via default metadata or per-request metadata) will emit Prometheus metrics with the corresponding ai_gateway_metadata_customer_id and ai_gateway_metadata_request_type labels.
Only add metadata keys with bounded, low-cardinality values (e.g. customer tier, request type, environment). Adding high-cardinality keys like user IDs or trace IDs as labels can cause your Prometheus / Victoria Metrics instance to consume excessive memory and storage.
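The mapping from metadata keys to label names is mechanical, so it can be sketched in a few lines of shell. The helpers below build the JSON array value for LLM_GATEWAY_METADATA_LOGGING_KEYS and list the label names that should result under the prefix rule described above. metadata_keys_json and metadata_label_names are hypothetical helpers, and they assume keys made of plain letters, digits, and underscores.

```shell
#!/usr/bin/env bash
# Sketch: derive the env value and the expected Prometheus label names
# from a list of metadata keys. Both helpers are hypothetical.

# Build the JSON array value for LLM_GATEWAY_METADATA_LOGGING_KEYS.
# Assumes keys need no JSON escaping.
metadata_keys_json() {
  local out="" key
  for key in "$@"; do
    out="${out:+$out,}\"$key\""
  done
  printf '[%s]\n' "$out"
}

# Print one expected label name per metadata key.
metadata_label_names() {
  local key
  for key in "$@"; do
    printf 'ai_gateway_metadata_%s\n' "$key"
  done
}

metadata_keys_json customer_id request_type
metadata_label_names customer_id request_type
```

Keeping the key list in one place like this makes it easier to review label cardinality before the gateway is restarted.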
The TrueFoundry Helm charts support the Kubernetes Gateway API as an alternative to standard Ingress resources. Use HTTPRoute when your cluster uses a Gateway API-compatible controller (e.g. Envoy Gateway, Istio, NGINX Gateway Fabric, GKE Gateway).
Control plane (truefoundry chart)
Add the following to your truefoundry-values.yaml, setting parentRefs to point to your existing Gateway:
truefoundry-values.yaml
global:
  httpRoute:
    enabled: true
    parentRefs:
      - name: my-gateway        # Name of your Gateway resource
        namespace: gateway-system  # Namespace where the Gateway is deployed
        sectionName: https      # Listener section on the Gateway (e.g. http or https)
    hostnames:
      - "app.example.com"       # Hostname that this HTTPRoute should match
Then apply:
helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
  -n truefoundry --create-namespace \
  -f truefoundry-values.yaml
  • Only one routing method should be enabled at a time. Disable global.ingress.enabled and global.virtualservice.enabled when using httpRoute.
  • The sectionName must match a named listener on your Gateway resource. Omit it if your Gateway has a single unnamed listener.
  • TLS termination is handled by the parent Gateway — no TLS configuration is needed on the HTTPRoute itself.
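After the Helm upgrade, it can help to confirm that the route was created and accepted by the parent Gateway. The following is a minimal sketch, assuming kubectl is configured for the cluster and the Gateway API CRDs are installed; check_httproutes is a hypothetical helper name, not something shipped with the chart.

```shell
# Hypothetical helper: inspect the HTTPRoutes in a namespace (defaults to truefoundry).
check_httproutes() {
  # List the HTTPRoute resources in the namespace
  kubectl get httproute -n "${1:-truefoundry}"
  # Show details, including status.parents conditions such as Accepted and ResolvedRefs
  kubectl describe httproute -n "${1:-truefoundry}"
}

# Usage: check_httproutes truefoundry
```

If the Accepted condition is not True, the usual suspects are a parentRefs name or namespace that does not match the Gateway, or a sectionName that does not match a listener.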
By default, the installation instructions use s3:* for the S3 bucket IAM policy for simplicity. If your organization requires a least-privilege approach, you can replace s3:* with the following minimal set of permissions:
{
  "Statement": [
    {
      "Sid": "S3",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucketMultipartUploads",
        "s3:GetBucketTagging",
        "s3:GetObjectVersionTagging",
        "s3:ReplicateTags",
        "s3:PutObjectVersionTagging",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:AbortMultipartUpload",
        "s3:PutBucketTagging",
        "s3:GetObjectVersionAcl",
        "s3:GetObjectTagging",
        "s3:PutObjectTagging",
        "s3:GetObjectVersion",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_S3_BUCKET_NAME>",
        "arn:aws:s3:::<YOUR_S3_BUCKET_NAME>/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
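One possible way to turn this JSON into an IAM policy is sketched below. It assumes the policy above is saved as s3-policy.json with the <YOUR_S3_BUCKET_NAME> placeholder intact and that the AWS CLI is configured with permission to create IAM policies. The helper name, file names, and policy name are hypothetical.

```shell
# Hypothetical helper: fill in the bucket name and create the policy in IAM.
# $1 = S3 bucket name, $2 = name for the new IAM policy
create_s3_policy() {
  # Replace every placeholder occurrence with the real bucket name
  sed "s/<YOUR_S3_BUCKET_NAME>/$1/g" s3-policy.json > s3-policy.rendered.json
  # Create the managed policy from the rendered document
  aws iam create-policy \
    --policy-name "$2" \
    --policy-document file://s3-policy.rendered.json
}

# Usage: create_s3_policy my-bucket tfy-s3-minimal
```

The created policy still has to be attached to whichever role or user your installation uses for S3 access.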
By default, the TrueFoundry Helm chart ships with container and pod security contexts configured for all components to follow security best practices: pods run as a non-root user (runAsNonRoot: true), use a read-only root filesystem (readOnlyRootFilesystem: true), and drop all Linux capabilities (capabilities.drop: [ALL]).
However, NATS (used internally for messaging) does not have these defaults applied automatically. If your cluster enforces Pod Security Standards (e.g. the restricted profile) or you want a consistent security posture across all components, explicitly set the security context for NATS by adding the following to your truefoundry-values.yaml:
truefoundry-values.yaml
tfyNats:
  container:
    merge:
      securityContext:
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
  podTemplate:
    merge:
      spec:
        securityContext:
          fsGroup: 1000
          runAsUser: 1000
          runAsNonRoot: true
The NATS subchart uses a different values structure (container.merge and podTemplate.merge) compared to other TrueFoundry components. This is because NATS uses its own Helm chart conventions for overriding pod and container specs.
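To check that the override took effect, you can print the security contexts of a running NATS pod. This is a sketch only: show_pod_security_context is a hypothetical helper, and the NATS pod name depends on your release, so look it up with kubectl get pods first.

```shell
# Hypothetical helper: print pod-level and container-level security contexts.
# $1 = pod name, $2 = namespace (defaults to truefoundry)
show_pod_security_context() {
  kubectl get pod "$1" -n "${2:-truefoundry}" \
    -o jsonpath='{.spec.securityContext}{"\n"}{.spec.containers[*].securityContext}{"\n"}'
}

# Usage: show_pod_security_context <nats-pod-name> truefoundry
```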
OpenShift clusters enforce Security Context Constraints (SCCs) that expect pods to have an empty security context so that OpenShift can inject arbitrary user and group IDs at runtime. By default, the TrueFoundry Helm chart sets explicit podSecurityContext and securityContext values (such as runAsUser, runAsNonRoot, fsGroup, etc.) on its components, which conflict with the restricted or restricted-v2 SCC.
To resolve this, disable both pod-level and container-level security contexts for all components by adding the following overrides to your truefoundry-values.yaml:
truefoundry-values.yaml
# Disable security contexts for OpenShift SCC compatibility
truefoundryBootstrap:
  podSecurityContext:
    enabled: false

mlfoundryServer:
  podSecurityContext:
    enabled: false

servicefoundryServer:
  podSecurityContext:
    enabled: false

tfyK8sController:
  podSecurityContext:
    enabled: false

tfyProxy:
  podSecurityContext:
    enabled: false

deltaFusionIngestor:
  podSecurityContext:
    enabled: false

deltaFusionCompaction:
  podSecurityContext:
    enabled: false

deltaFusionQueryServer:
  podSecurityContext:
    enabled: false

tfy-llm-gateway:
  podSecurityContext:
    enabled: false

tfy-otel-collector:
  podSecurityContext:
    enabled: false
Setting enabled: false removes all explicit security context fields from the pod and container specs, allowing OpenShift’s SCC admission controller to assign user and group IDs as needed.
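Once the pods are admitted, OpenShift records the SCC it applied in the openshift.io/scc pod annotation. The sketch below lists that annotation for every pod in a namespace; list_pod_sccs is a hypothetical helper name, and it assumes the oc CLI is logged in to the cluster.

```shell
# Hypothetical helper: show which SCC admitted each pod (namespace defaults to truefoundry).
list_pod_sccs() {
  oc get pods -n "${1:-truefoundry}" \
    -o 'custom-columns=NAME:.metadata.name,SCC:.metadata.annotations.openshift\.io/scc'
}

# Usage: list_pod_sccs truefoundry
```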
The tfy-logs chart ships Vector as a DaemonSet that tails container logs from each node’s host filesystem and ships them to VictoriaLogs. Vector writes its checkpoint/snapshot state (the record of how far it has read in each log file) to a hostPath data directory on the node. The chart default points to a host directory that is not writable on RHCOS, so Vector cannot persist its checkpoints and the pod fails to start or restarts without retaining read positions. You must point this at a writable location on the node.
On RHCOS the writable, persistent location is under /var/home/core (other paths such as /var/lib are managed and read-only for containers). Set persistence.hostPath.path to a writable directory there, for example /var/home/core/data/vector.
If you mirror images into a private registry (common in air-gapped OpenShift clusters), also override the registry for victoria-logs-single and Vector (see "Can I use my Artifactory as a mirror to pull images?").
Because Vector runs as a DaemonSet that mounts the node’s host filesystem (hostPath) to read container logs, its service account must be granted the privileged SCC. Without this, OpenShift’s SCC admission controller blocks the pods and the DaemonSet will not start. Grant the SCC to the tfy-logs-vector service account in the tfy-logs namespace:
oc adm policy add-scc-to-user privileged -z tfy-logs-vector -n tfy-logs
On SELinux-enforcing nodes (RHCOS), the default container SELinux context (container_t) cannot read the host log files under /var/log. Set the pod’s SELinux type to spc_t (super-privileged container) so Vector is allowed to read them — otherwise the pod runs but collects no logs (permission denied).
tfy-logs-values.yaml
victoria-logs-single:
  enabled: true
  # Optional: only needed if you pull images from a private registry / mirror
  global:
    image:
      registry: <YOUR_REGISTRY>
  server:
    image:
      registry: <YOUR_REGISTRY>
  vector:
    enabled: true
    # Required on SELinux-enforcing nodes (RHCOS) so Vector can read host log files
    podSecurityContext:
      seLinuxOptions:
        type: spc_t
    # Optional: only needed if you pull images from a private registry / mirror
    image:
      repository: <YOUR_REGISTRY>/timberio/vector
    persistence:
      hostPath:
        enabled: true
        # Must be a writable location on the node. On RHCOS use a path under /var/home/core.
        path: /var/home/core/data/vector

# Vector for Windows nodes is not applicable on OpenShift
windowsVector:
  enabled: false
The directory is created automatically on each node by the DaemonSet. If your cluster uses a different writable mount (for example a dedicated data partition), set path to a writable directory on that mount instead — the value only needs to be writable by the Vector pod on every node.
After applying the values, verify the Vector DaemonSet is running on every node:
kubectl -n tfy-logs rollout status daemonset/<release-name>-vector
kubectl -n tfy-logs logs -l app.kubernetes.io/name=vector --tail=50
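If Vector still fails to persist checkpoints, you can inspect the hostPath directory directly on a node. The following is a sketch under the assumption that you have cluster-admin access and can use oc debug; check_vector_hostpath is a hypothetical helper name.

```shell
# Hypothetical helper: inspect the Vector data directory on a node.
# $1 = node name, $2 = host directory (defaults to the path used in the values above)
check_vector_hostpath() {
  host_dir="${2:-/var/home/core/data/vector}"
  # oc debug starts a debug pod on the node; chroot /host switches to the node's filesystem
  oc debug "node/$1" -- chroot /host ls -ld "$host_dir"
}

# Usage: check_vector_hostpath <node-name>
```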
When the EKS CloudWatch Observability addon is enabled, its ADOT auto-instrumentation injects bundled Python libraries via PYTHONPATH that conflict with truefoundry-mlfoundry-server dependencies, causing the pod to enter CrashLoopBackOff. You may see errors like:
ImportError: cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_'
  (/otel-auto-instrumentation-python/urllib3/util/ssl_.py)
ImportError: cannot import name 'LogData' from 'opentelemetry.sdk._logs'
  (/otel-auto-instrumentation-python/opentelemetry/sdk/_logs/__init__.py)
To fix this, exclude the truefoundry namespace from the addon’s auto-instrumentation by updating the addon configuration:
{
  "manager": {
    "applicationSignals": {
      "autoMonitor": {
        "exclude": {
          "python": { "namespaces": ["truefoundry"] },
          "java": { "namespaces": ["truefoundry"] },
          "nodejs": { "namespaces": ["truefoundry"] },
          "dotnet": { "namespaces": ["truefoundry"] }
        }
      }
    },
    "autoAnnotateAutoInstrumentation": {
      "python": { "namespaces": [] },
      "java": { "namespaces": [] },
      "nodejs": { "namespaces": [] },
      "dotnet": { "namespaces": [] }
    }
  }
}
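One way to apply this configuration is through the AWS CLI, sketched below. It assumes the JSON above is saved as addon-config.json and that the addon is installed under its standard name, amazon-cloudwatch-observability. The helper name and file name are hypothetical.

```shell
# Hypothetical helper: push the configuration to the CloudWatch Observability addon.
# $1 = EKS cluster name, $2 = config file (defaults to addon-config.json)
update_cw_addon_config() {
  aws eks update-addon \
    --cluster-name "$1" \
    --addon-name amazon-cloudwatch-observability \
    --configuration-values "file://${2:-addon-config.json}"
}

# Usage: update_cw_addon_config <your-cluster-name>
```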
After updating the addon config, restart the deployment and verify the pods are running:
kubectl rollout restart deployment truefoundry-mlfoundry-server -n truefoundry