/v1 facade: virtual keys carry per-key $budgets,
TPM/RPM limits, and spend tracking. Chart 1.89.2; Postgres via CloudNativePG. Argo apps:
cloudnative-pg (operator, wave 1), litellm-bootstrap (Cluster + ESO secrets, wave 3),
litellm (proxy, wave 4). All auto-sync (platform tier).
1. Prereqs: the secrets (do this first)
Create the backing values in GCP Secret Manager (see the secret reference).
vllm-api-key already exists (raw-vLLM). The salt key must never change: it encrypts provider
keys stored in Postgres; rotating it orphans them.
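A minimal sketch for generating the values before storing them in Secret Manager. The secret name litellm-master-key and the key lengths are assumptions for illustration; litellm-salt-key is the name used in this runbook. LiteLLM expects the master key to start with sk-.

```python
import secrets


def generate_litellm_secrets():
    """Return freshly generated values, keyed by (partly assumed) secret name."""
    return {
        # Hypothetical secret name; LiteLLM requires the "sk-" prefix.
        "litellm-master-key": "sk-" + secrets.token_urlsafe(32),
        # Name from this runbook. Generate ONCE and never regenerate:
        # provider keys in Postgres are encrypted with it.
        "litellm-salt-key": secrets.token_urlsafe(32),
    }
```

Run it once, store each value as a secret version, and discard the local copy. Re-running it for an existing deployment would produce a new salt key, which is exactly what must not happen.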
2. Bring-up (staged)
litellm-migrations, an Argo PreSync hook) applies the Prisma schema before
the proxy starts; the proxy itself runs with DISABLE_SCHEMA_UPDATE=true so it never self-migrates.
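One way to confirm the proxy came up after the migration hook is to poll its readiness endpoint. This is a sketch, not part of the bring-up itself: the URL assumes a local port-forward to the proxy on port 4000, and the retry counts are arbitrary.

```python
import time
import urllib.request


def http_ready(url, timeout=5.0):
    """Return True if the URL answers HTTP 200, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False


def wait_until_ready(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Call check() until it returns True; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"not ready after {attempts} attempts")
```

Usage, assuming the port-forward is active: `wait_until_ready(lambda: http_ready("http://localhost:4000/health/readiness"))`.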
3. Validate: virtual keys + budgets
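A minimal sketch of the key-minting request used in this section, against LiteLLM's /key/generate endpoint. The base URL (a local port-forward on port 4000) and the helper names are assumptions; pick budget values to suit the test.

```python
import json
import urllib.request


def build_key_request(base_url, master_key, max_budget, rpm_limit=None, tpm_limit=None):
    """Build a POST /key/generate request for one virtual key."""
    payload = {"max_budget": max_budget}  # budget in USD
    if rpm_limit is not None:
        payload["rpm_limit"] = rpm_limit
    if tpm_limit is not None:
        payload["tpm_limit"] = tpm_limit
    return urllib.request.Request(
        url=f"{base_url}/key/generate",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def mint_key(request):
    """Send the request and return the new virtual key string."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)["key"]
```

Calling `mint_key(build_key_request("http://localhost:4000", master_key, 0.00015))` and again with a larger budget yields two keys whose limits can then be compared.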
Port-forward and use the master key to mint two virtual keys with different budgets.
4. Gotchas
- Salt key is write-once. See §1. Never rotate litellm-salt-key.
- masterkeySecretName: litellm-secrets is set so the chart does NOT mint its own random master key (which would regenerate each sync and invalidate every issued virtual key).
- DB single instance = mitigated SPOF. allow_requests_on_db_unavailable=true keeps the proxy serving if Postgres blips (new-key/spend writes pause). HA later: bump Cluster.spec.instances.
- Real output needs real GIE routing + GPU. LiteLLM's qwen-local routes to the in-cluster gateway; real completions require make vllm-up and the real-routing apps synced (runbook inference-gateway.md §8).
- Optional external provider: add ANTHROPIC_API_KEY to GSM + the litellm-secrets ExternalSecret, then uncomment the claude-haiku model in platform/litellm/values.yaml.
- 0 and budgets never bind. Both are set on qwen-local.
- Spend is async. LiteLLM batches per-request spend to LiteLLM_SpendLogs and aggregates to the key's spend a few seconds later; the budget check reads the (slightly lagging) cached spend. So a burst of rapid calls can briefly overshoot before the next one is rejected: enforcement is eventually-consistent, not per-token-exact. (Verified: budget $0.00015 → call 3 returns HTTP 429 "Budget has been exceeded".)
- Querying the DB: the CNPG postgres container uses peer auth, so connect as the postgres superuser (psql -U postgres -d litellm), not -U litellm (which fails peer auth over the socket).
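The "Spend is async" gotcha above can be illustrated with a toy model. This is not LiteLLM's implementation: it only assumes that the budget check reads a cached spend that is refreshed after every flush_every accepted calls. Amounts are integers in millionths of a dollar, and the per-call cost of 100 is an assumed value.

```python
def simulate_budget(budget, cost_per_call, calls, flush_every):
    """Toy model of a budget check that reads a lagging cached spend.

    Returns (list of HTTP statuses, total spend actually incurred).
    """
    cached_spend = 0  # what the budget check sees
    pending = 0       # spend recorded but not yet aggregated
    accepted = 0
    statuses = []
    for _ in range(calls):
        if cached_spend >= budget:
            statuses.append(429)  # "Budget has been exceeded"
            continue
        statuses.append(200)
        accepted += 1
        pending += cost_per_call
        if accepted % flush_every == 0:
            # Aggregation catches up: the cache now reflects real spend.
            cached_spend += pending
            pending = 0
    return statuses, cached_spend + pending


# Slow calls (cache refreshed after every call): the third call is rejected.
print(simulate_budget(budget=150, cost_per_call=100, calls=4, flush_every=1))
# Burst of calls (cache refreshed after every third call): spend overshoots.
print(simulate_budget(budget=150, cost_per_call=100, calls=5, flush_every=3))
```

With a budget of 150 (that is, $0.00015), the slow case ends at a spend of 200 while the burst case reaches 300 before the first 429, which is the overshoot described above.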