Deployment profiles

A profile is a cumulative selection of layers, applied with make root PROFILE=… (default platform). Each profile is a superset of the one before: you widen, you don’t switch.
| PROFILE | Layers applied | Adds |
| --- | --- | --- |
| platform | platform | GitOps base: GPU platform, Kueue, observability, secrets |
| serving | platform + serving | raw vLLM / KServe serving |
| llm-gateway | platform + serving + routing + llm-gateway | GIE routing + LiteLLM tenant gateway |
| full | all of the above + demos | example tenants and demos |
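
The cumulative rule can be sketched as a small shell function. This is illustrative only: layers_for_profile is a hypothetical helper, not something the repo ships, and the layer names are copied from the table above.

```shell
# Sketch: map a PROFILE name to its cumulative layer list.
# layers_for_profile is a hypothetical helper, not a repo script or make target.
layers_for_profile() {
  case "$1" in
    platform)    echo "platform" ;;
    serving)     echo "platform serving" ;;
    llm-gateway) echo "platform serving routing llm-gateway" ;;
    full)        echo "platform serving routing llm-gateway demos" ;;
    *)           echo "unknown profile: $1" >&2; return 1 ;;
  esac
}
```

For example, layers_for_profile llm-gateway prints platform serving routing llm-gateway. Every wider profile starts with the full layer list of the narrower one, which is why moving up a profile only adds layers.
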
make root PROFILE=serving      # platform + serving
make wait  PROFILE=serving     # block until auto-sync apps are Synced+Healthy
make smoke PROFILE=serving     # profile smoke checks (serving runs the vLLM smoke)
Run make argocd-repo first while the repo is private (registers the Argo CD repo credential).
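
The bring-up order above can be wrapped in one function that stops at the first failing step. This is a sketch under assumptions: bring_up and the PRIVATE_REPO switch are hypothetical names for this example; only the make targets and the PROFILE variable come from this page.

```shell
# Sketch: bring up one profile end to end, stopping at the first failure.
# bring_up and PRIVATE_REPO are hypothetical names used only in this example.
bring_up() {
  profile="${1:-platform}"   # platform is the documented default profile
  # Register the Argo CD repo credential first while the repo is private.
  if [ "${PRIVATE_REPO:-0}" = "1" ]; then
    make argocd-repo || return 1
  fi
  make root  PROFILE="$profile" || return 1   # apply the app-of-apps roots
  make wait  PROFILE="$profile" || return 1   # block until Synced+Healthy
  make smoke PROFILE="$profile"               # run the profile smoke checks
}
```

From the repo root, PRIVATE_REPO=1 followed by bring_up serving would run argocd-repo, root, wait, and smoke in that order for the serving profile.
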

Make targets

make help lists every target with its one-line description; the tables below group the ones you run most.

Cloud substrate (OpenTofu, paid resources)

| Target | What |
| --- | --- |
| tf-init / tf-validate / tf-plan | initialize, validate, plan the GKE root |
| tf-apply | apply the cloud substrate (costs money; AUTO_APPROVE=1 for non-interactive/CI) |
| tf-destroy | destroy the substrate (read the teardown runbook first) |
| tf-fmt | check OpenTofu/Terraform formatting (the CI format gate) |
| tf-credentials | print the gcloud get-credentials command from TF output |
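
A typical plan-then-apply pass over these targets might look like the sketch below. The ordering is the usual OpenTofu flow rather than something this page prescribes, substrate_up is a hypothetical wrapper, and both the CI=true check and passing AUTO_APPROVE=1 on the make command line are assumptions about your environment and Makefile.

```shell
# Sketch: review-then-apply flow for the cloud substrate.
# substrate_up is a hypothetical wrapper around the tf-* targets listed above.
substrate_up() {
  make tf-init     || return 1
  make tf-validate || return 1
  make tf-plan     || return 1   # review the plan before paying for anything
  if [ "${CI:-}" = "true" ]; then
    # Non-interactive run. Passing AUTO_APPROVE as a make command-line
    # variable is an assumption about how the Makefile reads it.
    make tf-apply AUTO_APPROVE=1 || return 1
  else
    make tf-apply || return 1
  fi
  make tf-credentials            # print the gcloud get-credentials command
}
```

tf-apply creates paid resources, so outside CI it is worth reading the tf-plan output before letting the apply step proceed.
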

Fork & config

| Target | What |
| --- | --- |
| fork-init | propagate environments/$(CLUSTER)/config.yaml (repo URL + GCP project) across the repo |
| config-check | validate config propagation and tfvars drift |
| resolve-groups | resolve config.yaml features: into groups.generated.yaml (chains the four resolvers below) |
| resolve-profile / resolve-secret-store / resolve-gpu / resolve-guardrails | resolve the profile (LiteLLM + CNPG HA overlays), the ESO ClusterSecretStore, the GPU stack (gpu_stack → gpu-operator group + DCGM scrape target), and the guardrails overlay |

GitOps bring-up

| Target | What |
| --- | --- |
| require-kube | validate + print the dedicated cluster target (./kubeconfig); the gate every cluster command runs first |
| bootstrap | install Argo CD on the repo’s dedicated cluster (./kubeconfig; pinned chart) |
| argocd-repo | create/update the Argo CD repo credential (private repo) |
| seed-secrets | seed the internal random secrets (LiteLLM/vLLM/Dex/oauth2-proxy/DB) into the backend, idempotent; prints the external ones you must supply |
| reset-dex-admin | rotate a lost Dex static-admin password, persist password+hash in GSM, force ESO refresh/restart Dex |
| root | apply the platform AppProject + app-of-apps roots for PROFILE |
| wait / smoke / doctor | wait for sync · run smoke checks · validate prerequisites |
| verify | end-to-end platform check (GitOps + economics + budget-429 + serving + edge/SSO), plain pass/fail |
| seed-experience | (optional) manually re-mint Open WebUI’s experience secret from live LiteLLM; normally the in-cluster litellm-keys Job mints both apps’ keys automatically |
| argocd-password / argocd-ui | print admin password · port-forward the UI to localhost:8080 |
| credentials | collate operator credentials into the gitignored secrets/credentials.local.md (SSO is the real path) |

Serving & GPU (cost-sensitive)

| Target | What |
| --- | --- |
| vllm-up / vllm-down | scale raw vLLM to 1 (brings up an L4, costs) / to 0 ($0 idle) |
| vllm-smoke | send an authenticated OpenAI chat request to raw vLLM |
| gpu-smoke | run the nvidia-smi GPU smoke Job (verifies the GPU stack; triggers a GPU node, scales back to 0) |
| keda-demo-up / keda-demo-down | un-pause / re-pause the raw-vLLM KEDA ScaledObject (load-test only) |
| modelcar-build | build + push the OCI modelcar, print the @sha256 digest to pin |
| bench | run the vLLM concurrency-sweep benchmark Job (after vllm-up) |
| bench-guidellm | run the GuideLLM SLO-frontier sweep Job (standard serving benchmark; after vllm-up) |
| guardrails-smoke | prove the LiteLLM guardrails: PII masked + prompt-injection blocked (needs features.guardrails: true synced) |
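
Because vllm-up brings up a paid L4 node, it helps to pair every session with a guaranteed vllm-down. The sketch below assumes you want a smoke check followed by the concurrency benchmark; vllm_session is a hypothetical wrapper, and only the make targets come from the table above.

```shell
# Sketch: scale raw vLLM up, exercise it, and always scale back to 0.
# vllm_session is a hypothetical wrapper around the documented targets.
vllm_session() {
  # If scale-up fails part way, still try to scale down before giving up.
  make vllm-up || { make vllm-down; return 1; }
  status=0
  # Run the benchmark only if the smoke request succeeds; remember any failure.
  { make vllm-smoke && make bench; } || status=$?
  # Scale down regardless of how the checks went, so idle cost returns to $0.
  make vllm-down || status=$?
  return "$status"
}
```

The function returns non-zero if the smoke check, the benchmark, or the scale-down fails, but the scale-down is attempted in every case.
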

Cost control

| Target | What |
| --- | --- |
| pause | pause to $0: release Gateway LBs, destroy the substrate with tofu, audit orphans (destructive; Secret Manager values survive) |
| resume | resume from a paused (destroyed) cluster: tofu apply + bootstrap + platform base |

Docs

| Target | What |
| --- | --- |
| docs-serve | serve the docs site locally with live reload (Mintlify, http://localhost:3000) |
| docs-build | validate the docs site (broken-link check, same as CI) |

Fork configuration

environments/<env>/config.yaml is the single source of fork config: repo URL, cloud project, domain, DNS provider. Edit it, then run make fork-init to rewrite the repo to match. Secret values never live here; they come from a cloud secret manager via External Secrets Operator.
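
The edit-then-propagate step can be sketched as follows. fork_sync is a hypothetical wrapper, and the sketch assumes that <env> is the value of the CLUSTER make variable and that both targets accept it on the command line.

```shell
# Sketch: propagate fork config after editing config.yaml, then validate it.
# fork_sync is a hypothetical wrapper. Treating <env> as the CLUSTER make
# variable is an assumption based on the fork-init description.
fork_sync() {
  cluster="$1"
  config="environments/$cluster/config.yaml"
  [ -f "$config" ] || { echo "missing $config" >&2; return 1; }
  make fork-init    CLUSTER="$cluster" || return 1   # rewrite the repo to match
  make config-check CLUSTER="$cluster"               # validate propagation and tfvars drift
}
```

Run it from the repo root with the environment name as the only argument. It refuses to run if the config file for that environment does not exist.
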