Guides - Kubernetes LLM Platform

Task-focused runbooks for operating the platform after it is installed. Start with the area that is failing or changing; each guide points back to the manifests and decisions that own the behavior.

GPU & scheduling

Debug GPU visibility, pending workloads, queue admission, and KEDA queue-depth autoscaling.

Serving

Run raw vLLM, compare it with KServe, package OCI modelcars, and change the served model.

Routing & gateway

Route requests through LiteLLM and the inference gateway, and enforce tenant budgets and guardrails.

Experience apps

Run n8n, the API key portal, Open WebUI, Tabby, and coding-assistant surfaces.

Platform & ops

Wire up secrets, SSO, and security controls; run staged bring-up, HA validation, and teardown.

Benchmarking

Measure latency, throughput, saturation, and serving regressions with GuideLLM.

