GPU & scheduling
Debug GPU visibility, pending workloads, queue admission, and KEDA queue-depth autoscaling.
Serving
Run raw vLLM, compare it with KServe, package OCI modelcars, and change the served model.
Routing & gateway
Route requests through LiteLLM and the inference gateway, and apply tenant budgets and guardrails.
Experience apps
Run n8n, the API key portal, Open WebUI, Tabby, and coding-assistant surfaces.
Platform & ops
Wire secrets, SSO, and security controls; run staged bring-up, HA validation, and teardown.
Benchmarking
Measure latency, throughput, saturation, and serving regressions with GuideLLM.