Why the platform is built the way it is. These records capture the public-facing tradeoffs behind the shape of the v1 platform: serving, routing, tenancy, security, delivery, and operations. The full ADR history stays in the repository; this page keeps only the decisions that help operators understand the platform they are installing.

Platform and scheduling

| Record | Decision |
| --- | --- |
| ADR-0001 | Use the provider-managed GPU stack, not a self-managed GPU Operator |
| ADR-0002 | Kueue for GPU quota and admission |
| ADR-0014 | Autoscale on vLLM queue depth, not GPU utilization |
| ADR-0025 | Cold start as four independent, cloud-agnostic levers |

Serving

| Record | Decision |
| --- | --- |
| ADR-0003 | Inference SLOs and the metrics that back them |
| ADR-0006 | Raw vLLM as default; KServe as the lifecycle alternative |
| ADR-0016 | Digest-pinned OCI modelcar as the model-delivery default |
| ADR-0032 | GuideLLM as the standard serving benchmark |

Routing and tenancy

| Record | Decision |
| --- | --- |
| ADR-0005 | Inference-aware routing with the Gateway API Inference Extension |
| ADR-0013 | LiteLLM layered above the inference gateway, not instead of it |
| ADR-0033 | Governed MCP gateway profile |

Security, secrets, and identity

| Record | Decision |
| --- | --- |
| ADR-0011 | Secrets and config strategy |
| ADR-0026 | Authentication and SSO with Dex and oauth2-proxy |
| ADR-0029 | Force the budget path, gate GPU admission, fail closed |
| ADR-0034 | Tenant-edge guardrails: PII masking and prompt-injection block |

Observability

| Record | Decision |
| --- | --- |
| ADR-0015 | Spend dashboard from Postgres now, deeper tracing deferred |

Delivery and configuration

| Record | Decision |
| --- | --- |
| ADR-0000 | Scope: what this is, what it is not, and the bar |
| ADR-0027 | Deployment profiles as additive layer roots |
| ADR-0028 | IaC owns the cloud substrate; Argo CD owns in-cluster lifecycle |
| ADR-0031 | Config-driven feature selection |