The ownership boundary
Two systems own two halves of the platform, and the split is hard:

- OpenTofu owns the cloud substrate. The cluster, node pools, Workload Identity, IAM, and Artifact Registry. Everything under infra/. Argo CD never touches it.
- Argo CD owns everything inside Kubernetes. Controllers, CRDs, workloads, and secret sync, reconciled from this repository by GitOps. OpenTofu never touches it.

Top-level directories
| Path | What it holds |
|---|---|
| infra/ | Cloud substrate per provider (GKE today, Hetzner as the portability target). OpenTofu roots plus thin cluster prerequisites. |
| bootstrap/ | The Argo CD install (pinned Helm chart) and repo credential. The GitOps entry point. |
| clusters/<cluster>/ | The deployment-profile catalog for a cluster: catalog/ (child Applications grouped by capability), appsets/ (one ApplicationSet per layer), projects/ (the platform AppProject guardrail). |
| platform/ | Cluster services: observability, external-secrets, Kueue, KEDA, cert-manager, identity, security policy. |
| serving/ | Model servers: raw vLLM, KServe, and llm-d. |
| routing/ | The inference gateway, edge gateway, and MCP gateway. |
| experience/ | End-user and automation surfaces: Open WebUI, Tabby, key-portal, and n8n. |
| workloads/ | GPU smoke job and Kueue demos. |
| benchmarks/ | The serving benchmark jobs and recorded runs. |
| dashboards/ | Grafana dashboards. |
| environments/<env>/config.yaml | The single source of fork configuration: repository URL, project ID, domain, and feature flags. |
| scripts/ | The resolver and operations scripts that the make targets call. |
| tests/ | Golden-file tests for the config resolvers. |
| docs/public/ | This documentation site. The only part of docs/ that publishes. |
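
As a rough illustration of what the fork configuration in environments/<env>/config.yaml might look like, here is a minimal sketch. The table only says the file carries a repository URL, project ID, domain, and feature flags; every key name, nesting choice, and value below is an assumption made for illustration, so check the real file in your fork for the actual schema.

```yaml
# Hypothetical shape for environments/<env>/config.yaml.
# All key names and values are illustrative assumptions, not the real schema.
repository:
  url: https://example.com/your-org/your-fork.git   # placeholder repository URL
project:
  id: example-project-id                            # placeholder cloud project ID
domain: example.com                                 # placeholder domain
features:                                           # hypothetical capability flags
  observability: true
  kserve: false
```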
How a component is wired
A running component appears in three places:

- The implementation lives under the matching top-level directory (platform/, serving/, routing/, workloads/) as Helm values or manifests.
- A child Argo Application points at it from clusters/<cluster>/catalog/<group>/. These directories hold only plain Application YAML, so a directory.recurse source renders them cleanly.
- A capability flag in config.yaml decides whether that group’s Applications exist at all.
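
The wiring above can be sketched with two Argo CD manifests: a parent whose source walks one catalog group with directory.recurse, and one of the plain child Applications it would pick up. This is a sketch under assumptions, not the repository's actual manifests. The Application names, the argocd namespace, the project name platform, the repository URL, the cluster name example, the group name observability, and the path platform/observability are all placeholders. The table above also notes that the repository uses one ApplicationSet per layer, so the real parent may be an ApplicationSet template rather than a standalone Application.

```yaml
# Hypothetical parent: renders every plain Application manifest in one catalog group.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: catalog-observability          # hypothetical name
  namespace: argocd                    # assumes Argo CD runs in "argocd"
spec:
  project: platform                    # assumes the AppProject is named "platform"
  source:
    repoURL: https://example.com/your-org/your-fork.git   # placeholder
    targetRevision: HEAD
    path: clusters/example/catalog/observability          # placeholder cluster and group
    directory:
      recurse: true                    # include manifests in subdirectories
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd                  # child Applications are created here
---
# Hypothetical child stored in the catalog group: points at the implementation.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: observability-stack            # hypothetical name
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://example.com/your-org/your-fork.git   # placeholder
    targetRevision: HEAD
    path: platform/observability       # hypothetical implementation path
  destination:
    server: https://kubernetes.default.svc
    namespace: observability           # hypothetical target namespace
```

Because the catalog directory contains nothing but Application manifests, the directory source needs no Helm or Kustomize rendering step.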
- Profile selects which layers you apply, with make root PROFILE=…. See make targets.
- Features select which capability groups exist, through config.yaml. Run make resolve-groups to regenerate clusters/<cluster>/groups.generated.yaml.
- Bring-up is each Application’s own sync policy: a manual-sync app is created but not deployed until you sync it.
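
To make the bring-up point concrete, the sketch below contrasts a manual-sync Application with an automated one. In Argo CD an Application syncs manually whenever spec.syncPolicy has no automated block, so the first manifest is created and then waits for an explicit sync. The names, repository URL, paths, and namespaces are placeholders, and whether this repository enables prune or selfHeal on its automated apps is an assumption.

```yaml
# Manual sync: no syncPolicy.automated, so nothing deploys until you sync it.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-manual                 # hypothetical name
  namespace: argocd                    # assumes Argo CD runs in "argocd"
spec:
  project: platform                    # assumes the AppProject is named "platform"
  source:
    repoURL: https://example.com/your-org/your-fork.git   # placeholder
    targetRevision: HEAD
    path: serving/example              # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: example                 # hypothetical namespace
  syncPolicy:
    syncOptions:
      - CreateNamespace=true           # options alone do not enable auto-sync
---
# Automated sync: Argo CD deploys and keeps the app reconciled on its own.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-automated              # hypothetical name
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://example.com/your-org/your-fork.git   # placeholder
    targetRevision: HEAD
    path: platform/example             # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: example
  syncPolicy:
    automated:
      prune: true                      # assumption: remove resources deleted from Git
      selfHeal: true                   # assumption: revert out-of-band changes
```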
Files ending in .generated.yaml are build artifacts. Regenerate them with the resolver scripts rather than editing them by hand.