The Infrastructure Hub -- Part 9
The Infrastructure Hub Reference Architecture
The Problem
You’ve built eight things. A Terraform catalog, golden path templates, multi-tenant client management, pipeline orchestration with CAB workflows, quantum-safe secrets, signed approvals, an infrastructure chat, and drift detection.
Each article showed one piece. Now you need the complete picture: how these pieces fit together, what the database looks like with everything running, how the AI service has grown, and how to deploy the full platform.
The IDP series reference architecture covered the application-focused platform: catalog enrichment, AI scaffolding, code review, TechDocs RAG, governance, and incident response. This article extends that architecture with everything from the Infrastructure Hub series.
The Solution
The Infrastructure Hub adds three layers to the existing IDP platform:
- Infrastructure catalog — Terraform modules as first-class entities, with golden path templates and multi-tenant scoping
- Infrastructure operations — Pipeline orchestration, CAB workflows with signed approvals, and drift detection
- Infrastructure intelligence — AI-powered secret scanning, plan summaries, failure diagnosis, and conversational access to all infrastructure data
All of this runs on the same Backstage instance, the same .NET AI service, and the same PostgreSQL database as the IDP. No new services to deploy — just new plugins, new endpoints, and new tables.
Execute
The Complete Architecture
```mermaid
graph TB
    subgraph "Developer / Platform Team"
        Browser[Browser]
    end
    subgraph "Backstage"
        FE[Frontend - React]
        BE[Backend - Node.js]
        subgraph "IDP Plugins (series 1)"
            P1[Catalog Enricher]
            P2[AI Scaffolder]
            P3[Code Review]
            P4[TechDocs RAG]
            P5[Governance]
            P6[Incident Response]
        end
        subgraph "Infra Hub Plugins (series 2)"
            I1[TF Module Templates]
            I2[Secret Scanner]
            I3[Pipeline Dashboard]
            I4[CAB Workflow]
            I5[Infra Chat]
            I6[Drift Dashboard]
        end
    end
    subgraph "AI Service (.NET)"
        API[".NET Minimal API :5100"]
        E_IDP["/api/enrich, /api/scaffold,\n/api/review, /api/ask,\n/api/incident/analyze"]
        E_INFRA["/api/scan-secrets,\n/api/scaffold-terraform,\n/api/pipeline/*,\n/api/cab/*,\n/api/infra/chat,\n/api/drift/*,\n/api/ssh/issue"]
    end
    subgraph "Data"
        PG[(PostgreSQL + pgvector)]
        QV[QuantumVault - Secrets]
        QS[QuantumAPI - Signing]
        QC[QuantumAPI - SSH Certs]
        AI[AI Provider]
    end
    subgraph "External"
        GH[GitHub API]
        ADO[Azure DevOps]
        GHA[GitHub Actions]
        GL[GitLab CI]
        AZ[Azure / Scaleway / AWS]
    end
    Browser --> FE --> BE
    BE --> P1 & P2 & P3 & P4 & P5 & P6
    BE --> I1 & I2 & I3 & I4 & I5 & I6
    P1 & P2 & P3 & P4 & P5 & P6 --> API
    I2 & I3 & I4 & I5 & I6 --> API
    API --> E_IDP & E_INFRA
    API --> PG
    API --> AI
    API --> QV & QS & QC
    I3 --> ADO & GHA & GL
    I6 --> AZ
```
New Endpoints in the AI Service
The Infrastructure Hub adds these endpoints to the existing AI service:
| Endpoint | Article | Purpose |
|---|---|---|
| `POST /api/scaffold-terraform` | 2 | Generate Terraform module from description |
| `POST /api/pipeline/summarize-plan` | 4 | Human-readable Terraform plan summary |
| `POST /api/pipeline/diagnose` | 4 | AI diagnosis of pipeline failures |
| `POST /api/pipeline/risk-assessment` | 4 | Risk assessment for change requests |
| `POST /api/pipeline/rollback-plan` | 4 | Generate rollback plan for a change |
| `POST /api/scan-secrets` | 5 | Scan Terraform files for secret issues |
| `POST /api/ssh/issue` | 5 | Issue ML-DSA SSH certificate |
| `POST /api/cab/approve` | 6 | Sign CAB approval with ML-DSA |
| `GET /api/cab/verify/{id}` | 6 | Verify approval signature |
| `POST /api/cab/seal-evidence` | 6 | Seal evidence package with signature |
| `GET /api/cab/report` | 6 | Generate compliance report |
| `POST /api/infra/chat` | 7 | Multi-turn infrastructure conversation |
| `POST /api/drift/analyze` | 8 | Analyze Terraform plan for drift |
| `GET /api/drift/results` | 8 | Fetch all drift scan results |
Combined with the IDP series endpoints, the AI service now has 22 endpoints in one Program.cs. The pattern is the same for every endpoint: read config, create client, build prompt with context, call model, return structured JSON.
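To make that request/response pattern concrete, here is a hedged sketch of calling one of the new endpoints from a shell. The field names in the body (`module`, `client`, `plan`) are illustrative assumptions, not the service's documented contract:

```shell
# Build a request body for POST /api/drift/analyze.
# NOTE: the field names here are illustrative assumptions.
BODY='{"module": "network-base", "client": "acme", "plan": "...terraform plan text..."}'
echo "$BODY"

# Post it to the AI service (commented out -- requires a running instance):
# curl -s -X POST http://localhost:5100/api/drift/analyze \
#   -H 'Content-Type: application/json' \
#   -d "$BODY"
```

Every other endpoint in the table follows the same shape: a JSON body in, structured JSON out.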
New Database Tables
Three tables added in this series:
```sql
-- Article 6: Signed CAB approvals
CREATE TABLE cab_approvals (
  id SERIAL PRIMARY KEY,
  change_request_id VARCHAR(100) NOT NULL UNIQUE,
  approved_by VARCHAR(255) NOT NULL,
  approved_at TIMESTAMPTZ NOT NULL,
  module VARCHAR(255) NOT NULL,
  client VARCHAR(100),
  risk_level VARCHAR(20) NOT NULL,
  plan_hash VARCHAR(64) NOT NULL,
  payload_json TEXT NOT NULL,
  signature TEXT NOT NULL,
  key_id VARCHAR(100) NOT NULL
);

CREATE INDEX idx_approvals_module ON cab_approvals(module);
CREATE INDEX idx_approvals_client ON cab_approvals(client);
CREATE INDEX idx_approvals_date ON cab_approvals(approved_at);
```
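A note on `plan_hash`: VARCHAR(64) matches the length of a SHA-256 hex digest. Assuming the hash is computed over the raw plan text (an assumption — substitute whatever canonical form your pipeline actually signs), it can be produced like this:

```shell
# Hash the Terraform plan text. A SHA-256 hex digest is exactly 64
# characters, which matches the plan_hash VARCHAR(64) column.
printf 'example terraform plan output' > /tmp/plan.txt
HASH=$(sha256sum /tmp/plan.txt | cut -d' ' -f1)
echo "$HASH"
echo "${#HASH}"   # -> 64
```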
```sql
-- Article 8: Drift detection results
CREATE TABLE drift_results (
  id SERIAL PRIMARY KEY,
  module VARCHAR(255) NOT NULL UNIQUE,
  client VARCHAR(100),
  drift_detected BOOLEAN NOT NULL,
  resource_count INTEGER NOT NULL DEFAULT 0,
  risk VARCHAR(20) NOT NULL DEFAULT 'none',
  summary TEXT,
  analysis_json TEXT,
  detected_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_drift_module ON drift_results(module);
CREATE INDEX idx_drift_client ON drift_results(client);
CREATE INDEX idx_drift_risk ON drift_results(risk);
```
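Because `module` is UNIQUE, `drift_results` holds only the latest scan per module, which implies an upsert on write. A sketch of that statement (an assumption — the series doesn't restate the exact SQL the service runs):

```shell
# Print the upsert that keeps one row per module; the UNIQUE constraint
# makes ON CONFLICT (module) the natural write path. Values are illustrative.
cat <<'SQL'
INSERT INTO drift_results (module, client, drift_detected, resource_count, risk, summary)
VALUES ('network-base', 'acme', true, 3, 'medium', 'Tags changed outside Terraform')
ON CONFLICT (module) DO UPDATE SET
  drift_detected = EXCLUDED.drift_detected,
  resource_count = EXCLUDED.resource_count,
  risk           = EXCLUDED.risk,
  summary        = EXCLUDED.summary,
  detected_at    = NOW();
SQL
```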
Combined with the IDP series tables:
| Table | Series | Purpose |
|---|---|---|
| `doc_chunks` | IDP art. 5 | Vector embeddings for RAG |
| `ai_usage_log` | IDP art. 6 | Governance — usage tracking |
| `ai_policies` | IDP art. 6 | Governance — per-team policies |
| `cab_approvals` | Infra art. 6 | Signed CAB approvals |
| `drift_results` | Infra art. 8 | Latest drift scan per module |
Five tables. One PostgreSQL instance (with pgvector extension). Backstage uses its own tables for the catalog, and the AI service uses these five for intelligence and operations.
New Backstage Plugin Registration
Add the infrastructure plugins to packages/backend/src/index.ts:
```typescript
// --- Infrastructure Hub plugins (series 2) ---

// Modules (extend existing plugins)
import { secretScannerModule } from '@internal/plugin-secret-scanner';
backend.add(secretScannerModule); // Article 5

// Standalone plugins (own routes)
import { aiIncidentPlugin } from '@internal/plugin-ai-incident';
backend.add(aiIncidentPlugin); // IDP Article 7

// Frontend-only plugins (registered in App.tsx, not here):
// - Infra Chat (/infra-chat)               // Article 7
// - Drift Dashboard (/drift)               // Article 8
// - CAB Review (/cab)                      // Article 4+6
// - Governance Dashboard (/ai-governance)  // IDP Article 6
```
The infrastructure plugins follow the same distinction as the IDP:
- Modules (`createBackendModule`): the secret scanner extends the catalog
- Standalone plugins (`createBackendPlugin`): the pipeline dashboard and CAB workflow have their own routes
- Frontend-only: infra chat, drift dashboard, and the governance dashboard read from the AI service through the proxy
QuantumAPI Integration Map
QuantumAPI appears in three roles across the Infrastructure Hub:
```
QuantumVault (Secrets)
├── Pipeline credentials (art. 5) — ARM_CLIENT_SECRET, DB_PASSWORD, etc.
├── Terraform state encryption keys (art. 5) — ML-KEM wrapped AES keys
├── Cosign signing keys (quantum-05) — for image signing
└── Bootstrap: only QUANTUMAPI_KEY in CI/CD platforms

QuantumAPI Signing (ML-DSA)
├── CAB approval signatures (art. 6) — every approval cryptographically signed
├── Evidence package sealing (art. 6) — tamper-proof audit trail
└── Verification endpoint — auditors can verify without internal access

QuantumAPI SSH (ML-DSA Certificates)
├── Short-lived certificates (art. 5) — 8h validity, auto-expire
├── CA trust model — hosts trust QuantumAPI CA, not individual keys
└── Backstage widget — engineers request access from the catalog
```
QuantumAPI Local Installation
For sovereign cloud, air-gapped environments, or organizations that can’t send data to external APIs, QuantumAPI offers a local installation option.
The local install runs the same services — Vault, Signing, SSH CA, Encryption — inside your own infrastructure. The API is identical. Your code doesn’t change. You point the endpoints at https://quantumapi.internal instead of https://api.quantumapi.eu.
Configuration change in the AI service:
```shell
# Cloud (default)
QUANTUMAPI__APIKEY=qid_your_key
# Endpoints default to api.quantumapi.eu

# Local installation
QUANTUMAPI__APIKEY=qid_your_local_key
QUANTUMAPI__ENDPOINT=https://quantumapi.internal
```
For the qapi CLI in pipelines:
```shell
# Cloud
export QAPI_API_KEY=qid_your_key

# Local
export QAPI_API_KEY=qid_your_local_key
export QAPI_ENDPOINT=https://quantumapi.internal
```
Everything in this series — state encryption, signed approvals, SSH certificates, secret scanning — works the same way with a local install. The cryptographic guarantees (ML-KEM, ML-DSA, QRNG) are the same because the algorithms run locally.
Use cases for local installation:
- Government / defense — data cannot leave the network
- Financial services — regulatory requirement to keep all key material on-premise
- EU sovereign cloud — data residency requirements beyond what cloud-hosted QuantumAPI offers
- Air-gapped environments — no internet access from the infrastructure network
The Complete Environment Variables
```shell
# === AI Provider ===
AI_PROVIDER=openai        # or "azure"
AI_ENDPOINT=https://api.scaleway.ai/v1
AI_KEY=your-key
AI_CHAT_MODEL=mistral-small-3.2-24b-instruct-2506
AI_EMBEDDING_MODEL=bge-multilingual-gemma2

# === PostgreSQL (shared) ===
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=forge
POSTGRES_PASSWORD=your-password

# === QuantumAPI ===
QUANTUMAPI_KEY=qid_your_key
# QUANTUMAPI_ENDPOINT=https://quantumapi.internal  # Only for local install

# === GitHub ===
GITHUB_TOKEN=ghp_your-token

# === OIDC (Backstage auth) ===
OIDC_METADATA_URL=https://auth.quantumapi.eu/.well-known/openid-configuration
OIDC_CLIENT_ID=your-client-id
OIDC_CLIENT_SECRET=your-client-secret
BACKEND_SECRET=change-this

# === Webhooks ===
AI_CODE_REVIEW_WEBHOOK_SECRET=your-webhook-secret
```
Same variables as the IDP series, plus QUANTUMAPI_KEY (and optionally QUANTUMAPI_ENDPOINT). The infrastructure plugins don’t need additional env vars — they read from the same AI service config.
The Two Series Together
The IDP series builds the platform for applications — services, APIs, code. The Infra Hub series extends it for infrastructure — Terraform modules, pipelines, cloud resources. Same Backstage. Same AI service. Same philosophy: AI as the engine, humans in control.
Cost with Infrastructure Plugins
Adding the infrastructure features to the cost estimate from IDP article 8:
| Feature | Frequency | Tokens/call | Monthly cost |
|---|---|---|---|
| IDP features (from series 1) | — | — | ~$9 |
| Terraform scaffolding | ~5 modules/month | ~2K input, ~3K output | ~$0.11 |
| Plan summaries | ~100 plans/month | ~4K input, ~1K output | ~$1.30 |
| Secret scanning | 12h cycle, ~30 modules | ~3K input, ~500 output | ~$0.95 |
| Drift analysis | Daily, ~30 modules | ~4K input, ~1K output | ~$4.50 |
| Infra chat | ~300 questions/month | ~5K input, ~1K output | ~$4.80 |
| CAB signing | ~50 approvals/month | N/A (QuantumAPI call) | ~$0 (included in tier) |
Total: ~$21/month for a 20-developer team managing ~30 Terraform modules across multiple clients. The QuantumAPI calls (signing, SSH certs, vault) are included in the business tier.
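The total is just the sum of the line items, which you can sanity-check in a shell:

```shell
# Sum of the monthly line items from the table above.
awk 'BEGIN { printf "%.2f\n", 9 + 0.11 + 1.30 + 0.95 + 4.50 + 4.80 }'
# -> 20.66, which rounds to the ~$21/month figure
```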
Security Reminder
The same security gaps from IDP article 8 apply here, plus:
- No auth on drift/chat/CAB endpoints — the AI service has no authentication. In production, add JWT validation or API key checks.
- Terraform plan output may contain secrets — the plan text sent to the AI model can include secret values (e.g., old vs new password). Consider scrubbing plan output before sending to the AI. The PII scrubber from the AI in Production series works here too.
- CAB signatures depend on QuantumAPI availability — if QuantumAPI is down, approvals can’t be signed. The signing endpoint returns 503 and the UI blocks the approval. This is intentional (unsigned = unapproved), but plan for QuantumAPI availability in your SLA calculations.
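A minimal sketch of the plan-scrubbing idea from the second point, assuming secrets show up on attribute lines with obvious keywords. This is a naive illustration — a real scrubber (like the PII scrubber mentioned above) should be far more thorough:

```shell
# Redact values on lines whose attribute name suggests a secret.
# The keyword list is a naive assumption; extend it for real use.
PLAN='password = "hunter2"
instance_type = "t3.micro"'
echo "$PLAN" | sed -E 's/(password|secret|token|credential)([[:space:]]*=[[:space:]]*).*/\1\2"[REDACTED]"/'
# The password line becomes: password = "[REDACTED]"
```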
The Series
| Article | What it does | New Plugin / Endpoint |
|---|---|---|
| 1. Your Infra Has No Catalog | Terraform modules as catalog entities | Catalog entities |
| 2. Golden Path Terraform Modules | AI-powered module scaffolding | /api/scaffold-terraform |
| 3. Multi-tenant Infrastructure | Per-client systems, teams, config | Catalog model |
| 4. Pipelines from Backstage | Unified pipeline UI + CAB workflow | /api/pipeline/* |
| 5. Secrets & PQ Identities | QuantumVault, SSH certs, secret scanner | /api/scan-secrets, /api/ssh/issue |
| 6. CAB Automation | Signed approvals, compliance reports | /api/cab/* |
| 7. Chat with Your Infra | Conversational infrastructure access | /api/infra/chat |
| 8. Drift Detection | Detect and explain infrastructure drift | /api/drift/* |
| 9. Reference Architecture | This article — everything connected | — |
Troubleshooting
In addition to the IDP troubleshooting section:
- Secret scanner finds nothing — check that modules have `spec.type: terraform-module` in the catalog. The scanner filters on this type.
- CAB signatures fail — verify `QuantumApi:ApiKey` is set in the AI service config. The signing endpoint returns 503 with a clear error message.
- Drift scan shows no results — the GitHub Actions workflow needs `terraform init` to succeed, which requires cloud credentials. Check the QuantumVault secret IDs in the workflow variables.
- Infra chat gives empty answers — the chat gathers context from the catalog database and the `cab_approvals` table. If these are empty, the AI has no data to work with. Run the catalog enricher and approve at least one change first.
- `PIPESTATUS` not working — if your CI runner uses `sh` instead of `bash`, `PIPESTATUS` doesn't exist. Use `bash` explicitly: `shell: bash` in GitHub Actions.
What’s Next
Two series complete. One Backstage instance. One AI service. 22 endpoints. 5 custom tables. The platform manages both applications and infrastructure, with AI assistance at every step and post-quantum security throughout.
What’s missing? The things we left out on purpose:
Kubernetes admission control. The Quantum-Safe Cloud series mentioned this gap: unsigned images can still be deployed manually. A Ratify admission webhook would reject pods with unsigned images at the cluster level.
Cost management dashboards. The governance dashboard tracks AI costs. But what about infrastructure costs? Cloud spend per client, per module, per environment. That’s a Backstage plugin that reads from Azure Cost Management, AWS Cost Explorer, or Scaleway billing APIs.
Policy as code. The CAB workflow is manual (with AI assistance). Open Policy Agent or Kyverno could automate policy enforcement: “no public storage accounts”, “all AKS clusters must have RBAC enabled”, “no modules without encryption blocks.”
Each of these could be a standalone article or a mini-series. If there’s interest, let me know.
The code is on GitHub: victorZKov/forge.
Victor
If this series helps you, consider buying me a coffee.
This is article 9 — the final article in the Infrastructure Hub series. Previous: Drift Detection.