The AI-Native IDP -- Part 8

The Reference Architecture

#platform-engineering #backstage #ai #architecture #kubernetes #reference

The Problem

You’ve read seven articles. You’ve seen the code for catalog enrichment, AI scaffolding, context-aware code review, documentation RAG, governance, and incident response. Each article showed one piece. But how do they fit together?

Where does the .NET AI service run? How does it connect to PostgreSQL? How does Backstage talk to it? What does the Kubernetes deployment look like? What environment variables do you need? How do you add a new AI feature without breaking the existing ones?

This article answers all of that. No new features — just the complete picture.

The Solution

The architecture has four layers:

  1. Backstage — The frontend and backend plugins. Developers interact with this.
  2. AI Service — The .NET Minimal API that handles all AI operations. Backstage plugins call this.
  3. PostgreSQL + pgvector — Stores catalog data, vector embeddings, usage logs, and policies.
  4. AI Provider — Any OpenAI-compatible API: Scaleway Generative APIs with Mistral, Azure AI Foundry, or Mistral AI directly.

graph TB
    subgraph "Developer"
        Browser[Browser]
    end

    subgraph "Backstage :7009"
        FE[Frontend - React :3456]
        BE[Backend - Node.js :7009]
        P1[Catalog Enricher Plugin]
        P2[AI Scaffolder Plugin]
        P3[AI Code Review Plugin]
        P4[TechDocs RAG Plugin]
        P5[AI Governance Plugin]
        P6[AI Incident Plugin]
    end

    subgraph "AI Service"
        API[".NET Minimal API :5100"]
        E1["/api/enrich"]
        E2["/api/scaffold"]
        E3["/api/review"]
        E4["/api/index-doc"]
        E5["/api/ask"]
        E6["/api/incident/analyze"]
        E7["/api/governance/*"]
        E8["/api/scaffold-terraform"]
    end

    subgraph "Data"
        PG[(PostgreSQL + pgvector)]
        AO[AI Provider - Scaleway/Mistral/Azure]
    end

    subgraph "External"
        GH[GitHub API]
        WH[Webhooks - GitHub, Alertmanager]
    end

    Browser --> FE
    FE --> BE
    BE --> P1 & P2 & P3 & P4 & P5 & P6
    P1 & P2 & P3 & P4 & P5 & P6 --> API
    API --> E1 & E2 & E3 & E4 & E5 & E6 & E7 & E8
    API --> PG
    API --> AO
    P1 & P3 & P6 --> GH
    WH --> P3 & P6

Execute

The Repository Structure

forge/
├── app-config.yaml                    # Backstage configuration
├── catalog/
│   └── all.yaml                       # Catalog entities
├── templates/
│   ├── dotnet-api/                    # Static template (article 1)
│   ├── ai-service/                    # AI-powered template (article 3)
│   └── terraform-module/              # Terraform module template (devops-02)
├── modules/                           # Real Terraform modules in the catalog
├── ai-service/
│   └── CatalogEnricher/
│       ├── Program.cs                 # All AI endpoints in one service
│       ├── CatalogEnricher.csproj
│       ├── AiUsageLogger.cs           # Governance middleware
│       ├── appsettings.json
│       └── Dockerfile
├── plugins/
│   ├── catalog-enricher-backend/      # Article 2
│   ├── ai-scaffolder/                 # Article 3
│   ├── ai-code-review/               # Article 4
│   ├── techdocs-rag/                  # Article 5
│   ├── techdocs-rag-widget/           # Article 5 (frontend widget)
│   ├── ai-governance/                 # Article 6
│   ├── ai-incident/                   # Article 7
│   ├── admin/                         # Admin plugin (frontend)
│   └── admin-backend/                 # Admin plugin (backend)
├── packages/
│   ├── app/                           # Backstage frontend
│   └── backend/
│       └── src/
│           └── index.ts               # All plugin registrations
├── k8s/
│   ├── backstage.yaml                 # Backstage deployment
│   ├── ai-service.yaml               # AI service deployment
│   └── postgresql.yaml                # PostgreSQL with pgvector
├── .env.example
└── README.md

The Complete .NET AI Service

All endpoints live in one Program.cs. Each endpoint follows the same pattern: read config, create a ChatClient, build a system prompt with context, call the model, return structured JSON.

The full implementation of each endpoint lives in its own article. Below are the real signatures and the shared infrastructure; the endpoint bodies are condensed into comments describing what each one does.

// ai-service/CatalogEnricher/Program.cs
using System.ClientModel;
using System.Text.Json;
using System.Text.Json.Serialization;
using Azure.AI.OpenAI;
using Npgsql;
using OpenAI;
using OpenAI.Chat;
using OpenAI.Embeddings;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddCors(options =>
{
    options.AddPolicy("DevCors", policy =>
        policy.WithOrigins("http://localhost:3456", "http://localhost:7009")
              .AllowAnyHeader()
              .AllowAnyMethod());
});

var app = builder.Build();
app.UseCors("DevCors");

app.MapGet("/healthz", () => Results.Ok(new { status = "healthy" }));

// --- Article 2: Catalog enrichment ---
app.MapPost("/api/enrich", async (EnrichRequest request, IConfiguration config) =>
{
    // Validates request.Files is not empty
    // Creates ChatClient from config (AI:Provider, AI:Endpoint, AI:Key, AI:ChatModel)
    // System prompt: "Analyze source files, return JSON with description, tags, dependencies, apiEndpoints"
    // Returns: CatalogMetadata { Description, Tags, Dependencies, ApiEndpoints }
    // Full implementation → article 2
});

// --- Article 3: Project scaffolding ---
app.MapPost("/api/scaffold", async (ScaffoldRequest request, IConfiguration config) =>
{
    // Validates request.Description is not empty
    // Creates ChatClient from config
    // System prompt: "Given a service description, produce name, type, dependencies, auth, kubernetes, nugetPackages, envVars, gotchaPrompt"
    // Returns: ScaffoldResult with normalized gotchaPrompt (handles string or object)
    // Full implementation → article 3
});

// --- Article 4: Code review ---
app.MapPost("/api/review", async (ReviewRequest request, IConfiguration config) =>
{
    // Validates request.Diff is not empty
    // Creates ChatClient from config
    // System prompt includes: ServiceDescription, Tags, Dependencies, GotchaHeuristics
    // Focus: architectural rule violations, security issues, service-specific patterns
    // Returns: { review: string }
    // Full implementation → article 4
});

// --- Article 5: RAG indexing ---
app.MapPost("/api/index-doc", async (IndexDocRequest request, IConfiguration config) =>
{
    // Validates request.Content is not empty
    // Creates EmbeddingClient from config (AI:EmbeddingModel, uses Rag:PostgresConnection)
    // Splits content into 2000-char chunks, embeds each, upserts to doc_chunks table
    // Returns: { chunksIndexed: int }
    // Full implementation → article 5
});

// --- Article 5: RAG search ---
app.MapPost("/api/ask", async (AskRequest request, IConfiguration config) =>
{
    // Validates request.Question is not empty
    // Creates EmbeddingClient + ChatClient from config
    // Embeds question → vector search in doc_chunks → top 5 results
    // System prompt: "Answer using ONLY retrieved documentation, cite sources"
    // Returns: AskResponse { Answer, Sources[] }
    // Full implementation → article 5
});

// --- Article 6: Governance ---
app.MapGet("/api/governance/usage", async (string? action, string? team, int? days, IConfiguration config) =>
{
    // Queries ai_usage_log grouped by action, team, status
    // Returns: list of { Action, Team, Status, CallCount, TotalInputTokens, TotalOutputTokens, AvgDurationMs }
});

app.MapGet("/api/governance/costs", async (int? days, IConfiguration config) =>
{
    // Queries ai_usage_log aggregated by day, calculates estimated cost
    // Cost formula: (inputTokens * 2.0 / 1M) + (outputTokens * 6.0 / 1M)
    // Returns: list of { Day, InputTokens, OutputTokens, EstimatedCostUsd }
});

app.MapGet("/api/governance/policies", async (IConfiguration config) =>
{
    // Reads all rows from ai_policies table
    // Returns: list of { Id, Team, Action, Enabled, MaxDailyCalls }
});

app.MapPut("/api/governance/policies", async (PolicyUpdate update, IConfiguration config) =>
{
    // Upserts policy: INSERT ... ON CONFLICT (team, action) DO UPDATE
    // Returns: Ok()
});

// --- Article 7: Incident response ---
app.MapPost("/api/incident/analyze", async (IncidentRequest request, IConfiguration config) =>
{
    // Creates ChatClient from config
    // System prompt includes: ServiceName, ServiceDescription, Dependencies, Tags,
    //   RecentDeployments, RecentErrors, GotchaHeuristics
    // Returns: { analysis: string } with probable cause, evidence, suggested actions, related services
    // Full implementation → article 7
});

// --- Infrastructure Hub: Terraform scaffolding (devops-02) ---
app.MapPost("/api/scaffold-terraform", async (ScaffoldTerraformRequest request, IConfiguration config) =>
{
    // Validates request.Cloud against supported providers (azure, scaleway, aws, gcp)
    // Creates ChatClient from config
    // System prompt: "Generate Terraform module with main.tf, variables.tf, outputs.tf"
    // Returns: ScaffoldTerraformResult { Main, Variables, Outputs }
    // Full implementation → devops-02
});

app.Run();

// --- Record types ---

// Article 2
record EnrichRequest(List<SourceFile> Files);
record SourceFile(string Path, string Content);
record CatalogMetadata(
    [property: JsonPropertyName("description")] string Description,
    [property: JsonPropertyName("tags")] List<string> Tags,
    [property: JsonPropertyName("dependencies")] List<string> Dependencies,
    [property: JsonPropertyName("apiEndpoints")] List<string> ApiEndpoints);

// Article 3
record ScaffoldRequest(string Description);
record ScaffoldResult(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("description")] string Description,
    [property: JsonPropertyName("type")] string Type,
    [property: JsonPropertyName("dependencies")] Dictionary<string, bool> Dependencies,
    [property: JsonPropertyName("auth")] string Auth,
    [property: JsonPropertyName("kubernetes")] bool Kubernetes,
    [property: JsonPropertyName("nugetPackages")] List<string> NugetPackages,
    [property: JsonPropertyName("envVars")] List<string> EnvVars,
    [property: JsonPropertyName("gotchaPrompt")] JsonElement GotchaPrompt);

// Article 4
record ReviewRequest(
    string ServiceName, string ServiceDescription,
    string[] Tags, string[] Dependencies,
    string GotchaHeuristics, string PrTitle, string Diff);

// Article 5
record IndexDocRequest(string EntityRef, string DocPath, string Content);
record AskRequest(string Question, string? EntityRef);
record AskResponse(string Answer, SourceReference[] Sources);
record SourceReference(string EntityRef, string DocPath, float Similarity);

// Article 6
record PolicyUpdate(string Team, string Action, bool Enabled, int? MaxDailyCalls);

// Article 7
record IncidentRequest(
    string ServiceName, string ServiceDescription,
    string[] Dependencies, string[] Tags,
    string RecentDeployments, string RecentErrors,
    string GotchaHeuristics, string AlertTitle,
    string Severity, string StartedAt);

// Infrastructure Hub (devops-02)
record ScaffoldTerraformRequest(string Cloud, string Name, string Description);
record ScaffoldTerraformResult(string Main, string Variables, string Outputs);

static class SerializerOptions
{
    public static readonly JsonSerializerOptions Default = new()
    {
        PropertyNameCaseInsensitive = true,
    };
}

Every endpoint that calls the AI model uses the same ChatClient creation logic. The governance middleware (AiUsageLogger.Track() from article 6) wraps each AI call to log tokens and enforce policies automatically.
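The real AiUsageLogger.Track() is C# middleware (article 6). As a language-agnostic sketch of the wrap-and-log pattern, here in TypeScript for illustration — every name below is hypothetical:

```typescript
// Hypothetical sketch of the wrap-and-log pattern behind AiUsageLogger.Track().
// The real implementation is C# middleware (article 6); names here are illustrative.
type UsageRecord = {
  action: string;
  team: string;
  durationMs: number;
  status: 'success' | 'error';
};

const usageLog: UsageRecord[] = [];

async function track<T>(action: string, team: string, aiCall: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    const result = await aiCall();
    usageLog.push({ action, team, durationMs: Date.now() - start, status: 'success' });
    return result;
  } catch (err) {
    // Failures are logged too, so the dashboard sees error rates per action.
    usageLog.push({ action, team, durationMs: Date.now() - start, status: 'error' });
    throw err;
  }
}

// Usage: wrap only the model call, not the whole request handler.
track('enrich', 'platform-team', async () => 'model response').then(() => {
  console.log(usageLog[0].status); // success
});
```

The point of wrapping at the call site is that every endpoint gets logging and policy checks without repeating any bookkeeping code.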

About security: These endpoints have no authentication, no rate limiting, and only trivial input validation (empty-payload checks). That’s on purpose — this series focuses on functionality, not hardening. If you plan to deploy this, read the Security section below.

The Complete Database Schema

-- pgvector (article 5)
CREATE EXTENSION IF NOT EXISTS vector;

-- Doc chunks for RAG (article 5)
CREATE TABLE doc_chunks (
    id SERIAL PRIMARY KEY,
    entity_ref VARCHAR(255) NOT NULL,
    doc_path VARCHAR(500) NOT NULL,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    embedding vector(3584) NOT NULL,  -- dimensions depend on the embedding model
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE (entity_ref, doc_path, chunk_index)
);

-- Note: pgvector indexes (HNSW, IVFFlat) support at most 2,000 dimensions on
-- the vector type. For 3584-dim embeddings, index a halfvec expression
-- (halfvec_cosine_ops) or choose a smaller embedding model.
CREATE INDEX ON doc_chunks
USING hnsw (embedding vector_cosine_ops);

-- AI usage log (article 6)
CREATE TABLE ai_usage_log (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMP DEFAULT NOW(),
    action VARCHAR(50) NOT NULL,
    entity_ref VARCHAR(255),
    team VARCHAR(100),
    user_ref VARCHAR(255),
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    model VARCHAR(100),
    duration_ms INTEGER DEFAULT 0,
    status VARCHAR(20) DEFAULT 'success',
    metadata JSONB DEFAULT '{}'
);

CREATE INDEX idx_usage_action ON ai_usage_log(action);
CREATE INDEX idx_usage_team ON ai_usage_log(team);
CREATE INDEX idx_usage_timestamp ON ai_usage_log(timestamp);

-- AI policies (article 6)
CREATE TABLE ai_policies (
    id SERIAL PRIMARY KEY,
    team VARCHAR(100),
    action VARCHAR(50) NOT NULL,
    enabled BOOLEAN DEFAULT true,
    max_daily_calls INTEGER,
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE (team, action)
);
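The /api/index-doc endpoint fills doc_chunks by splitting content into 2000-character chunks and embedding each one. The chunking step, sketched here in TypeScript for illustration (the service itself is C#):

```typescript
// Illustrative sketch of the chunking step behind /api/index-doc (the service
// itself is C#). The 2000-character chunk size matches article 5.
const CHUNK_SIZE = 2000;

function chunkContent(content: string): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < content.length; i += CHUNK_SIZE) {
    chunks.push(content.slice(i, i + CHUNK_SIZE));
  }
  return chunks;
}

// Each chunk is embedded and upserted keyed on (entity_ref, doc_path, chunk_index),
// which is why doc_chunks has that UNIQUE constraint: re-indexing a document
// replaces its rows instead of duplicating them.
console.log(chunkContent('x'.repeat(4500)).length); // 3 chunks: 2000, 2000, 500
```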

The Backstage Configuration

The app-config.yaml connects Backstage to the AI service and configures the plugins. Here are the key sections that make everything work:

# app-config.yaml (relevant sections)

app:
  baseUrl: http://localhost:3456

backend:
  baseUrl: http://localhost:7009
  listen:
    port: 7009

# Proxy — all Backstage plugins call the AI service through this
proxy:
  endpoints:
    /ai-service:
      target: http://localhost:5100

# Plugin-specific config
catalogEnricher:
  aiServiceUrl: http://localhost:5100

forge:
  aiServiceUrl: http://localhost:5100

techdocs:
  builder: 'local'
  generator:
    runIn: 'local'
  publisher:
    type: 'local'

The proxy config is important — Backstage plugins don’t call the AI service directly. They go through the backend’s /api/proxy/ai-service route, which forwards to http://localhost:5100. This keeps the AI service URL in one place.
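Concretely, Backstage mounts proxy endpoints under the backend’s /api/proxy path. A small sketch of the resulting URL shape (the helper name is made up for illustration; real plugins resolve the base URL via discoveryApi):

```typescript
// Backstage mounts proxy endpoints under <backend>/api/proxy, so the
// /ai-service entry in app-config.yaml is reachable at /api/proxy/ai-service.
// Helper name is hypothetical; real plugins resolve the base URL via discoveryApi.
function aiServiceUrl(backendBaseUrl: string, path: string): string {
  return `${backendBaseUrl}/api/proxy/ai-service${path}`;
}

// A RAG question from the TechDocs widget would POST here; the backend
// forwards it to http://localhost:5100/api/ask.
console.log(aiServiceUrl('http://localhost:7009', '/api/ask'));
// http://localhost:7009/api/proxy/ai-service/api/ask
```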

Kubernetes Deployment

The AI service deployment:

# k8s/ai-service.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: forge-ai-service
  labels:
    app: forge-ai-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: forge-ai-service
  template:
    metadata:
      labels:
        app: forge-ai-service
    spec:
      containers:
        - name: ai-service
          image: ghcr.io/victorzkov/forge-ai-service:latest  # registry image names must be lowercase
          ports:
            - containerPort: 5100
          env:
            - name: AI__Provider
              value: openai  # "openai" for Scaleway/Mistral/OpenAI, "azure" for Azure AI Foundry
            - name: AI__Endpoint
              valueFrom:
                secretKeyRef:
                  name: forge-secrets
                  key: AI_ENDPOINT
            - name: AI__Key
              valueFrom:
                secretKeyRef:
                  name: forge-secrets
                  key: AI_KEY
            - name: AI__ChatModel
              value: mistral-small-3.2-24b-instruct-2506
            - name: AI__EmbeddingModel
              value: bge-multilingual-gemma2
            - name: ConnectionStrings__Default
              valueFrom:
                secretKeyRef:
                  name: forge-secrets
                  key: POSTGRESQL_CONNECTION
          livenessProbe:
            httpGet:
              path: /healthz
              port: 5100
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /healthz
              port: 5100
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: forge-ai-service
spec:
  selector:
    app: forge-ai-service
  ports:
    - port: 5100
      targetPort: 5100

Backstage deployment:

# k8s/backstage.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: forge-backstage
  labels:
    app: forge-backstage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: forge-backstage
  template:
    metadata:
      labels:
        app: forge-backstage
    spec:
      containers:
        - name: backstage
          image: ghcr.io/victorzkov/forge-backstage:latest  # registry image names must be lowercase
          ports:
            - containerPort: 7009
          env:
            - name: POSTGRES_HOST
              value: forge-postgresql
            - name: POSTGRES_PORT
              value: "5432"
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: forge-secrets
                  key: POSTGRES_USER
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: forge-secrets
                  key: POSTGRES_PASSWORD
            - name: GITHUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: forge-secrets
                  key: GITHUB_TOKEN
          livenessProbe:
            httpGet:
              path: /healthcheck
              port: 7009
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /healthcheck
              port: 7009
            initialDelaySeconds: 15
            periodSeconds: 10
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: forge-backstage
spec:
  selector:
    app: forge-backstage
  ports:
    - port: 7009
      targetPort: 7009

Environment Variables

The complete .env.example:

# OIDC Authentication (QuantumID, Entra ID, Keycloak, Auth0)
OIDC_METADATA_URL=https://auth.quantumapi.eu/.well-known/openid-configuration
OIDC_CLIENT_ID=your-client-id
OIDC_CLIENT_SECRET=your-client-secret
BACKEND_SECRET=change-this-in-production

# AI Provider
# AI_PROVIDER: "openai" for Scaleway/Mistral/OpenAI (default), "azure" for Azure AI Foundry
AI_PROVIDER=openai
AI_ENDPOINT=https://api.scaleway.ai/v1
AI_KEY=your-key
AI_CHAT_MODEL=mistral-small-3.2-24b-instruct-2506
AI_EMBEDDING_MODEL=bge-multilingual-gemma2

# Examples for other providers:
# Mistral AI:  AI_ENDPOINT=https://api.mistral.ai/v1  AI_CHAT_MODEL=mistral-large-latest
# OpenAI:      AI_ENDPOINT=https://api.openai.com/v1  AI_CHAT_MODEL=gpt-5
# Azure:       AI_PROVIDER=azure  AI_ENDPOINT=https://your-instance.openai.azure.com  AI_CHAT_MODEL=gpt-5

# PostgreSQL (shared between Backstage and AI service)
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=forge
POSTGRES_PASSWORD=your-password

# GitHub
GITHUB_TOKEN=ghp_your-token

# AI Service
AI_SERVICE_URL=http://localhost:5100

# Webhook secrets
AI_CODE_REVIEW_WEBHOOK_SECRET=your-webhook-secret

Provider Flexibility

The AI service uses two NuGet packages: OpenAI and Azure.AI.OpenAI. The AzureOpenAIClient extends OpenAIClient, so all endpoint code works with both — the only difference is how you create the client. Set AI:Provider to "azure" for Azure AI Foundry, or leave it as "openai" (default) for everything else.
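In C# that difference is the choice between OpenAIClient and AzureOpenAIClient. A minimal sketch of the branching, in TypeScript for illustration (the returned option objects are invented):

```typescript
// Sketch of the provider switch. The service does this in C#: AzureOpenAIClient
// for AI:Provider=azure, plain OpenAIClient with a custom base URL otherwise.
// The returned option objects are invented for illustration.
type AiConfig = { provider: 'openai' | 'azure'; endpoint: string };

function clientOptions(cfg: AiConfig) {
  if (cfg.provider === 'azure') {
    // Azure AI Foundry: the endpoint is the resource URL.
    return { client: 'AzureOpenAIClient', endpoint: cfg.endpoint };
  }
  // Everything OpenAI-compatible (Scaleway, Mistral AI, OpenAI): same client,
  // different base URL.
  return { client: 'OpenAIClient', baseUrl: cfg.endpoint };
}

console.log(clientOptions({ provider: 'openai', endpoint: 'https://api.scaleway.ai/v1' }).client);
// OpenAIClient
```

Because AzureOpenAIClient extends OpenAIClient, every endpoint downstream of this switch is provider-agnostic.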

| Provider | AI:Provider | Endpoint | Chat model | Embedding model |
| --- | --- | --- | --- | --- |
| Scaleway | openai | https://api.scaleway.ai/v1 | mistral-small-3.2-24b-instruct-2506 | bge-multilingual-gemma2 |
| Mistral AI | openai | https://api.mistral.ai/v1 | mistral-large-latest | mistral-embed |
| Azure AI Foundry | azure | https://your-instance.openai.azure.com | gpt-5, claude-sonnet-4-6, mistral-large | text-embedding-3-small |
| OpenAI | openai | https://api.openai.com/v1 | gpt-5 | text-embedding-3-small |

Azure AI Foundry is the most flexible option — it hosts models from OpenAI, Anthropic, Mistral, and others behind a single endpoint. You can run Claude Sonnet or Mistral Large through Azure without managing separate API keys per provider. The Forge project runs on Scaleway with Mistral Small, but Azure AI Foundry is a solid choice if your organization already uses Azure.

The Backstage Backend Registration

All plugins registered in one place:

// packages/backend/src/index.ts
import { createBackend } from '@backstage/backend-defaults';

const backend = createBackend();

// Core plugins (standard Backstage)
backend.add(import('@backstage/plugin-app-backend'));
backend.add(import('@backstage/plugin-catalog-backend'));
backend.add(import('@backstage/plugin-catalog-backend-module-scaffolder-entity-model'));
backend.add(import('@backstage/plugin-catalog-backend-module-logs'));
backend.add(import('@backstage/plugin-scaffolder-backend'));
backend.add(import('@backstage/plugin-scaffolder-backend-module-github'));
backend.add(import('@backstage/plugin-techdocs-backend'));
backend.add(import('@backstage/plugin-auth-backend'));
backend.add(import('@backstage/plugin-auth-backend-module-guest-provider'));
backend.add(import('@backstage/plugin-proxy-backend'));

// Search (PostgreSQL-backed)
backend.add(import('@backstage/plugin-search-backend'));
backend.add(import('@backstage/plugin-search-backend-module-pg'));
backend.add(import('@backstage/plugin-search-backend-module-catalog'));
backend.add(import('@backstage/plugin-search-backend-module-techdocs'));

// Permissions
backend.add(import('@backstage/plugin-permission-backend'));
backend.add(import('@backstage/plugin-permission-backend-module-allow-all-policy'));

// Kubernetes
backend.add(import('@backstage/plugin-kubernetes-backend'));

// OIDC auth (QuantumID, Entra ID, Keycloak, Auth0)
backend.add(import('./modules/auth'));

// AI modules — extend existing plugins
import { catalogEnricherModule } from '@internal/plugin-catalog-enricher-backend';
backend.add(catalogEnricherModule);                    // Article 2 (catalog module)

import { aiScaffoldModule } from './modules/aiScaffoldModule';
backend.add(aiScaffoldModule);                         // Article 3 (scaffolder module)

// AI plugins — standalone with own routes
import { aiCodeReviewPlugin } from '@internal/plugin-ai-code-review';
backend.add(aiCodeReviewPlugin);                       // Article 4

import { techDocsRagPlugin } from '@internal/plugin-techdocs-rag';
backend.add(techDocsRagPlugin);                        // Article 5

import { aiIncidentPlugin } from '@internal/plugin-ai-incident';
backend.add(aiIncidentPlugin);                         // Article 7

// Admin plugin
import { adminBackendPlugin } from '@internal/plugin-admin-backend';
backend.add(adminBackendPlugin);

backend.start();

The distinction matters: the catalog enricher and AI scaffolder are modules (createBackendModule) because they extend existing plugins (catalog and scaffolder). The code review, RAG, and incident plugins are standalone plugins (createBackendPlugin) because they have their own HTTP routes.

The governance dashboard (article 6) is a frontend-only plugin — it reads data from the AI service through the proxy and doesn’t need a backend plugin. It’s registered in App.tsx, not here.

How to Add a New AI Feature

Every AI feature in Forge follows the same pattern:

  1. Add an endpoint to the AI service — A new app.MapPost in Program.cs. Takes context, calls the AI model, returns structured result.
  2. Create a Backstage backend plugin — Calls the AI service endpoint. Reads catalog data for context. Runs on a schedule or responds to events.
  3. Create a Backstage frontend component — A card, page, or widget that shows results to the developer.
  4. Wrap with governance — Use AiUsageLogger.Track() around the AI call. Usage is logged and policies are enforced automatically.

Example: adding a “dependency vulnerability scanner” feature:

// 1. AI service endpoint
app.MapPost("/api/scan-deps", async (ScanRequest request, IConfiguration config) =>
{
    // Same pattern: create ChatClient from IConfiguration, system prompt with context
});

// 2. Backstage backend plugin
export const depScanPlugin = createBackendPlugin({
  pluginId: 'dep-scanner',
  register(env) {
    // Same pattern: schedule, iterate catalog, call AI service
  },
});

// 3. Frontend component
export const DepScanCard = () => {
  // Same pattern: fetch from proxy, render results
};

The architecture is the same every time. The AI changes. The plumbing doesn’t.

Cost Reality Check

With the governance dashboard from article 6, here’s what real usage looks like for a team of 20 developers:

| Feature | Frequency | Tokens/call | Monthly cost |
| --- | --- | --- | --- |
| Catalog enrichment | 24h cycle, ~30 services | ~2K input, ~500 output | ~$0.21 |
| Scaffolding | ~10 new services/month | ~1K input, ~2K output | ~$0.14 |
| Code review | ~200 PRs/month | ~5K input, ~1K output | ~$3.20 |
| RAG queries | ~500 questions/month | ~3K input, ~800 output | ~$5.40 |
| Incident analysis | ~5 incidents/month | ~4K input, ~1K output | ~$0.07 |

Total: ~$9/month for a 20-developer team. Less than one developer’s lunch.

Estimated with Mistral Small pricing on Scaleway Generative APIs. Embedding calls (RAG indexing) use bge-multilingual-gemma2 — negligible cost. If you switch to Claude Sonnet or GPT-5 via Azure AI Foundry, costs are higher but the architecture is the same.
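The table follows the same formula /api/governance/costs uses (the $2/M input and $6/M output rates are the article’s flat estimate, not a provider rate card):

```typescript
// Cost formula from /api/governance/costs:
// (inputTokens * 2.0 / 1M) + (outputTokens * 6.0 / 1M)
function estimatedCostUsd(inputTokens: number, outputTokens: number): number {
  return (inputTokens * 2.0 + outputTokens * 6.0) / 1_000_000;
}

// Code review row: ~200 PRs/month at ~5K input and ~1K output tokens per call.
console.log(estimatedCostUsd(200 * 5_000, 200 * 1_000).toFixed(2)); // 3.20
```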

The governance dashboard tracks this automatically. If costs grow, you see it. If one team is using too many RAG queries, you can set a daily limit.

Security: What’s Missing

We left security out on purpose. This series is about functionality — catalog enrichment, scaffolding, code review, RAG, governance, incident response. Every article focuses on making the feature work, not on hardening it.

But don’t ship this to production without fixing these:

  • No authentication on AI endpoints — Anyone with network access can call /api/enrich, /api/scaffold, or any other endpoint. There’s no JWT validation, no API key check, nothing.
  • No real input validation — Beyond empty-payload checks, a user could send a crafted prompt in the Description field that makes the AI do something unexpected. Prompt injection is real.
  • No rate limiting — A simple loop calling /api/ask repeatedly could burn your AI provider budget in minutes. The governance policies only block after the daily limit is reached — they don’t throttle requests.
  • No PII scrubbing — Source code, error logs, and incident data go directly to the AI provider. If that data contains personal information, it leaves your infrastructure.

The AI in Production series covers all of this: same .NET service, same architecture, with the hardening layers you need before real users touch it.

The Series

| Article | What it does | Plugin |
| --- | --- | --- |
| 1. Why Your IDP Doesn’t Help | Backstage setup + catalog + static template | |
| 2. Teaching Your Catalog to Think | AI reads code, updates catalog metadata | catalog-enricher-backend |
| 3. AI-Powered Software Templates | Scaffolder that understands natural language + GOTCHA.md | ai-scaffolder |
| 4. The AI Code Review Plugin | PR review with catalog context + GOTCHA heuristics | ai-code-review |
| 5. TechDocs RAG | Vector search over platform documentation | techdocs-rag |
| 6. The AI Governance Dashboard | Usage tracking, cost estimation, policy control | ai-governance |
| 7. AI-Assisted Incident Response | Automatic incident diagnosis from catalog + logs | ai-incident |
| 8. The Reference Architecture | This article — how it all fits together | |

Troubleshooting

Common issues when setting up Forge:

  • AI service returns 502 — Endpoint or API key wrong. Check AI:Endpoint and AI:Key in appsettings.json. Try calling /healthz directly: curl http://localhost:5100/healthz.
  • Backstage can’t reach AI service — Check proxy config in app-config.yaml points to http://localhost:5100. The proxy endpoint should be /ai-service.
  • pgvector extension not found — Run CREATE EXTENSION IF NOT EXISTS vector; in your PostgreSQL database before creating the doc_chunks table.
  • CORS errors in browser — The frontend (3456) calls the AI service (5100) cross-origin. Make sure the browser origins (3456, and 7009 if the app is served from the backend) are in the CORS config in Program.cs, or route the calls through the Backstage proxy instead.
  • Catalog enricher not running — Check catalogEnricher.aiServiceUrl in app-config.yaml. The module runs on a schedule — check Backstage logs for errors.
  • TechDocs RAG returns empty — Index documents first with /api/index-doc before querying with /api/ask. No index = no results.
  • Search not working — You need @backstage/plugin-search-backend-module-pg for PostgreSQL-backed search. The default in-memory search doesn’t persist.

What’s Next

Forge is a starting point. The architecture supports any AI feature that follows the same pattern: context from the catalog, intelligence from the model, control from governance.

Your services are cataloged and AI-enhanced. Now do the same for your infrastructure. The next series — Infrastructure Hub — extends Forge to manage Terraform modules, cloud resources, and multi-tenant environments. Same Backstage, same AI service, same philosophy.

The idea: register Terraform modules as catalog entities, generate golden-path modules with AI (using the same /api/scaffold-terraform endpoint you already have), and manage infrastructure for internal teams and MSP clients from one Backstage instance. Stay tuned.

The code is on GitHub: victorZKov/forge. Each article has a corresponding tag (article-01 through article-08).

And if you want to structure your AI prompts for better results — not just in the IDP, but in any AI-assisted development — check out the ATLAS + GOTCHA series. That’s where the GOTCHA prompt format used throughout this series comes from.


If this series helps you, consider buying me a coffee.

This is article 8 — the final article in the AI-Native IDP series. Previous: AI-Assisted Incident Response.
