The AI-Native IDP -- Part 5
TechDocs RAG — AI That Reads Your Docs
The Problem
A developer asks: “What’s the retry policy for Azure Service Bus in our platform?”
The answer is in the TechDocs. Page 14 of the platform guidelines, section 3.2, written by the infrastructure team six months ago. But the developer doesn’t know it’s there. They don’t even know TechDocs has a search function. So they ask in Slack. Someone from the infra team answers three hours later. Or doesn’t answer at all.
This happens every day. The documentation exists. Nobody reads it. Not because developers are lazy — because finding the right paragraph in the right document takes longer than asking a colleague. And asking a colleague takes longer than guessing and moving on.
In article 4 we gave the code review plugin context from the catalog and GOTCHA.md. But that context is limited to what’s in the service metadata. The platform has more knowledge: architecture decisions, retry policies, naming conventions, security guidelines, deployment procedures. All of it sits in TechDocs, unread.
What if the AI could search those docs?
The Solution
Retrieval-Augmented Generation (RAG). Instead of putting all the documentation into the AI prompt (impossible — too large), we:
- Split all TechDocs pages into small chunks
- Generate vector embeddings for each chunk using the AI model
- Store the embeddings in PostgreSQL with pgvector
- When someone asks a question, embed the question, find the most similar chunks, and include them in the AI prompt
The AI answers with information from the actual documentation, not from its general training data. And it can cite which document the answer came from.
Developer asks question
↓
Embed the question (bge-multilingual-gemma2 or text-embedding-3-small)
↓
Search pgvector for similar chunks (cosine similarity)
↓
Top 5 chunks → added to the prompt as context
↓
AI model generates answer using those chunks
↓
Response with citations
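Under the hood, the search step is plain nearest-neighbor ranking. A minimal in-memory sketch of what pgvector's cosine operator does for us server-side (the `Chunk` shape and helper names here are illustrative, not a real API):

```typescript
// Minimal sketch of the retrieval step, done in memory instead of pgvector.
interface Chunk {
  docPath: string;
  content: string;
  embedding: number[];
}

// Cosine similarity: 1 means identical direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank chunks by similarity to the question embedding; keep the top k.
function topChunks(chunks: Chunk[], questionEmbedding: number[], k = 5): Chunk[] {
  return [...chunks]
    .sort((x, y) =>
      cosineSimilarity(y.embedding, questionEmbedding) -
      cosineSimilarity(x.embedding, questionEmbedding))
    .slice(0, k);
}
```

pgvector does exactly this ranking with `ORDER BY embedding <=> $1`, but with an index instead of a full scan.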
Execute
The Vector Store
We use the same PostgreSQL database that Backstage uses, with the pgvector extension:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create the docs chunks table
CREATE TABLE doc_chunks (
id SERIAL PRIMARY KEY,
entity_ref VARCHAR(255) NOT NULL,
doc_path VARCHAR(500) NOT NULL,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
embedding vector(3584) NOT NULL, -- dimensions depend on the embedding model
created_at TIMESTAMP DEFAULT NOW(),
UNIQUE (entity_ref, doc_path, chunk_index)
);
-- HNSW index for similarity search (works on empty tables, better recall).
-- Caveat: pgvector's HNSW index supports at most 2,000 dimensions for the
-- vector type, so this CREATE INDEX fails for 3584-dim embeddings. Either pick
-- a model with <= 2,000 dimensions, or (pgvector 0.7+) index a halfvec cast,
-- e.g. USING hnsw ((embedding::halfvec(3584)) halfvec_cosine_ops), and cast
-- the same way in queries so the index is used.
CREATE INDEX ON doc_chunks
USING hnsw (embedding vector_cosine_ops);
The entity_ref links each chunk to a Backstage entity (e.g., component:default/invoice-api). This lets us search docs for a specific service or across the entire platform.
The Indexing Endpoint
The AI service gets a new endpoint that takes a document, splits it into chunks, and stores the embeddings:
app.MapPost("/api/index-doc", async (IndexDocRequest request, IConfiguration config) =>
{
if (string.IsNullOrWhiteSpace(request.Content))
return Results.BadRequest(new { error = "Content is required." });
var connStr = config["Rag:PostgresConnection"];
if (string.IsNullOrEmpty(connStr))
return Results.Json(new { error = "RAG not configured (Rag:PostgresConnection missing)." }, statusCode: 503);
var endpoint = config["AI:Endpoint"];
var apiKey = config["AI:Key"];
var embeddingModel = config["AI:EmbeddingModel"] ?? "bge-multilingual-gemma2";
var openAiClient = new OpenAIClient(
new ApiKeyCredential(apiKey!),
new OpenAIClientOptions { Endpoint = new Uri(endpoint!) });
var embeddingClient = openAiClient.GetEmbeddingClient(embeddingModel);
var chunks = SplitIntoChunks(request.Content, maxChars: 2000);
await using var dataSource = NpgsqlDataSource.Create(connStr);
for (var i = 0; i < chunks.Count; i++)
{
var embedding = await embeddingClient.GenerateEmbeddingAsync(chunks[i]);
var vector = embedding.Value.ToFloats();
var vectorStr = "[" + string.Join(",",
vector.ToArray().Select(f => f.ToString("G", System.Globalization.CultureInfo.InvariantCulture))) + "]"; // invariant culture, so locales with decimal commas don't break the vector literal
await using var cmd = dataSource.CreateCommand();
cmd.CommandText = """
INSERT INTO doc_chunks (entity_ref, doc_path, chunk_index, content, embedding)
VALUES ($1, $2, $3, $4, $5::vector)
ON CONFLICT (entity_ref, doc_path, chunk_index)
DO UPDATE SET content = EXCLUDED.content, embedding = EXCLUDED.embedding
""";
cmd.Parameters.AddWithValue(request.EntityRef);
cmd.Parameters.AddWithValue(request.DocPath);
cmd.Parameters.AddWithValue(i);
cmd.Parameters.AddWithValue(chunks[i]);
cmd.Parameters.AddWithValue(vectorStr);
await cmd.ExecuteNonQueryAsync();
}
return Results.Ok(new { chunksIndexed = chunks.Count });
});
record IndexDocRequest(string EntityRef, string DocPath, string Content);
static List<string> SplitIntoChunks(string text, int maxChars)
{
var chunks = new List<string>();
var paragraphs = text.Split("\n\n", StringSplitOptions.RemoveEmptyEntries);
var current = new System.Text.StringBuilder();
foreach (var paragraph in paragraphs)
{
if (current.Length + paragraph.Length > maxChars && current.Length > 0)
{
chunks.Add(current.ToString().Trim());
current.Clear();
}
current.AppendLine(paragraph);
current.AppendLine();
}
if (current.Length > 0)
chunks.Add(current.ToString().Trim());
return chunks;
}
The EmbeddingModel defaults to bge-multilingual-gemma2 on Scaleway Generative APIs, which produces 3584-dimension vectors. If you use OpenAI, set it to text-embedding-3-small (1536 dimensions) and adjust the vector() column size in the SQL schema. The endpoint returns 503 when PostgreSQL is not configured — the RAG features are optional.
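The same paragraph-based chunking can be sketched in TypeScript, for example if you prefer to chunk inside the Backstage indexer instead of the AI service (a direct port of SplitIntoChunks above, with the same assumed 2000-character budget):

```typescript
// Split markdown into paragraph-aligned chunks of at most ~maxChars characters.
// A single paragraph longer than maxChars still becomes its own oversized chunk.
function splitIntoChunks(text: string, maxChars = 2000): string[] {
  const chunks: string[] = [];
  const paragraphs = text.split(/\n\n+/).filter(p => p.trim().length > 0);
  let current = '';
  for (const paragraph of paragraphs) {
    // Flush the current chunk before it would overflow the budget.
    if (current.length + paragraph.length > maxChars && current.length > 0) {
      chunks.push(current.trim());
      current = '';
    }
    current += paragraph + '\n\n';
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}
```

Splitting on paragraph boundaries keeps each chunk semantically coherent, which matters more for retrieval quality than hitting the size budget exactly.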
The Search Endpoint
When someone asks a question, we embed it and search for similar chunks:
app.MapPost("/api/ask", async (AskRequest request, IConfiguration config) =>
{
if (string.IsNullOrWhiteSpace(request.Question))
return Results.BadRequest(new { error = "Question is required." });
var connStr = config["Rag:PostgresConnection"];
if (string.IsNullOrEmpty(connStr))
return Results.Json(new { error = "RAG not configured (Rag:PostgresConnection missing)." }, statusCode: 503);
var endpoint = config["AI:Endpoint"];
var apiKey = config["AI:Key"];
var model = config["AI:ChatModel"] ?? "mistral-small-3.2-24b-instruct-2506";
var embeddingModel = config["AI:EmbeddingModel"] ?? "bge-multilingual-gemma2";
var provider = config["AI:Provider"] ?? "openai";
var openAiClient = new OpenAIClient(
new ApiKeyCredential(apiKey!),
new OpenAIClientOptions { Endpoint = new Uri(endpoint!) });
var embeddingClient = openAiClient.GetEmbeddingClient(embeddingModel);
ChatClient chatClient = provider.ToLowerInvariant() switch
{
"azure" => new AzureOpenAIClient(
new Uri(endpoint!), new ApiKeyCredential(apiKey!))
.GetChatClient(model),
_ => openAiClient.GetChatClient(model),
};
// 1. Embed the question
var questionEmbedding = await embeddingClient.GenerateEmbeddingAsync(request.Question);
var vector = questionEmbedding.Value.ToFloats();
var vectorStr = "[" + string.Join(",",
vector.ToArray().Select(f => f.ToString("G", System.Globalization.CultureInfo.InvariantCulture))) + "]"; // invariant culture, so locales with decimal commas don't break the vector literal
// 2. Search for similar chunks
await using var dataSource = NpgsqlDataSource.Create(connStr);
await using var cmd = dataSource.CreateCommand();
cmd.CommandText = """
SELECT entity_ref, doc_path, content,
1 - (embedding <=> $1::vector) AS similarity
FROM doc_chunks
WHERE ($2 = '' OR entity_ref = $2)
ORDER BY embedding <=> $1::vector
LIMIT 5
""";
cmd.Parameters.AddWithValue(vectorStr);
cmd.Parameters.AddWithValue(request.EntityRef ?? "");
var contexts = new List<DocContext>();
await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
contexts.Add(new DocContext(
reader.GetString(0),
reader.GetString(1),
reader.GetString(2),
(float)reader.GetDouble(3))); // 1 - (a <=> b) comes back as double precision, not real
}
if (contexts.Count == 0)
{
return Results.Ok(new AskResponse(
"I couldn't find any relevant documentation for your question.",
Array.Empty<SourceReference>()));
}
// 3. Build prompt with retrieved context
var contextBlock = string.Join("\n\n",
contexts.Select(c =>
$"[Source: {c.DocPath} ({c.EntityRef})]\n{c.Content}"));
try
{
var completion = await chatClient.CompleteChatAsync(
[
new SystemChatMessage($"""
You are a platform documentation assistant.
Answer the question using ONLY the documentation excerpts provided below.
If the answer is not in the documentation, say so — do not make things up.
Always cite the source document for each fact you reference.
DOCUMENTATION:
{contextBlock}
"""),
new UserChatMessage(request.Question),
]);
var answer = completion.Value.Content[0].Text.Trim();
var sources = contexts
.Select(c => new SourceReference(c.EntityRef, c.DocPath, c.Similarity))
.ToArray();
return Results.Ok(new AskResponse(answer, sources));
}
catch (ClientResultException ex) when (ex.Status == 401)
{
return Results.Json(new { error = "AI provider authentication failed." }, statusCode: 503);
}
catch (Exception ex)
{
return Results.Json(new { error = $"AI provider error: {ex.Message}" }, statusCode: 502);
}
});
record AskRequest(string Question, string? EntityRef);
record AskResponse(string Answer, SourceReference[] Sources);
record SourceReference(string EntityRef, string DocPath, float Similarity);
record DocContext(string EntityRef, string DocPath, string Content, float Similarity);
The EntityRef parameter is optional. If provided, the search is scoped to docs for that specific service. If empty, it searches across all platform documentation.
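For example, the same question can be asked scoped or platform-wide. The request bodies look like this (field names match the AskRequest record above; the entity ref is illustrative):

```typescript
// Request bodies for POST /api/ask, matching the AskRequest record.
interface AskPayload {
  question: string;
  entityRef?: string;
}

// Scoped: only chunks indexed for invoice-api are candidates.
const scoped: AskPayload = {
  question: "What's the retry policy for Service Bus?",
  entityRef: 'component:default/invoice-api',
};

// Platform-wide: entityRef omitted (the SQL treats '' as "no filter"),
// so every row in doc_chunks is a candidate.
const platformWide: AskPayload = {
  question: 'What are the platform naming conventions?',
};
```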
The TechDocs Indexer Plugin
A Backstage backend plugin that watches for TechDocs updates and indexes them:
// plugins/techdocs-rag/src/plugin.ts
import {
coreServices,
createBackendPlugin,
} from '@backstage/backend-plugin-api';
import { catalogServiceRef } from '@backstage/plugin-catalog-node';
import { indexEntityDocs } from './indexer';
export const techDocsRagPlugin = createBackendPlugin({
pluginId: 'techdocs-rag',
register(env) {
env.registerInit({
deps: {
logger: coreServices.logger,
scheduler: coreServices.scheduler,
config: coreServices.rootConfig,
catalog: catalogServiceRef,
auth: coreServices.auth,
},
async init({ logger, scheduler, config, catalog, auth }) {
const aiServiceUrl = config.getString('forge.aiServiceUrl');
await scheduler.scheduleTask({
id: 'techdocs-rag-indexer',
frequency: { hours: 6 },
timeout: { minutes: 30 },
initialDelay: { seconds: 60 },
fn: async () => {
logger.info('Starting TechDocs indexing');
const credentials = await auth.getOwnServiceCredentials();
const { items: entities } = await catalog.getEntities(
{ filter: { kind: 'Component' } },
{ credentials },
);
let indexed = 0, skipped = 0, failed = 0;
for (const entity of entities) {
try {
const count = await indexEntityDocs({
entity, aiServiceUrl, logger,
});
if (count > 0) indexed++; else skipped++;
} catch (err) {
failed++;
logger.error(
`Failed to index docs for ${entity.metadata.name}: ${err}`,
);
}
}
logger.info(
`TechDocs indexing complete: ${indexed} indexed, ${skipped} skipped, ${failed} failed`,
);
},
});
},
});
},
});
The indexer reads TechDocs from each entity’s repo:
// plugins/techdocs-rag/src/indexer.ts
import type { Entity } from '@backstage/catalog-model';
import type { LoggerService } from '@backstage/backend-plugin-api';
import { Octokit } from '@octokit/rest';
interface IndexOptions {
entity: Entity;
aiServiceUrl: string;
logger: LoggerService;
}
export async function indexEntityDocs({
entity, aiServiceUrl, logger,
}: IndexOptions): Promise<number> {
const slug =
entity.metadata.annotations?.['github.com/project-slug'];
if (!slug) return 0;
const [owner, repo] = slug.split('/');
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
const entityRef = `component:default/${entity.metadata.name}`;
let tree;
try {
const { data } = await octokit.git.getTree({
owner, repo, tree_sha: 'main', recursive: 'true',
});
tree = data.tree;
} catch {
return 0;
}
const docFiles = tree.filter(
f =>
f.type === 'blob' &&
f.path &&
(f.path.startsWith('docs/') || f.path === 'GOTCHA.md') &&
f.path.endsWith('.md'),
);
if (docFiles.length === 0) return 0;
logger.info(
`Indexing ${docFiles.length} docs for ${entity.metadata.name}`,
);
let indexed = 0;
for (const file of docFiles) {
if (!file.path) continue;
try {
const { data: content } = await octokit.repos.getContent({
owner, repo, path: file.path,
mediaType: { format: 'raw' },
});
const res = await fetch(`${aiServiceUrl}/api/index-doc`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
entityRef,
docPath: file.path,
content: content as unknown as string,
}),
});
if (res.ok) indexed++;
} catch (err) {
logger.info(`Could not index ${file.path}: ${err}`);
}
}
return indexed;
}
Note that the indexer also indexes GOTCHA.md. This means when a developer asks a question about a service, the AI can draw from both the TechDocs and the GOTCHA prompt.
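The file filter is worth isolating, since it defines exactly what ends up in the vector store. A sketch of the same predicate used in the indexer above (hypothetical helper name):

```typescript
// True for files the indexer embeds: markdown under docs/, plus GOTCHA.md.
function isIndexableDoc(path: string): boolean {
  return (path.startsWith('docs/') || path === 'GOTCHA.md') && path.endsWith('.md');
}
```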
The Backstage Frontend — Ask Widget
A simple component that lets developers ask questions from the Backstage UI:
// plugins/techdocs-rag/src/components/AskWidget.tsx
import React, { useState } from 'react';
import {
Card,
CardContent,
CardHeader,
TextField,
Button,
Typography,
Chip,
Box,
CircularProgress,
} from '@material-ui/core';
import { useEntity } from '@backstage/plugin-catalog-react';
import { useApi, fetchApiRef, discoveryApiRef } from '@backstage/core-plugin-api';
export const AskWidget = () => {
const { entity } = useEntity();
const fetchApi = useApi(fetchApiRef);
const discoveryApi = useApi(discoveryApiRef);
const [question, setQuestion] = useState('');
const [answer, setAnswer] = useState<string | null>(null);
const [sources, setSources] = useState<
{ entityRef: string; docPath: string; similarity: number }[]
>([]);
const [loading, setLoading] = useState(false);
const entityRef = `component:default/${entity.metadata.name}`;
const handleAsk = async () => {
if (!question.trim()) return;
setLoading(true);
setAnswer(null);
try {
const proxyUrl = await discoveryApi.getBaseUrl('proxy');
const res = await fetchApi.fetch(`${proxyUrl}/ai-service/api/ask`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
question,
entityRef,
}),
});
const data = await res.json();
setAnswer(data.answer ?? data.error ?? 'No answer returned.');
setSources(data.sources ?? []); // error responses (502/503) carry no sources
} catch (err) {
setAnswer('Failed to get an answer. Please try again.');
} finally {
setLoading(false);
}
};
return (
<Card>
<CardHeader title="Ask about this service" />
<CardContent>
<Box display="flex" gap={1} mb={2}>
<TextField
fullWidth
variant="outlined"
placeholder="What's the retry policy for Service Bus?"
value={question}
onChange={e => setQuestion(e.target.value)}
onKeyDown={e => e.key === 'Enter' && handleAsk()}
/>
<Button
variant="contained"
color="primary"
onClick={handleAsk}
disabled={loading}
>
{loading ? <CircularProgress size={24} /> : 'Ask'}
</Button>
</Box>
{answer && (
<Box mt={2}>
<Typography variant="body1" style={{ whiteSpace: 'pre-wrap' }}>
{answer}
</Typography>
{sources.length > 0 && (
<Box mt={1}>
<Typography variant="caption" color="textSecondary">
Sources:
</Typography>
<Box display="flex" gap={0.5} flexWrap="wrap" mt={0.5}>
{sources.map((s, i) => (
<Chip
key={i}
label={s.docPath}
size="small"
variant="outlined"
/>
))}
</Box>
</Box>
)}
</Box>
)}
</CardContent>
</Card>
);
};
Wiring the Widget into the Entity Page
In the Backstage app, add the widget to the entity page:
// packages/app/src/components/catalog/EntityPage.tsx
import { AskWidget } from '@internal/plugin-techdocs-rag';
// Inside the service entity page layout:
<Grid item md={6}>
<AskWidget />
</Grid>
Now when a developer opens any service in the catalog, they see an “Ask about this service” card. They type a question, and the AI answers using the service’s documentation.
Registering the Plugin
In packages/backend/src/index.ts:
import { techDocsRagPlugin } from '@internal/plugin-techdocs-rag';
backend.add(techDocsRagPlugin);
The plugin reads forge.aiServiceUrl from app-config.yaml (same config as the other plugins). The RAG endpoints (/api/index-doc, /api/ask) return 503 when PostgreSQL is not configured, so the indexer runs safely in development without pgvector — it just logs that indexing was skipped.
Connecting RAG to Code Review
The best part: the RAG system can feed into the code review from article 4. Before reviewing a PR, the plugin can search for relevant documentation and include it in the review context:
// In review.ts, before calling the AI service:
// Search for docs related to the changed files
const changedAreas = codeFiles
.map(f => f.filename.split('/')[0])
.filter((v, i, a) => a.indexOf(v) === i)
.join(' ');
const docsRes = await fetch(`${aiServiceUrl}/api/ask`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
question: `Guidelines and rules for ${changedAreas} in ${serviceName}`,
entityRef: `component:default/${serviceName}`,
}),
});
const docsContext = await docsRes.json();
// Add docsContext.answer to the review prompt
Now the code review knows the service’s retry policy, naming conventions, and deployment rules — not because someone wrote them in GOTCHA.md, but because they’re in the TechDocs and the RAG system found them.
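The changed-area extraction above reduces file paths to a de-duplicated set of top-level folders. Isolated as a helper (hypothetical name), the behavior is easy to pin down:

```typescript
// Reduce changed file paths to a de-duplicated, space-joined list of
// top-level folders, used to phrase the docs query for the review context.
function changedAreas(filenames: string[]): string {
  return [...new Set(filenames.map(f => f.split('/')[0]))].join(' ');
}
```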
Proxy Configuration
To let the Backstage frontend talk to the AI service, add a proxy in app-config.yaml:
proxy:
endpoints:
/ai-service:
target: http://localhost:5100
allowedHeaders: ['Content-Type']
Checklist
- pgvector extension enabled in PostgreSQL
- doc_chunks table created with vector index
- /api/index-doc endpoint splits and embeds documents
- /api/ask endpoint performs vector search and generates answers
- TechDocs indexer plugin runs on schedule (every 6 hours)
- GOTCHA.md included in the indexed documents
- AskWidget component renders on the entity page
- Proxy configured for frontend-to-AI-service communication
- AI answers include source citations
Before the Next Article
Developers can now ask questions and get answers from the platform documentation. The code review plugin can search the docs before reviewing a PR. The AI has moved from “knows what the catalog says” to “knows what the documentation says.”
But who controls what the AI can and can’t do? Who decides which services can use the scaffolder? Who tracks whether the AI review findings are useful or noisy? Who monitors costs?
The platform needs governance. Not bureaucratic governance — visible, automated governance. A dashboard where the platform team sees what AI features are being used, by whom, how much they cost, and whether they’re helping.
That’s article 6: The AI Governance Dashboard.
Troubleshooting
TechDocs fails with “Docker does not appear to be available”
TechDocs uses Docker to build documentation with mkdocs. If you use OrbStack instead of Docker Desktop, Backstage may not find the socket. Fix it by setting DOCKER_HOST:
export DOCKER_HOST=unix:///var/run/docker.sock
Or configure TechDocs to use a local mkdocs installation instead of Docker. In app-config.yaml:
techdocs:
generator:
runIn: local
Then install mkdocs locally: pip install mkdocs mkdocs-techdocs-core
Vector dimension mismatch
If you get expected N dimensions, not M, the embedding model returns a different vector size than your table schema. Check your model’s output dimensions and update the vector(N) column accordingly. For example, bge-multilingual-gemma2 on Scaleway returns 3584 dimensions, not 768.
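Assuming you switch to a 1536-dimension model such as text-embedding-3-small, the column has to be recreated and the docs re-indexed — embeddings from different models live in different vector spaces and are never comparable:

```sql
-- Embeddings from different models are not comparable: drop and re-embed.
TRUNCATE doc_chunks;
ALTER TABLE doc_chunks ALTER COLUMN embedding TYPE vector(1536);
-- Then wait for (or manually trigger) the scheduled indexer to repopulate.
```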
better-sqlite3 fails to build on Node 24+
Backstage’s default database is better-sqlite3, which requires native compilation. If it fails, switch to PostgreSQL in app-config.yaml:
backend:
database:
client: pg
connection:
host: localhost
port: 5432
user: postgres
password: your-password
If this series helps you, consider buying me a coffee.
This is article 5 of the AI-Native IDP series. Next: The AI Governance Dashboard — visibility and control for AI features in your platform.