TechDocs RAG: AI That Reads Your Documentation

The Problem

A developer asks: “What's the retry policy for Azure Service Bus on our platform?”

The answer is in TechDocs: page 14 of the platform guides, section 3.2, written by the infrastructure team six months ago. But the developer doesn't know it's there. They don't even know TechDocs has a search function. So they ask in Slack. Someone from the infra team answers three hours later. Or doesn't answer at all.

This happens every day. The documentation exists. Nobody reads it. Not because developers are lazy, but because finding the right paragraph in the right document takes longer than asking a colleague. And asking a colleague takes longer than guessing and moving on.

In article 4 we gave the code review plugin context from the catalog and from GOTCHA.md. But that context is limited to what lives in the service metadata. The platform holds more knowledge than that: architecture decisions, retry policies, naming conventions, security guidelines, deployment procedures. All of it sits in TechDocs, unread.

What if the AI could search those documents?

The Solution

Retrieval-Augmented Generation (RAG). Instead of stuffing all the documentation into the AI prompt (impossible, it is far too large), we do this:

Split every TechDocs page into small pieces (chunks)
Generate a vector embedding for each chunk with the embedding model
Store the embeddings in PostgreSQL with pgvector
When someone asks a question, embed the question, find the most similar chunks, and include them in the AI prompt

The AI answers with information from the actual documentation, not from its general training data. And it can cite which document the answer comes from.
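To make the retrieval step concrete, here is a minimal in-memory sketch in TypeScript of what the similarity search does conceptually. The Chunk shape and the cosineSimilarity and topK helpers are illustrative names, not part of the series' code; in the real setup pgvector does this work inside PostgreSQL.

```typescript
// Hypothetical in-memory stand-in for the pgvector similarity search.
interface Chunk {
  docPath: string;
  content: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the question embedding and keep the k best.
function topK(
  question: number[],
  chunks: Chunk[],
  k: number,
): { chunk: Chunk; similarity: number }[] {
  return chunks
    .map(chunk => ({
      chunk,
      similarity: cosineSimilarity(question, chunk.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```

A brute-force scan like this is only reasonable for a handful of chunks; the point of the vector index is to avoid scoring every row.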

Developer asks question
       ↓
Embed the question (bge-multilingual-gemma2 or text-embedding-3-small)
       ↓
Search pgvector for similar chunks (cosine similarity)
       ↓
Top 5 chunks → added to the prompt as context
       ↓
AI model generates answer using those chunks
       ↓
Response with citations

Execution

The Vector Store

We use the same PostgreSQL database that Backstage uses, with the pgvector extension:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create the docs chunks table
CREATE TABLE doc_chunks (
    id SERIAL PRIMARY KEY,
    entity_ref VARCHAR(255) NOT NULL,
    doc_path VARCHAR(500) NOT NULL,
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    embedding vector(3584) NOT NULL,  -- dimensions depend on the embedding model
    created_at TIMESTAMP DEFAULT NOW(),

    UNIQUE (entity_ref, doc_path, chunk_index)
);

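-- HNSW index for similarity search (works on empty tables, better recall).
-- pgvector caps HNSW indexes on the vector type at 2,000 dimensions, so a
-- 3584-dimension column is indexed through a halfvec cast (pgvector 0.7.0+).
-- Queries must order by the same cast expression to use this index.
CREATE INDEX ON doc_chunks
USING hnsw ((embedding::halfvec(3584)) halfvec_cosine_ops);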

The entity_ref links each chunk to a Backstage entity (for example, component:default/invoice-api). This lets us search the docs of one specific service or of the whole platform.
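As a small illustration of the kind:namespace/name shape of these references, the sketch below splits and rebuilds one. parseRef and formatRef are hypothetical helpers written for this example (Backstage ships its own utilities in @backstage/catalog-model); they assume a full three-part, lowercased reference like the one above.

```typescript
// Hypothetical helpers illustrating the kind:namespace/name reference shape.
interface EntityRefParts {
  kind: string;
  namespace: string;
  name: string;
}

// Splits a full reference such as "component:default/invoice-api".
function parseRef(ref: string): EntityRefParts {
  const match = /^([^:/]+):([^:/]+)\/([^:/]+)$/.exec(ref.trim().toLowerCase());
  if (!match) {
    throw new Error(`not a full entity ref: ${ref}`);
  }
  return { kind: match[1], namespace: match[2], name: match[3] };
}

// Rebuilds the lowercased string form stored in the entity_ref column.
function formatRef(parts: EntityRefParts): string {
  return `${parts.kind}:${parts.namespace}/${parts.name}`.toLowerCase();
}
```

Storing one normalized string form matters because the search filter compares entity_ref with plain equality.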

The Indexing Endpoint

The AI service gets a new endpoint that takes a document, splits it into chunks, and stores the embeddings:

app.MapPost("/api/index-doc", async (IndexDocRequest request, IConfiguration config) =>
{
    if (string.IsNullOrWhiteSpace(request.Content))
        return Results.BadRequest(new { error = "Content is required." });

    var connStr = config["Rag:PostgresConnection"];
    if (string.IsNullOrEmpty(connStr))
        return Results.Json(new { error = "RAG not configured (Rag:PostgresConnection missing)." }, statusCode: 503);

    var endpoint = config["AI:Endpoint"];
    var apiKey = config["AI:Key"];
    var embeddingModel = config["AI:EmbeddingModel"] ?? "bge-multilingual-gemma2";

    var openAiClient = new OpenAIClient(
        new ApiKeyCredential(apiKey!),
        new OpenAIClientOptions { Endpoint = new Uri(endpoint!) });
    var embeddingClient = openAiClient.GetEmbeddingClient(embeddingModel);

    var chunks = SplitIntoChunks(request.Content, maxChars: 2000);

    await using var dataSource = NpgsqlDataSource.Create(connStr);

    for (var i = 0; i < chunks.Count; i++)
    {
        var embedding = await embeddingClient.GenerateEmbeddingAsync(chunks[i]);
        var vector = embedding.Value.ToFloats();
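        // Invariant culture keeps "." as the decimal separator on every host locale;
        // a comma-decimal culture would otherwise corrupt the vector literal.
        var vectorStr = "[" + string.Join(",",
            vector.ToArray().Select(f => f.ToString("G",
                System.Globalization.CultureInfo.InvariantCulture))) + "]";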

        await using var cmd = dataSource.CreateCommand();
        cmd.CommandText = """
            INSERT INTO doc_chunks (entity_ref, doc_path, chunk_index, content, embedding)
            VALUES ($1, $2, $3, $4, $5::vector)
            ON CONFLICT (entity_ref, doc_path, chunk_index)
            DO UPDATE SET content = EXCLUDED.content, embedding = EXCLUDED.embedding
            """;

        cmd.Parameters.AddWithValue(request.EntityRef);
        cmd.Parameters.AddWithValue(request.DocPath);
        cmd.Parameters.AddWithValue(i);
        cmd.Parameters.AddWithValue(chunks[i]);
        cmd.Parameters.AddWithValue(vectorStr);

        await cmd.ExecuteNonQueryAsync();
    }

    return Results.Ok(new { chunksIndexed = chunks.Count });
});

record IndexDocRequest(string EntityRef, string DocPath, string Content);

static List<string> SplitIntoChunks(string text, int maxChars)
{
    var chunks = new List<string>();
    var paragraphs = text.Split("\n\n", StringSplitOptions.RemoveEmptyEntries);
    var current = new System.Text.StringBuilder();

    foreach (var paragraph in paragraphs)
    {
        if (current.Length + paragraph.Length > maxChars && current.Length > 0)
        {
            chunks.Add(current.ToString().Trim());
            current.Clear();
        }
        current.AppendLine(paragraph);
        current.AppendLine();
    }

    if (current.Length > 0)
        chunks.Add(current.ToString().Trim());

    return chunks;
}

The default EmbeddingModel is bge-multilingual-gemma2 on Scaleway Generative APIs, which produces 3584-dimension vectors. If you use OpenAI, change it to text-embedding-3-small (1536 dimensions) and adjust the size of the vector() column in the SQL schema. The endpoint returns 503 when PostgreSQL isn't configured, because the RAG features are optional.
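The paragraph-based chunking used by the indexing endpoint can be sketched in TypeScript as follows. This is an illustrative port, not the service's code, and it adds one behavior as an assumption: a single paragraph longer than maxChars is hard-split, whereas the C# SplitIntoChunks above keeps it as one oversized chunk.

```typescript
// Illustrative port of the chunker; hard-splitting oversized paragraphs is an added assumption.
function splitIntoChunks(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error('maxChars must be positive');
  const chunks: string[] = [];
  let current = '';

  // Emit the accumulated paragraphs as one chunk and start over.
  const flush = () => {
    const trimmed = current.trim();
    if (trimmed.length > 0) chunks.push(trimmed);
    current = '';
  };

  for (const raw of text.split(/\n{2,}/)) {
    const paragraph = raw.trim();
    if (paragraph.length === 0) continue;

    if (paragraph.length > maxChars) {
      // Oversized paragraph: emit what we have, then cut it into fixed-size pieces.
      flush();
      for (let i = 0; i < paragraph.length; i += maxChars) {
        const piece = paragraph.slice(i, i + maxChars).trim();
        if (piece.length > 0) chunks.push(piece);
      }
      continue;
    }

    // The +2 accounts for the blank line that joins two paragraphs.
    if (current.length > 0 && current.length + 2 + paragraph.length > maxChars) {
      flush();
    }
    current = current.length > 0 ? `${current}\n\n${paragraph}` : paragraph;
  }

  flush();
  return chunks;
}
```

Keeping whole paragraphs together where possible preserves local context, which tends to matter more for retrieval quality than hitting an exact chunk size.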

The Search Endpoint

When someone asks a question, we embed it and search for similar chunks:

app.MapPost("/api/ask", async (AskRequest request, IConfiguration config) =>
{
    if (string.IsNullOrWhiteSpace(request.Question))
        return Results.BadRequest(new { error = "Question is required." });

    var connStr = config["Rag:PostgresConnection"];
    if (string.IsNullOrEmpty(connStr))
        return Results.Json(new { error = "RAG not configured (Rag:PostgresConnection missing)." }, statusCode: 503);

    var endpoint = config["AI:Endpoint"];
    var apiKey = config["AI:Key"];
    var model = config["AI:ChatModel"] ?? "mistral-small-3.2-24b-instruct-2506";
    var embeddingModel = config["AI:EmbeddingModel"] ?? "bge-multilingual-gemma2";
    var provider = config["AI:Provider"] ?? "openai";

    var openAiClient = new OpenAIClient(
        new ApiKeyCredential(apiKey!),
        new OpenAIClientOptions { Endpoint = new Uri(endpoint!) });
    var embeddingClient = openAiClient.GetEmbeddingClient(embeddingModel);

    ChatClient chatClient = provider.ToLowerInvariant() switch
    {
        "azure" => new AzureOpenAIClient(
            new Uri(endpoint!), new ApiKeyCredential(apiKey!))
            .GetChatClient(model),
        _ => openAiClient.GetChatClient(model),
    };

    // 1. Embed the question
    var questionEmbedding = await embeddingClient.GenerateEmbeddingAsync(request.Question);
    var vector = questionEmbedding.Value.ToFloats();
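    // Invariant culture keeps "." as the decimal separator on every host locale.
    var vectorStr = "[" + string.Join(",",
        vector.ToArray().Select(f => f.ToString("G",
            System.Globalization.CultureInfo.InvariantCulture))) + "]";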

    // 2. Search for similar chunks
    await using var dataSource = NpgsqlDataSource.Create(connStr);
    await using var cmd = dataSource.CreateCommand();
    cmd.CommandText = """
        SELECT entity_ref, doc_path, content,
               1 - (embedding <=> $1::vector) AS similarity
        FROM doc_chunks
        WHERE ($2 = '' OR entity_ref = $2)
        ORDER BY embedding <=> $1::vector
        LIMIT 5
        """;

    cmd.Parameters.AddWithValue(vectorStr);
    cmd.Parameters.AddWithValue(request.EntityRef ?? "");

    var contexts = new List<DocContext>();
    await using var reader = await cmd.ExecuteReaderAsync();
    while (await reader.ReadAsync())
    {
        contexts.Add(new DocContext(
            reader.GetString(0),
            reader.GetString(1),
            reader.GetString(2),
            // The <=> expression yields double precision, so read a double and narrow it.
            (float)reader.GetDouble(3)));
    }

    if (contexts.Count == 0)
    {
        return Results.Ok(new AskResponse(
            "I couldn't find any relevant documentation for your question.",
            Array.Empty<SourceReference>()));
    }

    // 3. Build prompt with retrieved context
    var contextBlock = string.Join("\n\n",
        contexts.Select(c =>
            $"[Source: {c.DocPath} ({c.EntityRef})]\n{c.Content}"));

    try
    {
        var completion = await chatClient.CompleteChatAsync(
        [
            new SystemChatMessage($"""
                You are a platform documentation assistant.
                Answer the question using ONLY the documentation excerpts provided below.
                If the answer is not in the documentation, say so — do not make things up.
                Always cite the source document for each fact you reference.

                DOCUMENTATION:
                {contextBlock}
                """),
            new UserChatMessage(request.Question),
        ]);

        var answer = completion.Value.Content[0].Text.Trim();
        var sources = contexts
            .Select(c => new SourceReference(c.EntityRef, c.DocPath, c.Similarity))
            .ToArray();

        return Results.Ok(new AskResponse(answer, sources));
    }
    catch (ClientResultException ex) when (ex.Status == 401)
    {
        return Results.Json(new { error = "AI provider authentication failed." }, statusCode: 503);
    }
    catch (Exception ex)
    {
        return Results.Json(new { error = $"AI provider error: {ex.Message}" }, statusCode: 502);
    }
});

record AskRequest(string Question, string? EntityRef);
record AskResponse(string Answer, SourceReference[] Sources);
record SourceReference(string EntityRef, string DocPath, float Similarity);
record DocContext(string EntityRef, string DocPath, string Content, float Similarity);

The EntityRef parameter is optional. If it is provided, the search is limited to the documents of that specific service. If it is omitted or empty, the search covers all the platform documentation.
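From a caller's point of view, the two search scopes differ only in the request body. The sketch below builds that body; buildAskBody is a hypothetical helper, and it assumes the JSON field names question and entityRef that the frontend widget in this article sends.

```typescript
// Hypothetical helper: builds the JSON body for POST /api/ask.
interface AskBody {
  question: string;
  entityRef?: string;
}

function buildAskBody(question: string, entityRef?: string): AskBody {
  const trimmed = question.trim();
  if (trimmed.length === 0) {
    throw new Error('question is required');
  }
  const scope = entityRef?.trim();
  // Leaving entityRef out asks the service to search the whole platform.
  return scope ? { question: trimmed, entityRef: scope } : { question: trimmed };
}
```

Calling buildAskBody('How do we deploy?') produces a platform-wide query, while passing 'component:default/invoice-api' as the second argument restricts it to that service.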

The TechDocs Indexer Plugin

A Backstage backend plugin that picks up TechDocs updates by re-indexing them on a schedule:

// plugins/techdocs-rag/src/plugin.ts
import {
  coreServices,
  createBackendPlugin,
} from '@backstage/backend-plugin-api';
import { catalogServiceRef } from '@backstage/plugin-catalog-node';
import { indexEntityDocs } from './indexer';

export const techDocsRagPlugin = createBackendPlugin({
  pluginId: 'techdocs-rag',
  register(env) {
    env.registerInit({
      deps: {
        logger: coreServices.logger,
        scheduler: coreServices.scheduler,
        config: coreServices.rootConfig,
        catalog: catalogServiceRef,
        auth: coreServices.auth,
      },
      async init({ logger, scheduler, config, catalog, auth }) {
        const aiServiceUrl = config.getString('forge.aiServiceUrl');

        await scheduler.scheduleTask({
          id: 'techdocs-rag-indexer',
          frequency: { hours: 6 },
          timeout: { minutes: 30 },
          initialDelay: { seconds: 60 },
          fn: async () => {
            logger.info('Starting TechDocs indexing');

            const credentials = await auth.getOwnServiceCredentials();
            const { items: entities } = await catalog.getEntities(
              { filter: { kind: 'Component' } },
              { credentials },
            );

            let indexed = 0, skipped = 0, failed = 0;

            for (const entity of entities) {
              try {
                const count = await indexEntityDocs({
                  entity, aiServiceUrl, logger,
                });
                count > 0 ? indexed++ : skipped++;
              } catch (err) {
                failed++;
                logger.error(
                  `Failed to index docs for ${entity.metadata.name}: ${err}`,
                );
              }
            }

            logger.info(
              `TechDocs indexing complete: ${indexed} indexed, ${skipped} skipped, ${failed} failed`,
            );
          },
        });
      },
    });
  },
});

The indexer reads the TechDocs sources from each entity's repository:

// plugins/techdocs-rag/src/indexer.ts
import type { Entity } from '@backstage/catalog-model';
import type { LoggerService } from '@backstage/backend-plugin-api';
import { Octokit } from '@octokit/rest';

interface IndexOptions {
  entity: Entity;
  aiServiceUrl: string;
  logger: LoggerService;
}

export async function indexEntityDocs({
  entity, aiServiceUrl, logger,
}: IndexOptions): Promise<number> {
  const slug =
    entity.metadata.annotations?.['github.com/project-slug'];
  if (!slug) return 0;

  const [owner, repo] = slug.split('/');
  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
  const entityRef = `component:default/${entity.metadata.name}`;

  let tree;
  try {
    const { data } = await octokit.git.getTree({
      owner, repo, tree_sha: 'main', recursive: 'true',
    });
    tree = data.tree;
  } catch {
    return 0;
  }

  const docFiles = tree.filter(
    f =>
      f.type === 'blob' &&
      f.path &&
      (f.path.startsWith('docs/') || f.path === 'GOTCHA.md') &&
      f.path.endsWith('.md'),
  );

  if (docFiles.length === 0) return 0;

  logger.info(
    `Indexing ${docFiles.length} docs for ${entity.metadata.name}`,
  );

  let indexed = 0;
  for (const file of docFiles) {
    if (!file.path) continue;

    try {
      const { data: content } = await octokit.repos.getContent({
        owner, repo, path: file.path,
        mediaType: { format: 'raw' },
      });

      const res = await fetch(`${aiServiceUrl}/api/index-doc`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          entityRef,
          docPath: file.path,
          content: content as unknown as string,
        }),
      });

      if (res.ok) indexed++;
    } catch (err) {
      logger.info(`Could not index ${file.path}: ${err}`);
    }
  }

  return indexed;
}

Note that the indexer also indexes GOTCHA.md. This means that when a developer asks about a service, the AI can draw on both the TechDocs and the GOTCHA prompt.
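The file selection rule is easy to get subtly wrong, so it can help to pull it into a pure function and test it in isolation. The sketch below mirrors the filter in indexer.ts; isIndexableDoc, selectDocPaths, and the TreeEntry shape are illustrative names rather than part of the plugin.

```typescript
// Illustrative extraction of the indexer's file filter; names are hypothetical.
interface TreeEntry {
  type?: string;
  path?: string;
}

// A file is indexed if it is Markdown and lives under docs/ or is the root GOTCHA.md.
function isIndexableDoc(path: string): boolean {
  const inScope = path.startsWith('docs/') || path === 'GOTCHA.md';
  return inScope && path.endsWith('.md');
}

// Keeps only blob entries (files, not directories) that pass the rule above.
function selectDocPaths(tree: TreeEntry[]): string[] {
  return tree
    .filter(
      e =>
        e.type === 'blob' &&
        typeof e.path === 'string' &&
        isIndexableDoc(e.path),
    )
    .map(e => e.path as string);
}
```

Note that only a GOTCHA.md at the repository root matches; a copy nested in a subfolder is ignored by this rule.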

A simple component that lets developers ask questions from the Backstage UI:

// plugins/techdocs-rag/src/components/AskWidget.tsx
import React, { useState } from 'react';
import {
  Card,
  CardContent,
  CardHeader,
  TextField,
  Button,
  Typography,
  Chip,
  Box,
  CircularProgress,
} from '@material-ui/core';
import { useEntity } from '@backstage/plugin-catalog-react';
import { useApi, fetchApiRef, discoveryApiRef } from '@backstage/core-plugin-api';

export const AskWidget = () => {
  const { entity } = useEntity();
  const fetchApi = useApi(fetchApiRef);
  const discoveryApi = useApi(discoveryApiRef);
  const [question, setQuestion] = useState('');
  const [answer, setAnswer] = useState<string | null>(null);
  const [sources, setSources] = useState<
    { entityRef: string; docPath: string; similarity: number }[]
  >([]);
  const [loading, setLoading] = useState(false);

  const entityRef = `component:default/${entity.metadata.name}`;

  const handleAsk = async () => {
    if (!question.trim()) return;
    setLoading(true);
    setAnswer(null);

    try {
      const proxyUrl = await discoveryApi.getBaseUrl('proxy');
      const res = await fetchApi.fetch(`${proxyUrl}/ai-service/api/ask`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          question,
          entityRef,
        }),
      });

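      // Surface non-2xx responses (e.g. 503 when RAG is not configured) as errors
      if (!res.ok) {
        throw new Error(`AI service responded with ${res.status}`);
      }
      const data = await res.json();
      setAnswer(data.answer);
      setSources(data.sources ?? []);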
    } catch (err) {
      setAnswer('Failed to get an answer. Please try again.');
    } finally {
      setLoading(false);
    }
  };

  return (
    <Card>
      <CardHeader title="Ask about this service" />
      <CardContent>
        <Box display="flex" gap={1} mb={2}>
          <TextField
            fullWidth
            variant="outlined"
            placeholder="What's the retry policy for Service Bus?"
            value={question}
            onChange={e => setQuestion(e.target.value)}
            onKeyDown={e => e.key === 'Enter' && handleAsk()}
          />
          <Button
            variant="contained"
            color="primary"
            onClick={handleAsk}
            disabled={loading}
          >
            {loading ? <CircularProgress size={24} /> : 'Ask'}
          </Button>
        </Box>

        {answer && (
          <Box mt={2}>
            <Typography variant="body1" style={{ whiteSpace: 'pre-wrap' }}>
              {answer}
            </Typography>

            {sources.length > 0 && (
              <Box mt={1}>
                <Typography variant="caption" color="textSecondary">
                  Sources:
                </Typography>
                <Box display="flex" gap={0.5} flexWrap="wrap" mt={0.5}>
                  {sources.map((s, i) => (
                    <Chip
                      key={i}
                      label={s.docPath}
                      size="small"
                      variant="outlined"
                    />
                  ))}
                </Box>
              </Box>
            )}
          </Box>
        )}
      </CardContent>
    </Card>
  );
};

In the Backstage app, add the widget to the entity page:

// packages/app/src/components/catalog/EntityPage.tsx
import { AskWidget } from '@internal/plugin-techdocs-rag';

// Inside the service entity page layout:
<Grid item md={6}>
  <AskWidget />
</Grid>

Now when a developer opens any service in the catalog, they see an “Ask about this service” card. They type a question and the AI answers using that service's documentation.
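Because /api/ask returns one source per retrieved chunk, the same document can show up several times among the source chips. A possible refinement, sketched here with a hypothetical uniqueSources helper, is to collapse duplicates before rendering and keep the best similarity per document.

```typescript
// Hypothetical helper: collapse per-chunk sources into one entry per document.
interface SourceRef {
  entityRef: string;
  docPath: string;
  similarity: number;
}

function uniqueSources(sources: SourceRef[]): SourceRef[] {
  const best = new Map<string, SourceRef>();
  for (const source of sources) {
    const key = `${source.entityRef}|${source.docPath}`;
    const seen = best.get(key);
    // Keep the highest-scoring chunk for each document.
    if (!seen || source.similarity > seen.similarity) {
      best.set(key, source);
    }
  }
  // Most relevant document first.
  return Array.from(best.values()).sort((a, b) => b.similarity - a.similarity);
}
```

In the widget this would be applied to the sources array before mapping it to chips.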

Registering the Plugin

In packages/backend/src/index.ts:

import { techDocsRagPlugin } from '@internal/plugin-techdocs-rag';

backend.add(techDocsRagPlugin);

The plugin reads forge.aiServiceUrl from app-config.yaml (the same setting the other plugins use). The RAG endpoints (/api/index-doc, /api/ask) return 503 when PostgreSQL isn't configured, so the indexer runs without problems in development without pgvector: the entities simply show up as skipped in the final log line.

Connecting RAG to Code Review

The best part: the RAG system can feed the code review from article 4. Before reviewing a PR, the plugin can search for relevant documentation and include it in the review context:

// In review.ts, before calling the AI service:

// Search for docs related to the changed files
const changedAreas = codeFiles
  .map(f => f.filename.split('/')[0])
  .filter((v, i, a) => a.indexOf(v) === i)
  .join(' ');

const docsRes = await fetch(`${aiServiceUrl}/api/ask`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    question: `Guidelines and rules for ${changedAreas} in ${serviceName}`,
    entityRef: `component:default/${serviceName}`,
  }),
});

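// A failed lookup (e.g. 503 when RAG is not configured) should not block the review
const docsContext = docsRes.ok ? await docsRes.json() : null;
// If docsContext is set, add docsContext.answer to the review prompt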

Now the code review knows the service's retry policy, naming conventions, and deployment rules, not because someone wrote them into GOTCHA.md, but because they are in TechDocs and the RAG system found them.
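The query-building part of that integration can be isolated into small pure functions, which makes it easier to see what question the review plugin actually sends. This is a sketch: changedAreas, buildDocsQuestion, and withDocsContext are hypothetical helpers, and the "PLATFORM DOCUMENTATION" prompt section is an assumed format, not the one used in article 4.

```typescript
// Hypothetical helpers for feeding RAG results into the review prompt.

// Unique top-level path segments of the changed files, in first-seen order.
// Files at the repository root contribute their own file name.
function changedAreas(filenames: string[]): string[] {
  const areas = new Set<string>();
  for (const filename of filenames) {
    const top = filename.split('/')[0];
    if (top) areas.add(top);
  }
  return Array.from(areas);
}

// Builds the natural-language question sent to /api/ask.
function buildDocsQuestion(filenames: string[], serviceName: string): string {
  const areas = changedAreas(filenames).join(' ');
  return `Guidelines and rules for ${areas} in ${serviceName}`;
}

// Appends the retrieved answer to the review prompt, or leaves it untouched
// when the lookup returned nothing.
function withDocsContext(reviewPrompt: string, docsAnswer?: string): string {
  if (!docsAnswer || docsAnswer.trim().length === 0) return reviewPrompt;
  return `${reviewPrompt}\n\nPLATFORM DOCUMENTATION:\n${docsAnswer.trim()}`;
}
```

Top-level folder names are a coarse signal; a repository with everything under src/ will produce the same question for every PR, so a finer-grained key may be worth trying.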

Proxy Configuration

For the Backstage frontend to talk to the AI service, add a proxy in app-config.yaml:

proxy:
  endpoints:
    /ai-service:
      target: http://localhost:5100
      allowedHeaders: ['Content-Type']

Checklist

pgvector extension enabled in PostgreSQL
doc_chunks table created with a vector index
/api/index-doc endpoint chunks documents and generates embeddings
/api/ask endpoint performs vector search and generates answers
TechDocs indexer plugin running on a schedule (every 6 hours)
GOTCHA.md included in the indexed documents
AskWidget component rendering on the entity page
Proxy configured for frontend-to-AI-service communication
AI answers include source citations

Before the Next Article

Developers can now ask questions and get answers from the platform documentation. The code review plugin can search the documents before reviewing a PR. The AI has gone from “knows what the catalog says” to “knows what the documentation says.”

But who controls what the AI can and can't do? Who decides which services may use the scaffolder? Who tracks whether the AI review findings are useful or just noise? Who monitors the costs?

The platform needs governance. Not bureaucratic governance, but visible, automated governance. A dashboard where the platform team can see which AI features are being used, by whom, how much they cost, and whether they are helping.

That is article 6: The AI Governance Dashboard.

Troubleshooting

TechDocs fails with “Docker does not appear to be available”

TechDocs uses Docker to build the documentation with mkdocs. If you use OrbStack instead of Docker Desktop, Backstage may not find the socket. Fix it by setting DOCKER_HOST:

export DOCKER_HOST=unix:///var/run/docker.sock

Or configure TechDocs to use a local mkdocs installation instead of Docker. In app-config.yaml:

techdocs:
  generator:
    runIn: local

Then install mkdocs locally: pip install mkdocs mkdocs-techdocs-core

Vector dimension mismatch

If you get expected N dimensions, not M, the embedding model is returning a vector size different from the one in your table schema. Check your model's output dimensions and update the vector(N) column accordingly. For example, bge-multilingual-gemma2 on Scaleway returns 3584 dimensions, not 768.
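One way to catch this earlier is to validate the vector length before it ever reaches PostgreSQL. The sketch below is a hypothetical TypeScript helper (toVectorLiteral is not part of the AI service) that builds the bracketed text literal pgvector accepts and fails fast with a message in the same spirit as the database error.

```typescript
// Hypothetical guard: validate dimensions before building a pgvector literal.
function toVectorLiteral(values: number[], expectedDims: number): string {
  if (values.length !== expectedDims) {
    throw new Error(
      `expected ${expectedDims} dimensions, not ${values.length}`,
    );
  }
  for (const value of values) {
    // NaN and Infinity cannot be stored in a vector column.
    if (!Number.isFinite(value)) {
      throw new Error(`invalid vector component: ${value}`);
    }
  }
  // JavaScript number formatting always uses "." as the decimal separator.
  return `[${values.join(',')}]`;
}
```

The expected dimension would come from the same configuration that selects the embedding model, so the check and the schema change together.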

better-sqlite3 fails to compile on Node 24+

Backstage's default database is better-sqlite3, which requires native compilation. If it fails, switch to PostgreSQL in app-config.yaml:

backend:
  database:
    client: pg
    connection:
      host: localhost
      port: 5432
      user: postgres
      password: your-password


This is article 5 of the AI-Native IDP series. Next: The AI Governance Dashboard, visibility and control for the AI features on your platform.