Teaching Your Catalog to Think

The Problem

Open your Software Catalog. Pick a service. Read its description.

Is it accurate? Does it describe what the service actually does today, or what it did when someone wrote that description six months ago? Is the owner field correct, or was that team reorganized in January?

Now check the tags. The service was tagged dotnet and postgresql when it was created. Since then, someone has added a Redis cache and a Service Bus integration. The tags mention neither.

This is the normal state of a service catalog. It is not wrong because of negligence; it is wrong because keeping it accurate is manual work that nobody has time for. Every time a developer adds a dependency, switches databases, or refactors an endpoint, someone has to update the catalog. Nobody does.

The result: developers stop trusting the catalog. They go to Slack and ask “who owns the invoice service?” instead of looking it up. The platform team loses time answering questions the catalog should answer. The IDP turns into a directory that nobody reads.

In article 1 we set up Backstage with a static catalog. Now we make it smart.

The Solution

A Backstage plugin that runs on a schedule, reads the source code of every registered service, and proposes catalog metadata updates automatically. It uses an AI model through an OpenAI-compatible API to understand what the code does and to generate accurate descriptions, tags, and dependency lists.

The flow:

The plugin reads every registered component entity (the data from its catalog-info.yaml) from the catalog
It fetches the repository from GitHub (using the github.com/project-slug annotation)
It reads key files: Program.cs, package.json, Dockerfile, *.csproj, configuration files
It sends the code context to the AI model with a structured prompt
It receives back: an updated description, tags, detected dependencies, detected APIs
It compares the result with the catalog's current metadata
If changes are detected: it opens a PR in the service's repo updating catalog-info.yaml
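The flow above can be sketched as a single function with the I/O steps injected, which keeps the decision logic (skip, no-op, or propose a PR) testable without GitHub or a model. The names enrichComponent and EnrichmentSteps are hypothetical, and the types are simplified stand-ins for Backstage's Entity and the AI service response; the first step (reading entities from the catalog) is left to the caller.

```typescript
// Simplified stand-ins for the real types (Backstage Entity, AI service response).
interface SourceFile {
  path: string;
  content: string;
}

interface Metadata {
  description: string;
  tags: string[];
}

// The I/O steps of the flow, injected so they can be replaced by fakes.
interface EnrichmentSteps {
  fetchFiles(slug: string): Promise<SourceFile[]>; // fetch the repo and read key files
  analyze(files: SourceFile[]): Promise<Metadata>; // send context to the model, get metadata back
  openPullRequest(slug: string, proposed: Metadata): Promise<string>; // returns the PR URL
}

type Outcome =
  | { status: 'skipped'; reason: string }
  | { status: 'unchanged' }
  | { status: 'proposed'; prUrl: string };

async function enrichComponent(
  slug: string | undefined,
  current: Metadata,
  steps: EnrichmentSteps,
): Promise<Outcome> {
  if (!slug) return { status: 'skipped', reason: 'missing github.com/project-slug' };

  const files = await steps.fetchFiles(slug);
  if (files.length === 0) return { status: 'skipped', reason: 'no readable files' };

  const proposed = await steps.analyze(files);

  // Compare with the current metadata; tags are sorted copies so order does not matter
  const sameTags =
    JSON.stringify([...proposed.tags].sort()) === JSON.stringify([...current.tags].sort());
  if (proposed.description === current.description && sameTags) {
    return { status: 'unchanged' };
  }

  // The catalog is never written directly: a change only ever becomes a PR
  const prUrl = await steps.openPullRequest(slug, proposed);
  return { status: 'proposed', prUrl };
}
```

Because openPullRequest is the only step with side effects, the human-in-the-loop rule is enforced by the shape of the function rather than by convention.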

The developer reviews the PR and merges it, or adjusts it and then merges it. The human is always in the loop: the AI proposes, the developer decides.

Implementation

We need two things: a .NET AI service that analyzes the code, and a Backstage backend plugin that triggers it.

The AI Service

This is the .NET service that reads code and produces catalog metadata. It runs as a separate container alongside Backstage.

// ai-service/CatalogEnricher/Program.cs
using System.ClientModel;
using System.Text.Json;
using System.Text.Json.Serialization;
using Azure.AI.OpenAI;
using OpenAI;
using OpenAI.Chat;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddCors(options =>
{
    options.AddPolicy("DevCors", policy =>
        policy.WithOrigins("http://localhost:3000", "http://localhost:7007")
              .AllowAnyHeader()
              .AllowAnyMethod());
});

var app = builder.Build();
app.UseCors("DevCors");

app.MapGet("/healthz", () => Results.Ok(new { status = "healthy" }));

app.MapPost("/api/enrich", async (EnrichRequest request, IConfiguration config) =>
{
    if (request.Files is not { Count: > 0 })
        return Results.BadRequest(new { error = "At least one file is required." });

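    var endpoint = config["AI:Endpoint"];
    var apiKey = config["AI:Key"];
    var model = config["AI:ChatModel"] ?? "mistral-small-3.2-24b-instruct-2506";
    var provider = config["AI:Provider"] ?? "openai";

    // Fail with a clear message instead of an exception from new Uri(...) below
    if (string.IsNullOrWhiteSpace(endpoint) || string.IsNullOrWhiteSpace(apiKey))
        return Results.Json(new { error = "AI:Endpoint and AI:Key must be configured." }, statusCode: 500);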

    ChatClient chatClient = provider.ToLowerInvariant() switch
    {
        "azure" => new AzureOpenAIClient(
            new Uri(endpoint!), new ApiKeyCredential(apiKey!))
            .GetChatClient(model),
        _ => new OpenAIClient(
            new ApiKeyCredential(apiKey!),
            new OpenAIClientOptions { Endpoint = new Uri(endpoint!) })
            .GetChatClient(model),
    };

    const int maxChars = 3000;
    var filesSummary = string.Join("\n\n", request.Files.Select(f =>
    {
        var content = f.Content.Length > maxChars
            ? f.Content[..maxChars] + "\n[...truncated]"
            : f.Content;
        return $"### {f.Path}\n```\n{content}\n```";
    }));

    var systemPrompt = """
        You are a code analysis assistant for a Backstage software catalog.
        Analyze the provided source files and return a JSON object with:
        - "description": a one-sentence summary of what this component does
        - "tags": an array of relevant technology tags
        - "dependencies": an array of external services this code depends on
        - "apiEndpoints": an array of API routes exposed by this code

        Rules:
        - Do not guess. Only report what is confirmed in the code.
        - Do not invent dependencies that are not explicitly imported.
        - Return ONLY valid JSON, no markdown fences, no extra text.
        """;

    try
    {
        var completion = await chatClient.CompleteChatAsync(
        [
            new SystemChatMessage(systemPrompt),
            new UserChatMessage($"Analyze these source files:\n\n{filesSummary}"),
        ]);

        var raw = completion.Value.Content[0].Text.Trim();
        var json = raw.StartsWith("```") ? raw.Split('\n', 2)[1].TrimEnd('`').Trim() : raw;

        var metadata = JsonSerializer.Deserialize<CatalogMetadata>(json, SerializerOptions.Default);
        return metadata is null
            ? Results.UnprocessableEntity(new { error = "AI returned invalid metadata." })
            : Results.Ok(metadata);
    }
    catch (ClientResultException ex) when (ex.Status == 401)
    {
        return Results.Json(new { error = "AI provider authentication failed." }, statusCode: 503);
    }
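    catch (JsonException)
    {
        // The model answered, but not with JSON that matches CatalogMetadata
        return Results.UnprocessableEntity(new { error = "AI returned invalid metadata." });
    }
    catch (Exception ex)
    {
        return Results.Json(new { error = $"AI provider error: {ex.Message}" }, statusCode: 502);
    }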
});

app.Run();

record EnrichRequest(List<SourceFile> Files);
record SourceFile(string Path, string Content);
record CatalogMetadata(
    [property: JsonPropertyName("description")] string Description,
    [property: JsonPropertyName("tags")] List<string> Tags,
    [property: JsonPropertyName("dependencies")] List<string> Dependencies,
    [property: JsonPropertyName("apiEndpoints")] List<string> ApiEndpoints);

static class SerializerOptions
{
    public static readonly JsonSerializerOptions Default = new()
    {
        PropertyNameCaseInsensitive = true,
    };
}

The system prompt is explicit: “Do not guess.” This matters. A catalog with wrong information is worse than a catalog with no information. The AI should only report what it can confirm from the code.
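The prompt alone does not guarantee well-formed output, and System.Text.Json will deserialize a response with missing fields into nulls rather than failing. A minimal sketch of a stricter check, written in TypeScript so it could also run on the plugin side: it strips an optional markdown fence (as the service does) and rejects the whole answer unless every field has the expected type. parseCatalogMetadata is a hypothetical helper, not part of the service above.

```typescript
interface CatalogMetadata {
  description: string;
  tags: string[];
  dependencies: string[];
  apiEndpoints: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === 'string');
}

// Returns validated metadata, or null when the answer is malformed or incomplete.
// Rejecting the whole answer is deliberate: a partial guess is worse than no update.
function parseCatalogMetadata(raw: string): CatalogMetadata | null {
  let text = raw.trim();

  // Tolerate a ```json ... ``` wrapper, like the service's fence stripping
  if (text.startsWith('```')) {
    const firstNewline = text.indexOf('\n');
    if (firstNewline === -1) return null;
    text = text.slice(firstNewline + 1).replace(/`+\s*$/, '').trim();
  }

  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return null;
  }
  if (typeof data !== 'object' || data === null || Array.isArray(data)) return null;

  const { description, tags, dependencies, apiEndpoints } = data as Record<string, unknown>;
  if (typeof description !== 'string' || description.trim() === '') return null;
  if (!isStringArray(tags) || !isStringArray(dependencies) || !isStringArray(apiEndpoints)) {
    return null;
  }
  return { description: description.trim(), tags, dependencies, apiEndpoints };
}
```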

The default ChatModel is Mistral Small, using its model name on Scaleway Generative APIs, but this code works with any OpenAI-compatible provider. Point AI:Endpoint (and AI:ChatModel) at Azure AI Foundry, OpenAI, Mistral AI, or any other provider; the same code handles all of them. Set AI:Provider to azure if you use Azure AI Foundry; for everything else, leave it as openai.
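Swapping providers works because OpenAI-compatible APIs share the same Chat Completions request shape. As a rough illustration (not the SDK's actual internals), this is approximately the HTTP request the openai branch ends up sending. buildChatRequest is a hypothetical helper, and the Azure branch is not covered because Azure OpenAI uses different URLs and authentication.

```typescript
// Minimal message shape for the OpenAI-compatible Chat Completions API.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Builds the request for POST {endpoint}/chat/completions with a Bearer token.
// Changing provider means changing endpoint, apiKey and model, nothing else.
function buildChatRequest(
  endpoint: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  const base = endpoint.replace(/\/+$/, ''); // tolerate a trailing slash
  return {
    url: `${base}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

With Node 18+ the result can be passed straight to fetch: `const { url, init } = buildChatRequest(...); await fetch(url, init);`.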

The Backstage Backend Plugin

The plugin runs inside the Backstage backend. It reads entities from the catalog, fetches their source code from GitHub, sends it to the AI service, and (in the full version) creates PRs with the updated catalog-info.yaml.

// plugins/catalog-enricher-backend/src/plugin.ts
import {
  coreServices,
  createBackendModule,
} from '@backstage/backend-plugin-api';
import { catalogServiceRef } from '@backstage/plugin-catalog-node';
import { Octokit } from '@octokit/rest';
import { enrichEntity } from './enrich';

export const catalogEnricherModule = createBackendModule({
  pluginId: 'catalog',
  moduleId: 'enricher',
  register(env) {
    env.registerInit({
      deps: {
        logger: coreServices.logger,
        config: coreServices.rootConfig,
        scheduler: coreServices.scheduler,
        catalog: catalogServiceRef,
        auth: coreServices.auth,
      },
      async init({ logger, config, scheduler, catalog, auth }) {
        const githubToken = config.getString('catalogEnricher.githubToken');
        const aiServiceUrl = config.getString('catalogEnricher.aiServiceUrl');
        const octokit = new Octokit({ auth: githubToken });

        await scheduler.scheduleTask({
          id: 'catalog-enricher-run',
          frequency: { hours: 24 },
          timeout: { minutes: 30 },
          initialDelay: { seconds: 30 },
          fn: async () => {
            logger.info('Starting catalog enrichment run');
            // The module runs inside the catalog plugin, so its own service
            // credentials are accepted by the catalog service directly
            const { items: entities } = await catalog.getEntities(
              { filter: { kind: 'Component' } },
              { credentials: await auth.getOwnServiceCredentials() },
            );

            let enriched = 0, skipped = 0, failed = 0;

            for (const entity of entities) {
              try {
                const changed = await enrichEntity({
                  entity, octokit, aiServiceUrl, logger,
                });
                if (changed) enriched++;
                else skipped++;
              } catch (error) {
                failed++;
                logger.error(
                  `Failed to enrich ${entity.metadata.name}: ${error}`
                );
              }
            }

            logger.info(
              `Enrichment complete: ${enriched} enriched, ${skipped} skipped, ${failed} failed`
            );
          },
        });
      },
    });
  },
});

We use createBackendModule instead of createBackendPlugin because the enricher extends the catalog: it is a module of the catalog plugin, not a separate plugin. This matters for service-to-service authentication: because the module runs inside the catalog plugin, auth.getOwnServiceCredentials() returns the catalog's own credentials, which the catalog service accepts without extra configuration.
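The counting loop in the scheduled task can be pulled out into a plain function, which makes its key behavior (one failing repository does not stop the run) easy to unit test without a Backstage backend. runAll and RunSummary are hypothetical names for this sketch.

```typescript
interface RunSummary {
  enriched: number;
  skipped: number;
  failed: number;
  errors: string[];
}

// Same semantics as the loop in the scheduled task: true = changes proposed,
// false = nothing to do, thrown error = failure that is recorded but not fatal.
type EnrichFn<T> = (item: T) => Promise<boolean>;

async function runAll<T>(
  items: T[],
  enrich: EnrichFn<T>,
  name: (item: T) => string,
): Promise<RunSummary> {
  const summary: RunSummary = { enriched: 0, skipped: 0, failed: 0, errors: [] };
  for (const item of items) {
    try {
      if (await enrich(item)) summary.enriched++;
      else summary.skipped++;
    } catch (error) {
      // One broken repository must not abort the whole run
      summary.failed++;
      summary.errors.push(`${name(item)}: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  return summary;
}
```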

The enrichEntity function fetches code from GitHub and calls the AI service:

// plugins/catalog-enricher-backend/src/enrich.ts
import { Entity } from '@backstage/catalog-model';
import { Octokit } from '@octokit/rest';

interface EnrichmentResult {
  description: string;
  tags: string[];
  dependencies: string[];
  apiEndpoints: string[];
}

interface EnrichOptions {
  entity: Entity;
  octokit: Octokit;
  aiServiceUrl: string;
  logger: { info: (msg: string) => void; warn: (msg: string) => void };
}

export async function enrichEntity({
  entity, octokit, aiServiceUrl, logger,
}: EnrichOptions): Promise<boolean> {
  const slug = entity.metadata.annotations?.[
    'github.com/project-slug'
  ];
  if (!slug) {
    logger.warn(`${entity.metadata.name}: no github.com/project-slug, skipping`);
    return false;
  }

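  const [owner, repo] = slug.split('/');

  // Use the repository's default branch instead of assuming "main"
  const { data: repoInfo } = await octokit.repos.get({ owner, repo });
  const branch = repoInfo.default_branch;

  const { data: tree } = await octokit.git.getTree({
    owner, repo, tree_sha: branch, recursive: 'true',
  });

  // The tree is recursive, so match on the file name rather than the full path;
  // otherwise nested projects (e.g. src/Api/Program.cs) are never found
  const targetNames = new Set([
    'Program.cs', 'package.json', 'Dockerfile', 'appsettings.json', 'app-config.yaml',
  ]);
  const targetFiles = tree.tree.filter(f => {
    if (f.type !== 'blob' || !f.path || f.path.includes('node_modules/')) return false;
    const name = f.path.split('/').pop() ?? '';
    return targetNames.has(name) || name.endsWith('.csproj');
  });

  const files: Array<{ path: string; content: string }> = [];

  for (const file of targetFiles) {
    if (!file.path) continue;
    try {
      // format: 'raw' returns the file body as a string instead of base64 JSON
      const { data: content } = await octokit.repos.getContent({
        owner, repo, path: file.path, ref: branch,
        mediaType: { format: 'raw' },
      });
      files.push({ path: file.path, content: content as unknown as string });
    } catch {
      // File not readable: skip
    }
  }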

  if (files.length === 0) {
    logger.info(`${entity.metadata.name}: no readable files, skipping`);
    return false;
  }

  // Call AI service — files must be an array of {path, content}
  const res = await fetch(`${aiServiceUrl}/api/enrich`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ files }),
  });

  if (!res.ok) {
    throw new Error(`AI service returned ${res.status}: ${await res.text()}`);
  }

  const enrichment: EnrichmentResult = await res.json() as EnrichmentResult;

  // Compare with current metadata
  const currentDesc = entity.metadata.description ?? '';
  const currentTags = (entity.metadata.tags ?? []) as string[];

  const descChanged = enrichment.description !== currentDesc;
  const tagsChanged =
    JSON.stringify([...enrichment.tags].sort()) !==
    JSON.stringify([...currentTags].sort());

  if (!descChanged && !tagsChanged) {
    logger.info(`${entity.metadata.name}: no changes detected`);
    return false;
  }

  logger.info(
    `${entity.metadata.name}: changes detected — description: ${descChanged}, tags: ${tagsChanged}`
  );
  logger.info(
    `Proposed: description="${enrichment.description}", tags=[${enrichment.tags.join(', ')}]`
  );

  return true;
}
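The comparison at the end of enrichEntity can also live in a pure function. This sketch (detectChanges is a hypothetical name) compares tags as sets without sorting either input in place, ignores surrounding whitespace in the description, and reports which tags were added or removed, which is handy for logging or for a PR body. Unlike the JSON.stringify comparison above, duplicate tags count as a single tag.

```typescript
interface CurrentFields {
  description?: string;
  tags?: string[];
}

interface ProposedFields {
  description: string;
  tags: string[];
}

interface ChangeReport {
  descriptionChanged: boolean;
  tagsChanged: boolean;
  added: string[];
  removed: string[];
}

function detectChanges(current: CurrentFields, proposed: ProposedFields): ChangeReport {
  const currentTags = new Set(current.tags ?? []);
  const proposedTags = new Set(proposed.tags);
  // Array.from copies the sets, so neither input array is mutated
  const added = Array.from(proposedTags).filter((t) => !currentTags.has(t)).sort();
  const removed = Array.from(currentTags).filter((t) => !proposedTags.has(t)).sort();
  return {
    descriptionChanged: (current.description ?? '').trim() !== proposed.description.trim(),
    tagsChanged: added.length > 0 || removed.length > 0,
    added,
    removed,
  };
}
```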

Registering the Module

In packages/backend/src/index.ts:

import { catalogEnricherModule } from '@internal/plugin-catalog-enricher-backend';

backend.add(catalogEnricherModule);

And in app-config.yaml:

catalogEnricher:
  aiServiceUrl: http://localhost:5100
  githubToken: ${GITHUB_TOKEN}

Running the AI Service

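Set the AI provider credentials as environment variables. ASP.NET Core maps the double underscore to the configuration section separator, so AI__Endpoint becomes AI:Endpoint, the same key you would set under the AI section of appsettings.json. If you use Azure AI Foundry, also set AI__Provider=azure: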

export AI__Endpoint=https://api.scaleway.ai/v1           # or https://api.openai.com/v1, Azure endpoint, etc.
export AI__Key=your_api_key
export AI__ChatModel=mistral-small-3.2-24b-instruct-2506  # or gpt-4o, claude-sonnet-4-6, etc.
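To make the mapping concrete, here is a TypeScript illustration of the same rule plus the defaults Program.cs applies. The service itself relies on ASP.NET Core configuration for this; toConfigKey and resolveAiConfig are hypothetical helpers, and unlike .NET configuration this sketch treats keys as case-sensitive.

```typescript
interface AiConfig {
  endpoint: string;
  key: string;
  chatModel: string;
  provider: 'openai' | 'azure';
}

// "__" in an environment variable name stands for the ":" section separator
function toConfigKey(envName: string): string {
  return envName.split('__').join(':');
}

// Mirrors Program.cs: Endpoint and Key are required, ChatModel and Provider
// have defaults, and any provider other than "azure" uses the OpenAI client.
// Keys are case-sensitive here (.NET configuration is not).
function resolveAiConfig(env: Record<string, string | undefined>): AiConfig {
  const config = new Map<string, string>();
  for (const [name, value] of Object.entries(env)) {
    if (value !== undefined) config.set(toConfigKey(name), value);
  }
  const endpoint = config.get('AI:Endpoint');
  const key = config.get('AI:Key');
  if (!endpoint || !key) throw new Error('AI:Endpoint and AI:Key are required');
  const provider = (config.get('AI:Provider') ?? 'openai').toLowerCase();
  return {
    endpoint,
    key,
    chatModel: config.get('AI:ChatModel') ?? 'mistral-small-3.2-24b-instruct-2506',
    provider: provider === 'azure' ? 'azure' : 'openai',
  };
}
```

In a Node.js script this could be called as resolveAiConfig(process.env).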

Then start the service:

cd ai-service/CatalogEnricher
dotnet run

It starts on port 5100 by default. The Backstage plugin calls it on its schedule (every 24 hours), or you can trigger a run manually. You can also test it directly:

curl http://localhost:5100/healthz
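Beyond /healthz, a quick end-to-end check is to send one small file to /api/enrich with the same { files: [{ path, content }] } body the plugin uses. A minimal sketch, assuming Node 18+ where fetch is global; smokeTest and FetchLike are hypothetical, and fetch is injected so the function can be exercised with a fake.

```typescript
// The subset of fetch used here, so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Checks /healthz, then sends one small file to /api/enrich and returns the metadata.
async function smokeTest(baseUrl: string, fetchFn: FetchLike): Promise<unknown> {
  const health = await fetchFn(`${baseUrl}/healthz`);
  if (!health.ok) throw new Error(`healthz returned ${health.status}`);

  // Same body shape the plugin sends: { files: [{ path, content }] }
  const res = await fetchFn(`${baseUrl}/api/enrich`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      files: [
        {
          path: 'Program.cs',
          content: 'var app = WebApplication.Create(); app.MapGet("/ping", () => "pong"); app.Run();',
        },
      ],
    }),
  });
  if (!res.ok) throw new Error(`/api/enrich returned ${res.status}`);
  return res.json();
}
```

Against a running service: smokeTest('http://localhost:5100', fetch).then(console.log). The second call reaches the real AI provider, so it consumes tokens.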

What the Output Looks Like

For the ScraperAgent service, the AI service returns:

{
  "description": "AI-powered market intelligence platform that scrapes Twitter/X accounts and delivers sentiment analysis reports by email",
  "tags": ["dotnet", "nextjs", "postgresql", "azure-openai", "kubernetes", "mailkit"],
  "dependencies": ["Azure OpenAI", "PostgreSQL", "Scaleway SMTP", "Twitter/X API", "Mollie Payments"],
  "apiEndpoints": [
    "POST /api/{domain}/analyze",
    "GET /api/{domain}/reports",
    "POST /api/subscribe",
    "POST /api/webhook/mollie"
  ]
}

Compare that with a manual catalog entry that says “AI-powered market and crypto intelligence platform.” The AI version is more specific, lists the real endpoints, and detected dependencies that a human might forget to mention.
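The tags in this example already have the right shape, but a model can just as easily return “Azure OpenAI” or “Next.js”. Backstage documents that each tag must be a sequence of [a-z0-9:+#] separated by -, at most 63 characters in total, and an entity with an invalid tag can fail catalog validation. A minimal normalization sketch that could run before a PR is created; normalizeTags is a hypothetical helper.

```typescript
const MAX_TAG_LENGTH = 63;
const TAG_PATTERN = /^[a-z0-9:+#]+(-[a-z0-9:+#]+)*$/;

// Turns free-form model output into tags that match Backstage's documented format.
// Anything that still does not match after normalization is dropped, not guessed.
function normalizeTags(raw: string[]): string[] {
  const result = new Set<string>(); // also removes duplicates such as "DotNet"/"dotnet"
  for (const tag of raw) {
    const normalized = tag
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9:+#]+/g, '-') // any run of other characters becomes one "-"
      .replace(/^-+|-+$/g, ''); // no leading or trailing "-"
    if (normalized.length <= MAX_TAG_LENGTH && TAG_PATTERN.test(normalized)) {
      result.add(normalized);
    }
  }
  return Array.from(result);
}
```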

Creating PRs (Closing the Loop)

The code above logs the proposed changes. The full implementation creates a PR. Here is the PR creation logic using the Octokit library:

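import { Octokit } from '@octokit/rest';
import * as yaml from 'yaml';

// Same shape as the AI service response (also declared in enrich.ts)
interface EnrichmentResult {
  description: string;
  tags: string[];
  dependencies: string[];
  apiEndpoints: string[];
}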

async function createEnrichmentPR(
  slug: string,
  currentYaml: string,
  enrichment: EnrichmentResult,
): Promise<string> {
  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
  const [owner, repo] = slug.split('/');

  // Parse current catalog-info.yaml
  const doc = yaml.parseDocument(currentYaml);

  // Update fields
  doc.setIn(['metadata', 'description'], enrichment.description);
  doc.setIn(['metadata', 'tags'], enrichment.tags);

  const updatedYaml = doc.toString();
  const branchName = `catalog-enrichment-${Date.now()}`;

  // Get the SHA of the base branch (this assumes the default branch is main)
  const { data: ref } = await octokit.git.getRef({
    owner, repo, ref: 'heads/main',
  });

  // Create branch
  await octokit.git.createRef({
    owner, repo,
    ref: `refs/heads/${branchName}`,
    sha: ref.object.sha,
  });

  // Get current file SHA
  const { data: file } = await octokit.repos.getContent({
    owner, repo, path: 'catalog-info.yaml', ref: 'main',
  });

  // Update file on new branch
  await octokit.repos.createOrUpdateFileContents({
    owner, repo,
    path: 'catalog-info.yaml',
    message: 'chore: update catalog metadata (AI enrichment)',
    content: Buffer.from(updatedYaml).toString('base64'),
    branch: branchName,
    sha: (file as { sha: string }).sha,
  });

  // Create PR
  const { data: pr } = await octokit.pulls.create({
    owner, repo,
    title: 'Update catalog metadata (AI enrichment)',
    body: [
      '## Auto-generated catalog update',
      '',
      `**Description:** ${enrichment.description}`,
      `**Tags:** ${enrichment.tags.join(', ')}`,
      `**Dependencies:** ${enrichment.dependencies.join(', ')}`,
      '',
      'Generated by the Forge catalog enricher plugin.',
      'Review and merge if the changes look correct.',
    ].join('\n'),
    head: branchName,
    base: 'main',
  });

  return pr.html_url;
}

The PR body lists the proposed description, tags, and dependencies, and the file diff shows exactly what changed in catalog-info.yaml. The developer reviews it like any other PR. No magic, no black box.
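The PR body above lists only the proposed values, so the reviewer has to read the YAML diff to see what was there before. One possible refinement, sketched with a hypothetical buildPrBody helper, is to put the old and new values side by side and call out added and removed tags:

```typescript
interface Snapshot {
  description: string;
  tags: string[];
}

// Builds a Markdown PR body that shows before/after values instead of only the new ones.
function buildPrBody(current: Snapshot, proposed: Snapshot, dependencies: string[]): string {
  const currentTags = new Set(current.tags);
  const proposedTags = new Set(proposed.tags);
  const added = proposed.tags.filter((t) => !currentTags.has(t));
  const removed = current.tags.filter((t) => !proposedTags.has(t));

  const lines = ['## Auto-generated catalog update', ''];
  if (current.description !== proposed.description) {
    lines.push(`**Description (before):** ${current.description || '_none_'}`);
    lines.push(`**Description (after):** ${proposed.description}`);
  }
  if (added.length > 0) lines.push(`**Tags added:** ${added.join(', ')}`);
  if (removed.length > 0) lines.push(`**Tags removed:** ${removed.join(', ')}`);
  if (dependencies.length > 0) lines.push(`**Detected dependencies:** ${dependencies.join(', ')}`);
  lines.push('', 'Review and merge if the changes look correct.');
  return lines.join('\n');
}
```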

Checklist

AI service running and responding on /api/enrich
Backstage backend module registered and scheduled
GitHub token configured with read access to the service repos (plus write access to contents and pull requests for PR creation)
The plugin logs enrichment results for every catalog entity
PR creation works for at least one test service
The system prompt includes “Do not guess” to avoid invented metadata

Before the Next Article

The catalog now largely maintains itself. Services get accurate descriptions and tags proposed through PRs, and nobody has to edit YAML files by hand.

But the catalog is only one part of the platform. The other big pain point is creating new services. Your static template always generates the same skeleton. What if a developer could describe what they need (“a .NET API that connects to PostgreSQL and publishes events to Service Bus”) and get a scaffolded project with the right dependencies, the right configuration, and even a GOTCHA prompt ready for AI-assisted development?

That is article 3: AI-powered Software Templates.

If this series helps you, consider buying me a coffee.

This is article 2 of the AI-Native IDP series. Next: AI-Powered Software Templates, a scaffolder that understands natural language.