The AI-Native IDP -- Part 2
Teaching Your Catalog to Think
The Problem
Open your Software Catalog. Pick a service. Read the description.
Is it accurate? Does it describe what the service actually does today — not what it did when someone wrote that description six months ago? Is the owner field correct, or did that team reorganize in January?
Now check the tags. The service was tagged dotnet and postgresql when it was created. Since then, someone added a Redis cache and a Service Bus integration. The tags don’t mention either.
This is the normal state of a service catalog. Not wrong because of negligence — wrong because keeping it right is a manual job that nobody has time for. Every time a developer adds a dependency, changes a database, or refactors an endpoint, someone needs to update the catalog. Nobody does.
The result: developers stop trusting the catalog. They go to Slack and ask “who owns the invoice service?” instead of looking it up. The platform team spends time answering questions that the catalog should answer. The IDP becomes a directory nobody reads.
In article 1 we set up Backstage with a static catalog. Now we make it smart.
The Solution
A Backstage plugin that runs on a schedule, reads the source code of every registered service, and updates the catalog metadata automatically. It uses an AI model via the OpenAI-compatible API to understand what the code does and generates accurate descriptions, tags, and dependency lists.
The flow:
- Plugin reads the catalog-info.yaml for each registered component
- Fetches the repository from GitHub (using the github.com/project-slug annotation)
- Reads key files: Program.cs, package.json, Dockerfile, *.csproj, config files
- Sends the code context to the AI model with a structured prompt
- Gets back: updated description, tags, detected dependencies, detected APIs
- Compares with current catalog metadata
- If changes are detected: creates a PR in the service repo updating catalog-info.yaml
The developer reviews the PR and merges — or adjusts and merges. The human is always in the loop. The AI proposes, the developer decides.
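For reference, here is a minimal catalog-info.yaml with the annotation the enricher depends on. The names and values are illustrative:

```yaml
# Illustrative catalog-info.yaml -- the enricher locates the repository
# through the github.com/project-slug annotation.
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: invoice-service
  description: Handles invoice generation # updated by the enricher over time
  tags:
    - dotnet
    - postgresql
  annotations:
    github.com/project-slug: my-org/invoice-service
spec:
  type: service
  lifecycle: production
  owner: team-billing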
Execute
We need two things: a .NET AI service that does the code analysis, and a Backstage backend plugin that triggers it.
The AI Service
This is the .NET service that reads code and produces catalog metadata. It runs as a separate container alongside Backstage.
// ai-service/CatalogEnricher/Program.cs
using System.ClientModel;
using System.Text.Json;
using System.Text.Json.Serialization;
using Azure.AI.OpenAI;
using OpenAI;
using OpenAI.Chat;
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddCors(options =>
{
options.AddPolicy("DevCors", policy =>
policy.WithOrigins("http://localhost:3000", "http://localhost:7007")
.AllowAnyHeader()
.AllowAnyMethod());
});
var app = builder.Build();
app.UseCors("DevCors");
app.MapGet("/healthz", () => Results.Ok(new { status = "healthy" }));
app.MapPost("/api/enrich", async (EnrichRequest request, IConfiguration config) =>
{
if (request.Files is not { Count: > 0 })
return Results.BadRequest(new { error = "At least one file is required." });
var endpoint = config["AI:Endpoint"];
var apiKey = config["AI:Key"];
var model = config["AI:ChatModel"] ?? "mistral-small-3.2-24b-instruct-2506";
var provider = config["AI:Provider"] ?? "openai";
if (string.IsNullOrWhiteSpace(endpoint) || string.IsNullOrWhiteSpace(apiKey))
    return Results.Json(new { error = "AI:Endpoint and AI:Key must be configured." }, statusCode: 500);
ChatClient chatClient = provider.ToLowerInvariant() switch
{
"azure" => new AzureOpenAIClient(
new Uri(endpoint!), new ApiKeyCredential(apiKey!))
.GetChatClient(model),
_ => new OpenAIClient(
new ApiKeyCredential(apiKey!),
new OpenAIClientOptions { Endpoint = new Uri(endpoint!) })
.GetChatClient(model),
};
const int maxChars = 3000;
var filesSummary = string.Join("\n\n", request.Files.Select(f =>
{
var content = f.Content.Length > maxChars
? f.Content[..maxChars] + "\n[...truncated]"
: f.Content;
return $"### {f.Path}\n```\n{content}\n```";
}));
var systemPrompt = """
You are a code analysis assistant for a Backstage software catalog.
Analyze the provided source files and return a JSON object with:
- "description": a one-sentence summary of what this component does
- "tags": an array of relevant technology tags
- "dependencies": an array of external services this code depends on
- "apiEndpoints": an array of API routes exposed by this code
Rules:
- Do not guess. Only report what is confirmed in the code.
- Do not invent dependencies that are not explicitly imported.
- Return ONLY valid JSON, no markdown fences, no extra text.
""";
try
{
var completion = await chatClient.CompleteChatAsync(
[
new SystemChatMessage(systemPrompt),
new UserChatMessage($"Analyze these source files:\n\n{filesSummary}"),
]);
var raw = completion.Value.Content[0].Text.Trim();
// Strip markdown fences in case the model wrapped its JSON despite the prompt
var json = raw.StartsWith("```") ? raw.Split('\n', 2)[1].TrimEnd('`').Trim() : raw;
var metadata = JsonSerializer.Deserialize<CatalogMetadata>(json, SerializerOptions.Default);
return metadata is null
? Results.UnprocessableEntity(new { error = "AI returned invalid metadata." })
: Results.Ok(metadata);
}
catch (ClientResultException ex) when (ex.Status == 401)
{
return Results.Json(new { error = "AI provider authentication failed." }, statusCode: 503);
}
catch (Exception ex)
{
return Results.Json(new { error = $"AI provider error: {ex.Message}" }, statusCode: 502);
}
});
app.Run();
record EnrichRequest(List<SourceFile> Files);
record SourceFile(string Path, string Content);
record CatalogMetadata(
[property: JsonPropertyName("description")] string Description,
[property: JsonPropertyName("tags")] List<string> Tags,
[property: JsonPropertyName("dependencies")] List<string> Dependencies,
[property: JsonPropertyName("apiEndpoints")] List<string> ApiEndpoints);
static class SerializerOptions
{
public static readonly JsonSerializerOptions Default = new()
{
PropertyNameCaseInsensitive = true,
};
}
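The endpoint expects a JSON body matching the EnrichRequest record: a files array of {path, content} objects. An illustrative request payload (file contents abbreviated):

```json
{
  "files": [
    { "path": "Program.cs", "content": "var builder = WebApplication.CreateBuilder(args);\n// ..." },
    { "path": "Dockerfile", "content": "FROM mcr.microsoft.com/dotnet/aspnet:8.0\n# ..." }
  ]
}
```

A successful response follows the CatalogMetadata shape: description, tags, dependencies, and apiEndpoints.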
The system prompt is specific: “Do not guess.” This is important. A catalog with wrong information is worse than a catalog with no information. The AI should only report what it can confirm from the code.
The ChatModel defaults to Mistral Small on Scaleway Generative APIs, but this code works with any OpenAI-compatible provider. Change AI:Endpoint to point to Azure AI Foundry, OpenAI, Mistral AI, or any other provider — the same code handles all of them. Set AI:Provider to azure if you use Azure AI Foundry; for everything else, leave it as openai.
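The same settings can also live in appsettings.json instead of environment variables. A sketch of the AI section, matching the keys the code reads (values illustrative):

```json
{
  "AI": {
    "Provider": "openai",
    "Endpoint": "https://api.scaleway.ai/v1",
    "Key": "<set via environment or user-secrets, not committed>",
    "ChatModel": "mistral-small-3.2-24b-instruct-2506"
  }
}
```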
The Backstage Backend Plugin
The plugin runs inside Backstage’s backend. It reads catalog entities, fetches their source code from GitHub, sends it to the AI service, and creates PRs with the updated catalog-info.yaml.
// plugins/catalog-enricher-backend/src/plugin.ts
import {
coreServices,
createBackendModule,
} from '@backstage/backend-plugin-api';
import { catalogServiceRef } from '@backstage/plugin-catalog-node';
import { Octokit } from '@octokit/rest';
import { enrichEntity } from './enrich';
export const catalogEnricherModule = createBackendModule({
pluginId: 'catalog',
moduleId: 'enricher',
register(env) {
env.registerInit({
deps: {
logger: coreServices.logger,
config: coreServices.rootConfig,
scheduler: coreServices.scheduler,
catalog: catalogServiceRef,
auth: coreServices.auth,
},
async init({ logger, config, scheduler, catalog, auth }) {
const githubToken = config.getString('catalogEnricher.githubToken');
const aiServiceUrl = config.getString('catalogEnricher.aiServiceUrl');
const octokit = new Octokit({ auth: githubToken });
await scheduler.scheduleTask({
id: 'catalog-enricher-run',
frequency: { minutes: 1440 }, // 24 hours
timeout: { minutes: 30 },
initialDelay: { seconds: 30 },
fn: async () => {
logger.info('Starting catalog enrichment run');
const credentials = await auth.getOwnServiceCredentials();
const { items: entities } = await catalog.getEntities(
{ filter: { kind: 'Component' } },
{ credentials },
);
let enriched = 0, skipped = 0, failed = 0;
for (const entity of entities) {
try {
const changed = await enrichEntity({
entity, octokit, aiServiceUrl, logger,
});
if (changed) enriched++; else skipped++;
} catch (error) {
failed++;
logger.error(
`Failed to enrich ${entity.metadata.name}: ${error}`
);
}
}
logger.info(
`Enrichment complete: ${enriched} enriched, ${skipped} skipped, ${failed} failed`
);
},
});
},
});
},
});
We use createBackendModule instead of createBackendPlugin because the enricher extends the catalog — it’s a module of the catalog plugin, not a separate plugin. This matters for service-to-service authentication: the module gets catalog credentials automatically through the auth service.
The enrichEntity function fetches code from GitHub and calls the AI service:
// plugins/catalog-enricher-backend/src/enrich.ts
import { Entity } from '@backstage/catalog-model';
import { Octokit } from '@octokit/rest';
interface EnrichmentResult {
description: string;
tags: string[];
dependencies: string[];
apiEndpoints: string[];
}
interface EnrichOptions {
entity: Entity;
octokit: Octokit;
aiServiceUrl: string;
logger: { info: (msg: string) => void; warn: (msg: string) => void };
}
export async function enrichEntity({
entity, octokit, aiServiceUrl, logger,
}: EnrichOptions): Promise<boolean> {
const slug = entity.metadata.annotations?.[
'github.com/project-slug'
];
if (!slug) {
logger.warn(`${entity.metadata.name}: no github.com/project-slug, skipping`);
return false;
}
const [owner, repo] = slug.split('/');
const { data: tree } = await octokit.git.getTree({
owner, repo, tree_sha: 'main', recursive: 'true',
});
const targetFiles = tree.tree.filter(
f =>
f.type === 'blob' &&
f.path &&
(f.path === 'Program.cs' ||
f.path === 'package.json' ||
f.path.endsWith('.csproj') ||
f.path === 'Dockerfile' ||
f.path === 'appsettings.json' ||
f.path === 'app-config.yaml'),
);
const files: Array<{ path: string; content: string }> = [];
for (const file of targetFiles) {
if (!file.path) continue;
try {
const { data: content } = await octokit.repos.getContent({
owner, repo, path: file.path,
mediaType: { format: 'raw' },
});
files.push({ path: file.path, content: content as unknown as string });
} catch {
// File not readable — skip
}
}
if (files.length === 0) {
logger.info(`${entity.metadata.name}: no readable files, skipping`);
return false;
}
// Call AI service — files must be an array of {path, content}
const res = await fetch(`${aiServiceUrl}/api/enrich`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ files }),
});
if (!res.ok) {
throw new Error(`AI service returned ${res.status}: ${await res.text()}`);
}
const enrichment: EnrichmentResult = await res.json() as EnrichmentResult;
// Compare with current metadata
const currentDesc = entity.metadata.description ?? '';
const currentTags = (entity.metadata.tags ?? []) as string[];
const descChanged = enrichment.description !== currentDesc;
const tagsChanged =
JSON.stringify([...enrichment.tags].sort()) !==
JSON.stringify([...currentTags].sort());
if (!descChanged && !tagsChanged) {
logger.info(`${entity.metadata.name}: no changes detected`);
return false;
}
logger.info(
`${entity.metadata.name}: changes detected — description: ${descChanged}, tags: ${tagsChanged}`
);
logger.info(
`Proposed: description="${enrichment.description}", tags=[${enrichment.tags.join(', ')}]`
);
return true;
}
Registering the Module
In packages/backend/src/index.ts:
import { catalogEnricherModule } from '@internal/plugin-catalog-enricher-backend';
backend.add(catalogEnricherModule);
And in app-config.yaml:
catalogEnricher:
aiServiceUrl: http://localhost:5100
githubToken: ${GITHUB_TOKEN}
Running the AI Service
Set the AI provider credentials as environment variables. The AI__ prefix maps to the AI: section in appsettings.json:
export AI__Endpoint=https://api.scaleway.ai/v1 # or https://api.openai.com/v1, Azure endpoint, etc.
export AI__Key=your_api_key
export AI__ChatModel=mistral-small-3.2-24b-instruct-2506 # or gpt-4o, claude-sonnet-4-6, etc.
Then start the service:
cd ai-service/CatalogEnricher
dotnet run
It starts on port 5100 by default. The Backstage plugin calls it on the schedule (every 24 hours) or you can trigger it manually. You can test it directly:
curl http://localhost:5100/healthz
What the Output Looks Like
For the ScraperAgent service, the AI service returns:
{
"description": "AI-powered market intelligence platform that scrapes Twitter/X accounts and delivers sentiment analysis reports by email",
"tags": ["dotnet", "nextjs", "postgresql", "azure-openai", "kubernetes", "mailkit"],
"dependencies": ["Azure OpenAI", "PostgreSQL", "Scaleway SMTP", "Twitter/X API", "Mollie Payments"],
"apiEndpoints": [
"POST /api/{domain}/analyze",
"GET /api/{domain}/reports",
"POST /api/subscribe",
"POST /api/webhook/mollie"
]
}
Compare that to a hand-written catalog entry that says “AI-powered market and crypto intelligence platform.” The AI version is more specific, lists the actual endpoints, and detected dependencies that a human might forget to mention.
Creating PRs (The Full Loop)
The code above logs proposed changes. The full implementation creates a PR. Here’s the PR creation logic using the Octokit library:
import { Octokit } from '@octokit/rest';
import * as yaml from 'yaml';
async function createEnrichmentPR(
slug: string,
currentYaml: string,
enrichment: EnrichmentResult,
): Promise<string> {
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
const [owner, repo] = slug.split('/');
// Parse current catalog-info.yaml
const doc = yaml.parseDocument(currentYaml);
// Update fields
doc.setIn(['metadata', 'description'], enrichment.description);
doc.setIn(['metadata', 'tags'], enrichment.tags);
const updatedYaml = doc.toString();
const branchName = `catalog-enrichment-${Date.now()}`;
// Get default branch SHA
const { data: ref } = await octokit.git.getRef({
owner, repo, ref: 'heads/main',
});
// Create branch
await octokit.git.createRef({
owner, repo,
ref: `refs/heads/${branchName}`,
sha: ref.object.sha,
});
// Get current file SHA
const { data: file } = await octokit.repos.getContent({
owner, repo, path: 'catalog-info.yaml', ref: 'main',
});
// Update file on new branch
await octokit.repos.createOrUpdateFileContents({
owner, repo,
path: 'catalog-info.yaml',
message: 'chore: update catalog metadata (AI enrichment)',
content: Buffer.from(updatedYaml).toString('base64'),
branch: branchName,
sha: (file as { sha: string }).sha,
});
// Create PR
const { data: pr } = await octokit.pulls.create({
owner, repo,
title: 'Update catalog metadata (AI enrichment)',
body: [
'## Auto-generated catalog update',
'',
`**Description:** ${enrichment.description}`,
`**Tags:** ${enrichment.tags.join(', ')}`,
`**Dependencies:** ${enrichment.dependencies.join(', ')}`,
'',
'Generated by the Forge catalog enricher plugin.',
'Review and merge if the changes look correct.',
].join('\n'),
head: branchName,
base: 'main',
});
return pr.html_url;
}
The PR body shows exactly what changed and why. The developer reviews it like any other PR. No magic, no black box.
Checklist
- AI service running and responding to /api/enrich
- Backstage backend plugin registered and scheduled
- GitHub token configured with read access to service repos
- Plugin logs enrichment results for each catalog entity
- PR creation works for at least one test service
- System prompt includes “Do not guess” to avoid hallucinated metadata
Before the Next Article
The catalog now maintains itself. Services get accurate descriptions and tags without anyone manually editing YAML files.
But the catalog is only one part of the platform. The other big pain point is creating new services. Your static template generates the same skeleton every time. What if a developer could describe what they need — “a .NET API that connects to PostgreSQL and publishes events to Service Bus” — and get a project scaffolded with the right dependencies, configuration, and even a GOTCHA prompt ready for AI-assisted development?
That’s article 3: AI-powered Software Templates.
If this series helps you, consider buying me a coffee.
This is article 2 of the AI-Native IDP series. Next: AI-Powered Software Templates — a scaffolder that understands natural language.