The AI Code Review Plugin - Victor Zaragoza

The Problem

A developer opens a PR. The reviewer looks at the diff. They see 400 lines of code. They check: does it compile? Does it follow the style guide? Are there obvious bugs?

But they don’t check: does this service use the repository pattern? Is it OK to call Service Bus synchronously here? Should this endpoint require authentication? They don’t check because they don’t know. They’re reviewing the code in isolation — without the context of what this service is, what rules it follows, or how it fits in the system.

This happens on most teams. Reviewer A knows the invoice service well but gets assigned a PR for the notification service. They spend 20 minutes building context before they can give useful feedback, and even then they miss things. Architectural patterns that are obvious to the team that built the service are invisible to an outside reviewer.

In article 2 we taught the catalog to understand services. In article 3 we generated projects with GOTCHA prompts. Now we use both — the catalog metadata and the GOTCHA prompt — to make code reviews context-aware.

The Solution

A Backstage plugin that hooks into GitHub pull requests and runs an AI-powered review. But unlike generic AI review tools, this one reads the service catalog first.

The flow:

A webhook fires when a PR is opened or updated
The plugin looks up the repository in the Backstage catalog
It reads the service metadata: description, dependencies, tags, owner
It reads the GOTCHA.md file from the repo (generated in article 3)
It fetches the PR diff from GitHub
It sends everything to the AI model: the catalog context, the GOTCHA heuristics, and the diff
The AI reviews the code with full architectural context
The plugin posts the review as a GitHub PR comment

The reviewer still reviews. But now they have an AI assistant that already knows the service, the rules, and the patterns.

Execute

The AI Review Endpoint

We add a new endpoint to the .NET AI service:

app.MapPost("/api/review", async (ReviewRequest request, IConfiguration config) =>
{
    if (string.IsNullOrWhiteSpace(request.Diff))
        return Results.BadRequest(new { error = "Diff is required." });

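    var endpoint = config["AI:Endpoint"];
    var apiKey = config["AI:Key"];
    var model = config["AI:ChatModel"] ?? "mistral-small-3.2-24b-instruct-2506";
    var provider = config["AI:Provider"] ?? "openai";

    // Fail with a clear error instead of an unhandled exception when config is missing
    if (string.IsNullOrWhiteSpace(endpoint) || string.IsNullOrWhiteSpace(apiKey))
        return Results.Json(new { error = "AI provider is not configured." }, statusCode: 503);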

    ChatClient chatClient = provider.ToLowerInvariant() switch
    {
        "azure" => new AzureOpenAIClient(
            new Uri(endpoint!), new ApiKeyCredential(apiKey!))
            .GetChatClient(model),
        _ => new OpenAIClient(
            new ApiKeyCredential(apiKey!),
            new OpenAIClientOptions { Endpoint = new Uri(endpoint!) })
            .GetChatClient(model),
    };

    var systemPrompt = $"""
        You are a senior code reviewer for the {request.ServiceName} service.

        SERVICE CONTEXT (from the Software Catalog):
        Description: {request.ServiceDescription}
        Tags: {string.Join(", ", request.Tags)}
        Dependencies: {string.Join(", ", request.Dependencies)}

        ARCHITECTURAL RULES (from GOTCHA.md):
        {request.GotchaHeuristics}

        Review the following pull request diff. Focus on:
        1. Violations of the architectural rules listed above
        2. Security issues (authentication, input validation, secrets)
        3. Patterns that contradict the service's documented purpose
        4. Missing error handling for the specific dependencies this service uses

        Do NOT comment on:
        - Code style (formatting, naming conventions) — the linter handles that
        - Generic best practices that don't relate to this specific service

        Format your review as a list of findings. For each finding:
        - File and line reference
        - What the issue is
        - Why it matters for THIS service specifically
        - Suggested fix

        If the code looks good, say so. Don't invent problems.
        """;

    try
    {
        var completion = await chatClient.CompleteChatAsync(
        [
            new SystemChatMessage(systemPrompt),
            new UserChatMessage($"PR: {request.PrTitle}\n\nDiff:\n{request.Diff}"),
        ]);

        var review = completion.Value.Content[0].Text.Trim();
        return Results.Ok(new { review });
    }
    catch (ClientResultException ex) when (ex.Status == 401)
    {
        return Results.Json(new { error = "AI provider authentication failed." }, statusCode: 503);
    }
    catch (Exception ex)
    {
        return Results.Json(new { error = $"AI provider error: {ex.Message}" }, statusCode: 502);
    }
});

record ReviewRequest(
    string ServiceName,
    string ServiceDescription,
    string[] Tags,
    string[] Dependencies,
    string GotchaHeuristics,
    string PrTitle,
    string Diff);

The system prompt is what makes this different. It’s not “review this code.” It’s “review this code knowing that this service uses PostgreSQL, publishes to Service Bus, and must never call Service Bus synchronously in the request pipeline.”

The Backstage Backend Plugin

The plugin listens for GitHub webhooks and triggers the review:

// plugins/ai-code-review/src/module.ts
import {
  coreServices,
  createBackendPlugin,
} from '@backstage/backend-plugin-api';
import { catalogServiceRef } from '@backstage/plugin-catalog-node';
import { createRouter } from './router';

export const aiCodeReviewPlugin = createBackendPlugin({
  pluginId: 'ai-code-review',
  register(env) {
    env.registerInit({
      deps: {
        logger: coreServices.logger,
        httpRouter: coreServices.httpRouter,
        config: coreServices.rootConfig,
        catalog: catalogServiceRef,
        auth: coreServices.auth,
      },
      async init({ logger, httpRouter, config, catalog, auth }) {
        const aiServiceUrl = config.getString('forge.aiServiceUrl');

        const router = await createRouter({
          logger,
          catalog,
          auth,
          aiServiceUrl,
        });

        httpRouter.use(router);
        httpRouter.addAuthPolicy({
          path: '/webhook/github',
          allow: 'unauthenticated',
        });
        logger.info('AI Code Review plugin initialized');
      },
    });
  },
});

This is a createBackendPlugin — not a module — because code review has its own HTTP routes. Modules extend existing plugins; plugins get their own route namespace (/api/ai-code-review/). The addAuthPolicy call lets the webhook endpoint accept unauthenticated requests from GitHub.

The Webhook Router

The router handles GitHub webhook events:

// plugins/ai-code-review/src/router.ts
import { Router, json } from 'express';
import type { LoggerService, AuthService } from '@backstage/backend-plugin-api';
import type { CatalogService } from '@backstage/plugin-catalog-node';
import { reviewPullRequest } from './review';

interface RouterOptions {
  logger: LoggerService;
  catalog: CatalogService;
  auth: AuthService;
  aiServiceUrl: string;
}

export async function createRouter(options: RouterOptions): Promise<Router> {
  const { logger, catalog, auth, aiServiceUrl } = options;
  const router = Router();
  router.use(json());

  router.post('/webhook/github', async (req, res) => {
    const event = req.headers['x-github-event'];
    const payload = req.body;

    if (event !== 'pull_request') {
      res.status(200).json({ ignored: true });
      return;
    }

    const action = payload.action;
    if (action !== 'opened' && action !== 'synchronize') {
      res.status(200).json({ ignored: true });
      return;
    }

    const repoFullName = payload.repository.full_name;
    const prNumber = payload.pull_request.number;
    const prTitle = payload.pull_request.title;

    logger.info(
      `PR ${action}: ${repoFullName}#${prNumber} — ${prTitle}`,
    );

    // Look up the service in the catalog
    const credentials = await auth.getOwnServiceCredentials();
    const entities = await catalog.getEntities(
      {
        filter: {
          kind: 'Component',
          'metadata.annotations.github.com/project-slug': repoFullName,
        },
      },
      { credentials },
    );

    if (entities.items.length === 0) {
      logger.info(
        `No catalog entity for ${repoFullName}, skipping review`,
      );
      res.status(200).json({ skipped: 'not in catalog' });
      return;
    }

    const entity = entities.items[0];

    // Run the review in the background
    reviewPullRequest({
      entity,
      repoFullName,
      prNumber,
      prTitle,
      aiServiceUrl,
      logger,
    }).catch(err =>
      logger.error(`Review failed for ${repoFullName}#${prNumber}: ${err}`),
    );

    res.status(202).json({ accepted: true });
  });

  return router;
}

Two details are easy to miss. The router adds the json() middleware itself, because the webhook route handles its own body parsing. It also calls auth.getOwnServiceCredentials() to authenticate with the catalog, which is how standalone plugins talk to other plugins in the new Backstage backend system.

The key decision: when a PR comes in, the plugin looks up the repo in the catalog. If the repo isn’t registered in Backstage, we skip the review. This isn’t a generic review tool — it only reviews services that are part of the platform.

The Review Logic

This is where the catalog context and GOTCHA prompt come together:

// plugins/ai-code-review/src/review.ts
import { Entity } from '@backstage/catalog-model';
import { Octokit } from '@octokit/rest';

interface ReviewOptions {
  entity: Entity;
  repoFullName: string;
  prNumber: number;
  prTitle: string;
  aiServiceUrl: string;
  logger: { info: (msg: string) => void };
}

export async function reviewPullRequest(
  options: ReviewOptions,
): Promise<void> {
  const { entity, repoFullName, prNumber, prTitle, aiServiceUrl, logger } =
    options;
  const [owner, repo] = repoFullName.split('/');
  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

  // 1. Fetch PR diff
  const { data: diff } = await octokit.pulls.get({
    owner,
    repo,
    pull_number: prNumber,
    mediaType: { format: 'diff' },
  });

  const diffText = diff as unknown as string;

  // Limit diff size to avoid exceeding token limits
  const maxDiffLength = 15000;
  const truncatedDiff =
    diffText.length > maxDiffLength
      ? diffText.slice(0, maxDiffLength) + '\n[diff truncated]'
      : diffText;

  // 2. Read GOTCHA.md from the repo (if it exists)
  let gotchaHeuristics = 'No GOTCHA.md found in this repository.';
  try {
    const { data: gotchaFile } = await octokit.repos.getContent({
      owner,
      repo,
      path: 'GOTCHA.md',
      mediaType: { format: 'raw' },
    });
    const gotchaContent = gotchaFile as unknown as string;

    // Extract just the HEURISTICS section: everything up to the next
    // "## " heading, a "---" rule, or the end of the file
    const heuristicsMatch = gotchaContent.match(
      /## HEURISTICS\s*\n([\s\S]*?)(?=\n## [A-Z]|\n---|$)/,
    );
    if (heuristicsMatch) {
      gotchaHeuristics = heuristicsMatch[1].trim();
    }
  } catch {
    // No GOTCHA.md — use catalog metadata only
  }

  // 3. Build context from catalog entity
  const serviceName = entity.metadata.name;
  const serviceDescription = entity.metadata.description ?? 'No description';
  const tags = (entity.metadata.tags as string[]) ?? [];
  // Use tags as a proxy for dependencies — in production,
  // read from catalog relations (dependsOn)
  const dependencies = tags;

  logger.info(`Reviewing ${repoFullName}#${prNumber} with catalog context`);

  // 4. Call AI service
  const res = await fetch(`${aiServiceUrl}/api/review`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      serviceName,
      serviceDescription,
      tags,
      dependencies,
      gotchaHeuristics,
      prTitle,
      diff: truncatedDiff,
    }),
  });

  if (!res.ok) {
    throw new Error(`AI review service returned ${res.status}`);
  }

  const { review } = (await res.json()) as { review: string };

  // 5. Post review as PR comment
  await octokit.issues.createComment({
    owner,
    repo,
    issue_number: prNumber,
    body: formatReviewComment(review, serviceName),
  });

  logger.info(`Review posted for ${repoFullName}#${prNumber}`);
}

function formatReviewComment(
  review: string,
  serviceName: string,
): string {
  return [
    `## Forge Code Review — ${serviceName}`,
    '',
    '*Reviewed with context from the Software Catalog and GOTCHA.md*',
    '',
    review,
    '',
    '---',
    '*Generated by [Forge](https://github.com/victorZKov/forge) AI Code Review Plugin*',
  ].join('\n');
}
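
The comment in step 3 notes that tags are only a stand-in for dependencies and that production code should read catalog relations. A minimal sketch of that idea follows, assuming the entity returned by the catalog carries its `relations` array with `dependsOn` entries. `EntityLike` and `getDependencies` are hypothetical names introduced for this example; the real type is `Entity` from `@backstage/catalog-model`.

```typescript
// Minimal local shape of the fields used here (assumption: mirrors the
// relevant part of Entity from '@backstage/catalog-model').
interface EntityLike {
  metadata: { name: string; tags?: string[] };
  relations?: { type: string; targetRef: string }[];
}

// Collects the targets of `dependsOn` relations and falls back to tags
// when the entity has no such relations.
export function getDependencies(entity: EntityLike): string[] {
  const deps = (entity.relations ?? [])
    .filter(relation => relation.type === 'dependsOn')
    .map(relation => relation.targetRef);
  return deps.length > 0 ? deps : (entity.metadata.tags ?? []);
}
```

In `reviewPullRequest`, `const dependencies = tags;` could then become `const dependencies = getDependencies(entity);`. The targets come back as entity references, so the prompt would list them in that form rather than as free-text tags.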

What a Review Looks Like

A developer opens a PR for the invoice-api. The PR adds a new endpoint that creates an invoice and sends a Service Bus message. The AI review comes back:

## Forge Code Review — invoice-api

*Reviewed with context from the Software Catalog and GOTCHA.md*

### Finding 1: Synchronous Service Bus call in request pipeline
**File:** `Endpoints/CreateInvoice.cs`, line 34
**Issue:** `await serviceBusClient.SendMessageAsync(message)` is called inside
the HTTP request handler before returning the response.
**Why it matters:** The GOTCHA heuristics for this service say
"No synchronous Service Bus sends in request pipeline." If Service Bus
is slow or unavailable, the HTTP request blocks.
**Fix:** Move the send to a background task or use the Outbox pattern —
write the event to the database in the same transaction as the invoice,
then process it asynchronously.

### Finding 2: Entity exposed directly in response
**File:** `Endpoints/CreateInvoice.cs`, line 42
**Issue:** The endpoint returns `Results.Ok(invoice)` where `invoice` is
the EF Core entity.
**Why it matters:** The GOTCHA heuristics say "Return DTOs, not entities."
Returning the entity exposes the database schema (including `DeletedAt`,
internal IDs) to the API consumer.
**Fix:** Create a `CreateInvoiceResponse` record with only the fields
the client needs.

### Overall
The endpoint logic is correct and the validation looks good.
The two findings above are architectural — fixing them aligns the code
with the patterns documented for this service.

---
*Generated by Forge AI Code Review Plugin*

This is not a generic “you should add error handling” review. It references the specific rules for this specific service. The reviewer who gets this PR now has architectural context without needing to read the GOTCHA.md themselves.

The GitHub Webhook

Set up the webhook in GitHub (repository settings or organization-wide):

Payload URL: https://your-backstage/api/ai-code-review/webhook/github
Content type: application/json
Secret: Use a webhook secret and validate it in the router (omitted here for clarity)
Events: Pull requests
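
Because the webhook route accepts unauthenticated requests, the secret check matters. GitHub signs each delivery with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex>`. The sketch below shows one way to validate it; `verifyGitHubSignature` is a hypothetical helper name, and where the secret is stored is left open.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Returns true when `signatureHeader` (the X-Hub-Signature-256 value)
// matches the HMAC-SHA256 of the raw request body under `secret`.
export function verifyGitHubSignature(
  rawBody: Buffer | string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false;
  const expected =
    'sha256=' + createHmac('sha256', secret).update(rawBody).digest('hex');
  const received = Buffer.from(signatureHeader);
  const computed = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return (
    received.length === computed.length && timingSafeEqual(received, computed)
  );
}
```

The signature covers the raw bytes, not the parsed JSON. One way to keep them in Express is the `verify` option of `json()`, which receives the unparsed buffer. The handler can then reject a delivery with a 401 before doing any catalog lookup.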

Registering the Plugin

In packages/backend/src/index.ts:

import { aiCodeReviewPlugin } from '@internal/plugin-ai-code-review';

backend.add(aiCodeReviewPlugin);

The plugin reads forge.aiServiceUrl from app-config.yaml (same config as the scaffolder and enricher).

When to Skip the Review

Not every PR needs an AI review. The plugin skips reviews when:

The repo is not registered in the Backstage catalog
The PR only changes documentation files (.md, .txt)
The diff is empty (merge commits, reverts)

Add this check in the router:

// Skip PRs that change no files
const changedFiles = payload.pull_request.changed_files;
if (changedFiles === 0) {
  res.status(200).json({ skipped: 'no changes' });
  return;
}

For a more complete filter, fetch the file list from GitHub and check extensions:

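// In router.ts this needs: import { Octokit } from '@octokit/rest';
const [owner, repo] = repoFullName.split('/');
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// listFiles is paginated (30 files per page by default); ask for the
// maximum page size so larger PRs are less likely to be misclassified
const { data: files } = await octokit.pulls.listFiles({
  owner,
  repo,
  pull_number: prNumber,
  per_page: 100,
});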

const codeFiles = files.filter(
  f => !f.filename.match(/\.(md|txt|png|jpg|svg)$/),
);

if (codeFiles.length === 0) {
  logger.info(`PR #${prNumber}: only docs/assets changed, skipping`);
  res.status(200).json({ skipped: 'docs only' });
  return;
}
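
The two checks above can also be folded into one pure function, which keeps the skip rules out of the request handler and makes them easy to unit-test. This is a sketch, not part of the plugin as published: `getSkipReason`, `SkipReason`, and `NON_CODE` are names invented here, and the extension list is the one from the filter above, matched case-insensitively.

```typescript
// Extensions treated as non-code (assumption: tune this list per team).
const NON_CODE = /\.(md|txt|png|jpg|svg)$/i;

export type SkipReason = 'no changes' | 'docs only' | null;

// Returns why a PR should be skipped, or null when it should be reviewed.
export function getSkipReason(filenames: string[]): SkipReason {
  if (filenames.length === 0) return 'no changes';
  const hasCode = filenames.some(name => !NON_CODE.test(name));
  return hasCode ? null : 'docs only';
}
```

In the router this could be called as `getSkipReason(files.map(f => f.filename))`, responding with `{ skipped: reason }` whenever the result is not null.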

Checklist

AI review endpoint (/api/review) accepts catalog context + diff and returns structured review
Backstage plugin registered and listening for GitHub webhooks
Plugin looks up catalog entity by github.com/project-slug annotation
GOTCHA.md heuristics extracted and included in the review prompt
Review posted as GitHub PR comment with service name and context source
Docs-only PRs skipped
Diff truncated for large PRs to stay within token limits

Before the Next Article

You now have a code review assistant that knows what it’s reviewing. It reads the catalog, reads the GOTCHA heuristics, and gives feedback that references the specific rules for the specific service.

But the AI service only uses the chat model. When a developer asks “how does the invoice service handle authentication?” or “what’s the retry policy for Service Bus?”, the AI can only answer from what’s in the GOTCHA prompt.

What if the AI could search the actual documentation? What if every service’s TechDocs were indexed in a vector store, and the AI could retrieve relevant docs before answering?

That’s article 5: TechDocs RAG — Retrieval-Augmented Generation for your platform documentation.

This is article 4 of the AI-Native IDP series. Next: TechDocs RAG — giving AI access to your platform documentation.