ATLAS + GOTCHA -- Part 6
Hands-On: A Production CI/CD Pipeline with Security Scanning
The Problem
You built the API in Article 5. It works locally. Tests pass. Dockerfile builds.
Now you need to get it to production. And not just once — every time someone pushes a commit, the code should build, test, scan for vulnerabilities, build a container image, and deploy to AKS. Automatically. With gates that stop bad code from reaching production.
Most teams skip the security part. They build a pipeline that runs tests and deploys. That’s good, but it’s not enough. Container images contain vulnerable packages. Code has dependency vulnerabilities. JWT secrets end up hardcoded in commits. None of this shows up in unit tests.
I’ve worked on regulated infrastructure projects where a pipeline without security gates was not an option. GDPR, financial services, energy sector: all of them require automated security checks before production. Even if your project isn’t regulated, these checks catch real issues every week.
The challenge: writing an Azure DevOps pipeline with security scanning is not trivial. The YAML gets long. The stages interact. The order matters. Secrets need to be handled carefully.
This is exactly the kind of complex, multi-step task where a structured approach pays off — and where AI can generate most of the YAML if you tell it exactly what you need.
The Solution
If you’ve been following this series, you know ATLAS and GOTCHA. If not, a quick recap: ATLAS is a 5-phase checklist (Architect, Trace, Link, Assemble, Stress-test) that forces you to think through a problem before you touch any tool. GOTCHA is the 6-layer prompt format (Goals, Orchestration, Tools, Context, Heuristics, Args) that translates that thinking into instructions an AI can process consistently. Together, they turn “write me a pipeline” into a precise specification.
We’ll design the pipeline with ATLAS first. Then translate to GOTCHA. Then look at the actual YAML.
The pipeline has five stages:
- Build — restore, compile, run unit tests
- SAST — static analysis (find vulnerable code patterns)
- Image — build Docker image, scan for CVEs, push to Azure Container Registry
- Deploy — rolling update to AKS
- Smoke test — verify the deployment is healthy
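In Azure DevOps YAML, those five stages become a dependsOn chain. Here is a minimal skeleton of that shape, using the stage identifiers the full pipeline in this article uses:

```yaml
# Skeleton only: each stage runs when the one before it succeeds.
stages:
  - stage: build
  - stage: scan
    dependsOn: build
  - stage: image
    dependsOn: scan
  - stage: deploy
    dependsOn: image
  - stage: smoketest
    dependsOn: deploy
```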
Here’s the ATLAS checklist for this pipeline — each letter maps to a phase of the design:
[A] ARCHITECT
Purpose: Automated CI/CD pipeline for users-api.
Build → test → security scan → deploy to AKS on every push to main.
PRs run build + tests only (no deploy).
Constraints:
- Pipeline must fail if any security scan finds HIGH or CRITICAL issues
- Container image must be signed before push to ACR
- No secrets in pipeline YAML — all from Azure DevOps variable groups
- Deploy must be zero-downtime (rolling update, readiness probe gated)
Tech decisions: Azure DevOps, ACR, AKS, Trivy (container scan), dotnet-format
Out of scope: blue/green deployments, performance testing, notification alerts
[T] TRACE
On push to main:
1. Agent checks out code
2. dotnet restore → dotnet build → dotnet test (with coverage)
3. Publish test results and coverage to pipeline
4. dotnet format --verify-no-changes (fail on unformatted code)
5. docker build → tag with build ID + git sha
6. Trivy scan the image → fail if HIGH/CRITICAL CVE found
7. docker push to ACR (only if scan passes)
8. kubectl set image → wait for rollout to complete
9. curl /healthz on the new pods → fail pipeline if not 200
On PR (not main):
Steps 1-3 only. No image build, no deploy.
[L] LINK
| From | To | Auth | Failure mode |
| -------------- | --------------- | ------------------ | -------------------- |
| ADO pipeline | ACR | Service connection | Fail stage, alert |
| ADO pipeline | AKS | Service connection | Fail stage, alert |
| ADO agent | Trivy | Local binary | Fail stage |
| ADO pipeline | Variable group | Azure Key Vault | Pipeline blocked |
[A] ASSEMBLE
Phase 1: Pipeline skeleton (trigger, stages, agent pools)
Phase 2: Build stage (restore, build, test, format check)
Phase 3: Image stage (docker build, Trivy scan, ACR push)
Phase 4: Deploy stage (kubectl rolling update, rollout watch)
Phase 5: Smoke test stage (HTTP health check with retry)
[S] STRESS-TEST
Scenario 1: Bad code
- Push code with unit test failure → pipeline stops at Build stage
- Push unformatted code → pipeline stops at format check
Scenario 2: Vulnerable image
- Base image with known HIGH CVE → pipeline stops at Image stage
- Fix: update base image → pipeline proceeds
Scenario 3: Failed deploy
- New pod fails readiness probe → rollout stops, old pods stay up
- Smoke test fails → pipeline marked failed, but service still running
Scenario 4: Secret hygiene
- No secrets in YAML, all resolved from variable group at runtime
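Scenario 4 can also be spot-checked locally before a push. A minimal shell sketch (the check_secrets helper and the temp file path are hypothetical, not part of the pipeline) that flags inline secret literals in a pipeline file while allowing $(...) variable-group references:

```shell
# Hypothetical pre-push check: flag lines that assign a literal value to a
# known secret name. References like $(jwtSecret) are allowed; literals are not.
check_secrets() {
  grep -nE '(JWT_SECRET|DB_CONNECTION)[[:space:]]*[:=][[:space:]]*[^$[:space:]]' "$1" || true
}

printf 'env:\n  JWT_SECRET: "hunter2"\n  DB_CONNECTION: $(dbConnection)\n' > /tmp/pipe.yml
check_secrets /tmp/pipe.yml   # flags the JWT_SECRET line only
```

A real setup would use a dedicated secret scanner, but even a grep like this catches the most common slip: pasting a working connection string into YAML "just to test".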
Now we translate the ATLAS design into a GOTCHA prompt — the format the AI needs to generate consistent output. Each section maps to a GOTCHA layer:
=== GOALS (from Architect) ===
Generate an Azure DevOps YAML pipeline for a .NET 10 Web API that:
- Triggers on push to main (full pipeline) and on PRs (build + test only)
- Builds, tests, runs SAST (dotnet format), scans container with Trivy
- Fails immediately if any HIGH or CRITICAL CVE is found
- Builds and pushes Docker image to ACR (tagged: build ID + git sha)
- Deploys rolling update to AKS and waits for rollout to complete
- Runs HTTP smoke test (/healthz) after deploy
=== ORCHESTRATION (from Assemble) ===
Five stages in order, each depends on previous passing:
1. build: restore → build → test → format check → publish test results
2. scan: Trivy scan on local image (before push) → fail on HIGH/CRITICAL
3. image: docker push to ACR (only after scan passes)
4. deploy: kubectl set image + rollout status watch (5 min timeout)
5. smoketest: curl /healthz with 3 retries, 10s between retries
=== TOOLS (from Link) ===
- Azure DevOps YAML pipelines (multi-stage)
- dotnet 10 SDK (ubuntu-latest agent)
- Docker + Azure Container Registry
- Trivy (aquasec/trivy action or install script)
- kubectl + kubelogin (AKS deployment)
- Azure DevOps service connections (ACR + AKS)
- Azure Key Vault linked variable group for secrets
=== CONTEXT (from Trace + Link) ===
- Project: users-api (.NET 10 Web API)
- ACR name: myacr (variable: ACR_NAME)
- AKS cluster: my-cluster (variable: AKS_CLUSTER), namespace: users-api
- K8s deployment name: users-api
- Container name in deployment spec: users-api
- Secrets in variable group: DB_CONNECTION, JWT_SECRET
- Image tag pattern: $(Build.BuildId)-$(Build.SourceVersion | first 8 chars)
=== HEURISTICS ===
DO:
- Use dependsOn between stages to enforce order
- Use condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
for deploy and smoketest stages
- Publish test results using PublishTestResults task (JUnit format)
- Use --exit-code 1 flag in Trivy to fail the pipeline on findings
- Set timeoutInMinutes on deploy stage to prevent hanging rollouts
DON'T:
- Don't hardcode secrets or connection strings in YAML
- Don't push the image before the Trivy scan passes
- Don't use latest tag — always use a specific, traceable tag
- Don't skip the rollout status check (kubectl rollout status blocks until ready)
- Don't ignore test coverage — publish it even if you don't gate on it yet
=== ARGS (from Stress-test) ===
Pool: ubuntu-latest
ACR_NAME: $(acrName) from variable group
AKS_CLUSTER: $(aksCluster) from variable group
AKS_RESOURCE_GROUP: $(aksResourceGroup) from variable group
Trivy severity: HIGH,CRITICAL
Rollout timeout: 5 minutes
Smoke test retries: 3, interval: 10 seconds
Test results format: JUnit
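One gap between the Args above and stock Azure DevOps: there is no built-in expression that truncates $(Build.SourceVersion) to 8 characters. A small script step can derive a short SHA and publish it as a pipeline variable via the ##vso logging command (the shortSha variable name is our choice; the fallback value exists only so the snippet runs standalone):

```shell
# Derive an 8-char short SHA from the full commit SHA. BUILD_SOURCEVERSION is
# how Azure DevOps exposes $(Build.SourceVersion) inside script steps.
FULL_SHA="${BUILD_SOURCEVERSION:-0123456789abcdef0123456789abcdef01234567}"
SHORT_SHA=$(printf '%.8s' "$FULL_SHA")
# Make it available to later steps as $(shortSha).
echo "##vso[task.setvariable variable=shortSha]$SHORT_SHA"
```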
Execute
Here’s the pipeline YAML, generated from that prompt and lightly adjusted (the three things I mentioned in Article 5 happened here too — I’ll call them out as we go).
# azure-pipelines.yml
trigger:
  branches:
    include:
      - main
  paths:
    exclude:
      - "*.md"
      - "docs/**"

pr:
  branches:
    include:
      - main

variables:
  - group: users-api-secrets # Azure Key Vault linked variable group
  - name: IMAGE_TAG
    value: "$(Build.BuildId)-$(Build.SourceVersion)"
  - name: IMAGE_FULL
    value: "$(acrName).azurecr.io/users-api:$(IMAGE_TAG)"

stages:
  # ─────────────────────────────────────────
  - stage: build
    displayName: Build & Test
    jobs:
      - job: build
        displayName: Build, test, format check
        pool:
          vmImage: ubuntu-latest
        steps:
          - task: UseDotNet@2
            inputs:
              version: "10.x"
          - script: dotnet restore
            displayName: Restore
          - script: dotnet build --no-restore --configuration Release
            displayName: Build
          - script: |
              dotnet test --no-build --configuration Release \
                --logger "junit;LogFilePath=$(Agent.TempDirectory)/test-results.xml" \
                --collect:"XPlat Code Coverage"
            displayName: Test
          - task: PublishTestResults@2
            condition: always()
            inputs:
              testResultsFormat: JUnit
              testResultsFiles: "$(Agent.TempDirectory)/test-results.xml"
          - script: dotnet format --verify-no-changes
            displayName: Format check

  # ─────────────────────────────────────────
  - stage: scan
    displayName: Security Scan
    dependsOn: build
    jobs:
      - job: trivy
        displayName: Trivy container scan
        pool:
          vmImage: ubuntu-latest
        steps:
          - task: Docker@2
            displayName: Build image (local, not pushed)
            inputs:
              command: build
              Dockerfile: Dockerfile
              tags: $(IMAGE_FULL)
          - script: |
              curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh \
                | sh -s -- -b /usr/local/bin v0.62.1
              trivy image \
                --exit-code 1 \
                --severity HIGH,CRITICAL \
                --no-progress \
                $(IMAGE_FULL)
            displayName: Scan with Trivy

  # ─────────────────────────────────────────
  - stage: image
    displayName: Push to ACR
    dependsOn: scan
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - job: push
        displayName: Push image
        pool:
          vmImage: ubuntu-latest
        steps:
          - task: Docker@2
            displayName: Build and push to ACR
            inputs:
              command: buildAndPush
              containerRegistry: acr-service-connection
              repository: users-api
              tags: $(IMAGE_TAG)

  # ─────────────────────────────────────────
  - stage: deploy
    displayName: Deploy to AKS
    dependsOn: image
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    timeoutInMinutes: 10
    jobs:
      - deployment: deploy_aks
        displayName: Rolling deploy
        pool:
          vmImage: ubuntu-latest
        environment: production
        strategy:
          runOnce:
            deploy:
              steps:
                - task: KubernetesManifest@1
                  displayName: Set image
                  inputs:
                    action: patch
                    resourceToPatch: deployment/users-api
                    namespace: users-api
                    patch: |
                      spec:
                        template:
                          spec:
                            containers:
                              - name: users-api
                                image: $(acrName).azurecr.io/users-api:$(IMAGE_TAG)
                - task: Kubernetes@1
                  displayName: Wait for rollout
                  inputs:
                    command: rollout
                    arguments: "status deployment/users-api -n users-api --timeout=5m"

  # ─────────────────────────────────────────
  - stage: smoketest
    displayName: Smoke Test
    dependsOn: deploy
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - job: healthcheck
        displayName: Health check
        pool:
          vmImage: ubuntu-latest
        steps:
          - script: |
              for i in 1 2 3; do
                echo "Attempt $i..."
                STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
                  https://$(apiHostname)/healthz)
                if [ "$STATUS" = "200" ]; then
                  echo "Health check passed (HTTP $STATUS)"
                  exit 0
                fi
                echo "Got HTTP $STATUS, retrying in 10s..."
                sleep 10
              done
              echo "Health check failed after 3 attempts"
              exit 1
            displayName: Check /healthz (3 retries)
What We Adjusted
The AI generated a solid pipeline. Three adjustments:
1. The image tag. The AI used $(Build.BuildId) alone. We changed it to $(Build.BuildId)-$(Build.SourceVersion), so every image tag is traceable to the exact commit, which is useful when you need to know what’s running in production. Note that $(Build.SourceVersion) is the full 40-character SHA; Azure DevOps has no built-in expression to truncate it, so shortening to the first 8 characters (as written in the prompt) takes an extra script step.
2. Trivy install script. The AI used a pinned GitHub Action for Trivy, which only works in GitHub Actions, not Azure DevOps. We switched to the install script that works on any Ubuntu agent.
3. The condition on deploy and smoketest. The AI used succeeded() only, which meant those stages would also run on PR builds. We added the eq(variables['Build.SourceBranch'], 'refs/heads/main') check so they run only for pushes to main; PR builds carry a Build.SourceBranch of the form refs/pull/*/merge and are skipped.
This last adjustment is a good example of why the Heuristics layer in GOTCHA matters — it’s where you write explicit DO/DON’T rules for the AI. We had this in the prompt:
Use condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
for deploy and smoketest stages
And the AI still got it slightly wrong. Not because the heuristic was unclear, but because it put the condition on the stage and also left a default succeeded() condition at the job level. We removed the job-level condition and kept only the stage-level one.
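The shape we kept, sketched on the deploy stage: gate once at the stage level, and let the jobs inherit it, since a job inside a skipped stage never runs.

```yaml
- stage: deploy
  dependsOn: image
  # Gate once, here. No job-level condition is needed below.
  condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
  jobs:
    - deployment: deploy_aks
      environment: production
```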
The lesson: review the generated code. A structured prompt doesn’t make AI infallible. It makes the output much closer to what you need, but you’re still the architect. The structured approach compresses the review from hours to minutes — but you still need to look.
Template
Here’s the ATLAS checklist for any CI/CD pipeline project:
=== CI/CD PIPELINE ATLAS CHECKLIST ===
[A] ARCHITECT
Purpose: (what triggers it, what it produces)
Trigger: (branches, events, PR vs merge)
Constraints: (what must fail the pipeline, what must never happen)
Secrets: (where they come from, never YAML)
Out of scope: (what this pipeline doesn't do)
[T] TRACE
On merge to main:
1. (first step)
2.
...
On PR:
(which steps only)
[L] LINK
| From | To | Auth | Failure mode |
[A] ASSEMBLE
Stage 1: (name + what it does)
Stage 2:
...
[S] STRESS-TEST
Scenario 1: What happens when tests fail?
Scenario 2: What happens when a CVE is found?
Scenario 3: What happens when the deploy fails?
Scenario 4: Are there any secrets that could leak?
Challenge
You now have an API and a pipeline. Together, they form a complete system: code → test → scan → deploy → validate.
Before Article 7, try this: take the GOTCHA prompt from this article and change three things — a different container registry, a different cluster, and add a stage you need (maybe a database migration step, or a Slack notification on failure). See how much of the prompt you can reuse and what you need to adjust.
That exercise is a preview of what’s coming. In Article 7, we’ll look at the most common mistakes developers make when scaling this approach across bigger projects — the anti-gotchas — and build a library of ready-to-use prompt templates.
If this series helps you, consider buying me a coffee.