I built four DevOps prompts for AI coding agents and ran them against my own portfolio site. The results were honest, useful, and occasionally humbling.

Most prompt libraries give you templates with [FILL IN THE BLANK] placeholders. You paste them into ChatGPT, replace some words, and hope for the best. That's not what this is.
These are DevOps prompts I built for AI coding agents — Claude Code, Cursor, Windsurf, Cline — the tools that can actually read your codebase, run commands, and trace import chains. I've been using them across multiple production projects for months. They started as one-off mission prompts for my own infrastructure and, through iteration, became something I think others can use too.
To prove they work, I did the most honest thing I could think of: I ran all four of them against my own portfolio site. Same codebase you're reading this on.
Each prompt follows a phased execution structure with built-in constraints that prevent the AI from doing anything destructive. They're all read-only audits — they analyze, report, and recommend, but never modify your code without asking.
The "onboarding" prompt. Point an AI agent at any project and it maps the architecture, discovers conventions, and spots where documentation has drifted from reality.
When I ran it against this site, it produced an annotated directory tree, Mermaid architecture diagrams, a component inventory with confidence ratings (verified/inferred/uncertain), and traced data flows through the entire app. It spot-checked seven documentation claims against actual code — four were stale.
The most useful output: an anti-pattern registry with file paths. Not "you should probably fix this" — but "line 13 of this file does X, the correct pattern is Y." Evidence-backed, not speculative.
AI-readiness score: 8/10. The main gap was documentation freshness — the code was solid, but the docs describing it had drifted.
The systematic debt discovery prompt. Uses the TIME framework (Tolerate/Invest/Migrate/Eliminate) to classify every debt item, scores them by impact and effort, and organizes fixes into executable waves.
This one found 27 scored debt items across my codebase. The most valuable distinction it makes: intentional debt vs. accidental debt. A documented trade-off is different from neglect. When you see a table that says "this was a deliberate choice, tolerate it" next to "this was an oversight, fix it in 15 minutes" — that changes how you prioritize.
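To make the classification concrete, here is a minimal sketch of the kind of impact/effort scoring the TIME framework implies. The thresholds, field names, and sample items are my illustration, not the prompt's actual rubric:

```python
from dataclasses import dataclass

# TIME quadrants: Tolerate / Invest / Migrate / Eliminate
@dataclass
class DebtItem:
    name: str
    impact: int        # 1 (low) .. 5 (high)
    effort_minutes: int
    intentional: bool  # documented trade-off vs. oversight

def classify(item: DebtItem) -> str:
    """Assign a TIME action. Thresholds are illustrative, not the prompt's."""
    if item.intentional and item.impact <= 2:
        return "Tolerate"        # deliberate, low impact: leave it alone
    if item.impact >= 4 and item.effort_minutes > 120:
        return "Migrate"         # high impact, big job: plan a migration
    if item.impact >= 3:
        return "Invest"          # worth paying down soon
    return "Eliminate"           # low impact, cheap: just delete it

def wave(item: DebtItem) -> int:
    """Quick wins (under ~30 min, not tolerated) land in Wave 1."""
    if classify(item) == "Tolerate":
        return 0  # no wave: explicitly accepted debt
    return 1 if item.effort_minutes <= 30 else 2

items = [
    DebtItem("inline styles in legacy page", 1, 10, intentional=True),
    DebtItem("14 files violating import conventions", 3, 45, intentional=False),
    DebtItem("stale test counts in docs", 3, 15, intentional=False),
]
for i in items:
    print(f"{i.name}: {classify(i)}, wave {wave(i)}")
```

The intentional flag is exactly the distinction above: accepted debt gets a Tolerate verdict and stays out of the fix waves entirely.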
It identified 14 files violating the project's own import conventions. It counted my actual tests (257) and pointed out that five different documents claimed five different numbers (251, 252, 254...). It found dead code paths and stale configuration references that had accumulated over months of active development.
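That test-count drift is mechanically checkable. A hedged sketch, assuming the docs state counts as "N tests" (the regex, file names, and numbers here are made up for illustration):

```python
import re

def claimed_test_counts(doc_text: str) -> list[int]:
    """Pull every 'N tests' claim out of a document."""
    return [int(m) for m in re.findall(r"(\d+)\s+tests\b", doc_text)]

def find_stale_claims(docs: dict[str, str], actual: int) -> dict[str, list[int]]:
    """Map doc name -> the claimed counts that disagree with reality."""
    stale = {}
    for name, text in docs.items():
        wrong = [n for n in claimed_test_counts(text) if n != actual]
        if wrong:
            stale[name] = wrong
    return stale

docs = {
    "README.md": "The suite has 251 tests across 12 files.",
    "CONTRIBUTING.md": "All 257 tests must pass before merging.",
    "docs/testing.md": "We currently run 254 tests in CI.",
}
# Two of the three sample docs disagree with the real count
print(find_stale_claims(docs, actual=257))
```

The real prompt does this verification with filesystem access rather than a hardcoded dictionary, but the cross-check is the same shape: one source of truth, many claims to diff against it.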
Health grade: B-. Quick wins in Wave 1 estimated at ~65 minutes total effort.
The spring cleaning prompt. Forensic file audit — finds dead code, orphaned artifacts, stale documentation, missing community standard files. Uses a three-method verification system: import tracing, config reference checks, and git history analysis. If it can't confirm a file is dead after all three methods, it flags it for human review instead of guessing.
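In spirit, the three-method check looks something like this sketch. The helpers are simplified stand-ins (a real agent traces imports properly rather than grepping), and the file patterns assume a TypeScript project:

```python
import re
import subprocess
from pathlib import Path

def imported_anywhere(target: Path, src_root: Path) -> bool:
    """Method 1: does any source file import this module by name?"""
    stem = target.stem
    pattern = re.compile(rf"\bfrom\s+['\"].*{re.escape(stem)}['\"]")
    return any(pattern.search(p.read_text(errors="ignore"))
               for p in src_root.rglob("*.ts*") if p != target)

def referenced_in_config(target: Path, config_files: list[Path]) -> bool:
    """Method 2: does any config file mention the path?"""
    return any(target.name in c.read_text(errors="ignore")
               for c in config_files if c.exists())

def recently_touched(target: Path, days: int = 180) -> bool:
    """Method 3: any git commits touching it in the last N days?"""
    out = subprocess.run(
        ["git", "log", "--since", f"{days} days ago", "--oneline", "--", str(target)],
        capture_output=True, text=True).stdout
    return bool(out.strip())

def verdict(target, src_root, config_files) -> str:
    checks = [imported_anywhere(target, src_root),
              referenced_in_config(target, config_files),
              recently_touched(target)]
    if any(checks):
        return "alive"
    # All three methods agree it's unreferenced: still a candidate, never an auto-delete
    return "likely dead: flag for human review"
```

The key design choice is the asymmetry: any single positive keeps the file alive, but three negatives still only produce a candidate for review, never an automatic delete.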
This one discovered that my git history was carrying 31 MB of legacy Python wheel files from a Flask-era migration. 171 .whl files committed before I had proper .gitignore rules. I had no idea they were there — they don't show up in the working tree, but they bloat every git clone.
It also found 14 dead section components, 18+ dead scripts, and 16 MB of unreferenced Midjourney images in the public directory. The before/after projection: cleaning up would remove ~47 MB of tracked artifacts.
Repository hygiene score: 5/10. The architecture is solid, but the accumulated artifacts from a Flask-to-Next.js migration were dragging the score down. The top quick win, git rm -r --cached .cache/, takes 5 minutes and untracks 31 MB of artifacts (actually shrinking clones requires rewriting history with a tool like git filter-repo).
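You can surface this kind of history bloat yourself by listing every object git has ever tracked and filtering blobs by size. A sketch that parses the output of git rev-list --objects --all piped through git cat-file --batch-check (the sample lines below are invented, not from my repo):

```python
def large_blobs(batch_check_lines: list[str], min_bytes: int = 1_000_000):
    """Each input line looks like '<sha> <type> <size> <path>', produced by:
    git rev-list --objects --all |
      git cat-file --batch-check='%(objectname) %(objecttype) %(objectsize) %(rest)'
    Returns (size, path) pairs for blobs over the threshold, largest first."""
    hits = []
    for line in batch_check_lines:
        parts = line.split(maxsplit=3)
        if len(parts) < 4:
            continue  # trees and commits carry no path
        sha, objtype, size, path = parts
        if objtype == "blob" and int(size) >= min_bytes:
            hits.append((int(size), path))
    return sorted(hits, reverse=True)

sample = [
    "a1b2c3 blob 18874368 .cache/numpy-1.24.0-cp311-win_amd64.whl",
    "d4e5f6 blob 512 src/app/page.tsx",
    "0aa1bb blob 2097152 public/midjourney/hero-v3.png",
]
for size, path in large_blobs(sample):
    print(f"{size/1e6:7.1f} MB  {path}")
```

Anything this turns up that lives only in history, not the working tree, is exactly the invisible clone bloat described above.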
The drift catcher. Every project has documentation that was accurate when it was written but drifted as the code evolved. This prompt finds that drift by verifying each factual claim in the docs against the actual codebase.
It audited 170 documentation files across my project. The findings were organized by severity — Critical, Warning, Info — which makes triage immediate.
The critical find: an environment variable name mismatch. My documentation (in six different files) referenced one env var name. The code actually reads a different name. If someone set up this project from the docs, AI validation in the pipeline would silently fail. That's the kind of bug that's invisible until someone deploys.
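This failure mode is also cheap to catch mechanically: extract the env var names the code actually reads and diff them against the names the docs mention. A sketch with invented variable names (mine for illustration, not the real ones from my project):

```python
import re

def env_vars_read(code: str) -> set[str]:
    """Names the code actually reads, e.g. process.env.FOO in a Next.js app."""
    return set(re.findall(r"process\.env\.([A-Z][A-Z0-9_]*)", code))

def env_vars_documented(doc: str) -> set[str]:
    """SCREAMING_SNAKE_CASE tokens mentioned in docs (crude but effective)."""
    return set(re.findall(r"\b([A-Z][A-Z0-9_]{2,})\b", doc))

code = "const key = process.env.AI_GATEWAY_API_KEY;"
doc = "Set OPENAI_API_KEY in your .env file before deploying."

phantom = env_vars_documented(doc) - env_vars_read(code)       # documented, never read
undocumented = env_vars_read(code) - env_vars_documented(doc)  # read, never documented
print("documented but never read:", phantom)
print("read but never documented:", undocumented)
```

A mismatch in either direction is the silent-failure setup described above: the code falls back to an unset variable and nothing complains until deploy.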
It also found phantom references — documentation referencing a middleware.ts file that was renamed to proxy.ts during a Next.js upgrade. Mentioned in 10+ documents. The code works fine, but anyone following the docs would be confused.
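Phantom references like this are the easiest drift to detect: collect every file path a doc mentions and check it against the filesystem. A minimal sketch, with a deliberately narrow path regex and a throwaway demo directory standing in for a real repo:

```python
import re
import tempfile
from pathlib import Path

PATH_RE = re.compile(r"\b([\w./-]+\.(?:ts|tsx|js|mjs|json|md))\b")

def phantom_paths(doc_text: str, repo_root: Path) -> list[str]:
    """File paths mentioned in a doc that don't exist in the repo."""
    mentioned = set(PATH_RE.findall(doc_text))
    return sorted(p for p in mentioned if not (repo_root / p).exists())

# Demo against a throwaway directory that mirrors the rename:
root = Path(tempfile.mkdtemp())
(root / "proxy.ts").write_text("// exists after the Next.js upgrade rename")
doc = "Auth runs in middleware.ts; the replacement lives in proxy.ts."
print(phantom_paths(doc, root))  # middleware.ts survives only in the docs
```

Run over a docs directory, this catches every rename that was made in code but never propagated to prose.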
Documentation health score: 6/10. Top 5 fixes identified, starting with the wrong env var name.
They cross-validate. The phantom middleware reference was caught by both the Self-Optimization and Documentation Review prompts independently. When two independent audits find the same issue, you can trust it.
The scores are honest. My portfolio site — the one I've been actively developing for months — scored 8/10 on AI-readiness, B- on technical debt, 5/10 on repository hygiene, and 6/10 on documentation health. These aren't flattering numbers. They're accurate. The prompts don't grade on a curve.
Quick wins are the real value. Each prompt produces a "Top 5 Quick Wins" table. Across all four prompts, the quick wins totaled maybe 2-3 hours of work. That's the difference between knowing you have debt and knowing exactly what to fix first.
Documentation drift is real. The Doc Review prompt found that my test count was wrong in five different documents, each with a different number. My route table in the README was missing five routes and had two wrong paths. None of this broke anything — but it means anyone onboarding to the project would be working from slightly wrong mental models. That compounds.
These prompts went through two evaluation rounds. The first round produced good output but lacked structure: no scores, no quick-win tables, variable depth. I analyzed what worked and what didn't, then improved each prompt.
The second round was measurably better — tighter, more structured, more actionable. The prompts you'll find on my prompts page are the v2 versions.
You need an AI coding agent that can access your filesystem. These won't work in vanilla ChatGPT or Claude.ai, since they need to read files, run commands, and trace import chains. They're designed for agentic tools like Claude Code, Cursor, Windsurf, and Cline.
If you use one of these tools, try the Codebase Self-Optimization prompt first. It's the foundation — everything else builds on the context it creates.
All four prompts are available on my prompts page under the DevOps category. Copy, paste into your agent, and point it at your project. They work with any language, any framework, any stack.
If they find something interesting in your codebase — good. That's the point.