Not Everything Is a Skill¶
Update (2026-02-11)
As of v0.4.0, ctx consolidated sessions into the journal mechanism.
References to /ctx-save, .context/sessions/, and session auto-save
in this post reflect the architecture at the time of writing.

What a Codebase Audit Taught Me About Restraint¶
Jose Alekhinne / 2026-02-08
When you find a useful prompt, what do you do with it?
My instinct was to make it a skill.
I had just spent three posts explaining how to build skills that work. Naturally, the hammer wanted nails.
Then I looked at what I was holding and realized: this is not a nail.
The Audit¶
I wanted to understand how I use ctx:
- where the friction is,
- what works, what drifts,
- what I keep doing manually that could be automated.
So I wrote a prompt that spawned eight agents to analyze the codebase from different angles:
| Agent | Analysis |
|---|---|
| 1 | Extractable patterns from session history |
| 2 | Documentation drift (godoc, inline comments) |
| 3 | Maintainability (large functions, misplaced code) |
| 4 | Security review (CLI-specific surface) |
| 5 | Blog theme discovery |
| 6 | Roadmap and value opportunities |
| 7 | User-facing documentation gaps |
| 8 | Agent team strategies for future sessions |
The prompt was specific:
- read-only agents,
- structured output format,
- concrete file references,
- ranked recommendations.
It ran for about 20 minutes and produced eight Markdown reports.
The reports were good: not perfect, but actionable.
What mattered was not the speed. It was that the work could be explored without committing to any single outcome.
They surfaced a stale doc.go referencing a subcommand that was
never built.
They found 311 build-then-test sequences I could reduce
to a single make check.
They identified that 42% of my sessions start with "do you remember?", which is a lot of repetition for something a skill could handle.
I had findings. I had recommendations. I had the instinct to automate.
And then... I stopped.
The Question¶
The natural next step was to wrap the audit prompt as /ctx-audit:
a skill you invoke periodically to get a health check. It fits the
pattern. It has a clear trigger. It produces structured output.
But I had just spent a week writing about what makes skills work, and the criteria I established argued against it.
From The Anatomy of a Skill That Works:
"A skill without boundaries is just a suggestion."
From You Can't Import Expertise:
"Frameworks travel, expertise doesn't."
From Skills That Fight the Platform:
"You are the guest, not the host."
The audit prompt fails all three tests:
| Criterion | Audit prompt | Good skill |
|---|---|---|
| Frequency | Quarterly, maybe | Daily or weekly |
| Stability | Tweaked every time | Consistent invocation |
| Scope | Bespoke, 8 parallel agents | Single focused action |
| Trigger | "I feel like auditing" | Clear, repeatable event |
Skills are contracts. Contracts need stable terms.
A prompt I will rewrite every time I use it is not a contract. It is a conversation starter.
Recipes vs Skills¶
The distinction that emerged:
| | Skill | Recipe |
|---|---|---|
| Invocation | /slash-command | Copy-paste from a doc |
| Frequency | High (daily, weekly) | Low (quarterly, ad hoc) |
| Stability | Fixed contract | Adapted each time |
| Scope | One focused action | Multi-step orchestration |
| Audience | The agent | The human (who then prompts) |
| Lives in | .claude/skills/ | hack/ or docs/ |
| Attention cost | Loaded into context on match | Zero until needed |
Recipes can later graduate into skills, but only after repetition proves stability.
That last row matters. Skills consume the attention budget every time the platform considers activating them. A skill that triggers quarterly but gets evaluated on every prompt is pure waste: attention spent on something that will say "When NOT to Use: now" 99% of the time.
Recipes have zero attention cost. They sit in a Markdown file until a human decides to use them. The human provides the judgment about timing. The prompt provides the structure.
The Attention Budget Applies to Skills Too
Every skill in .claude/skills/ is a standing claim on the
context window. The platform evaluates skill descriptions
against every user prompt to decide whether to activate.
Twenty focused skills are fine. Thirty might be fine. But each one added reduces the headroom available for actual work.
Recipes are skills that opted out of the attention tax.
What the Audit Actually Produced¶
The audit was not wasted. It was a planning exercise that generated concrete tasks:
| Finding | Action |
|---|---|
| 42% of sessions start with memory check | Task: /ctx-remember skill (this one is a skill; it is daily) |
| Auto-save stubs are empty | Task: enhance /ctx-save with richer summaries |
| 311 raw build-test sequences | Task: make check target |
| Stale recall/doc.go lists nonexistent serve | Task: fix the doc.go |
| 120 commit sequences disconnected from context | Task: /ctx-commit workflow |
Some findings became skills. Some became Makefile targets. Some
became one-line doc fixes.
The audit did not prescribe the artifact type. The findings did.
The audit is the input. Skills are one possible output. Not the only one.
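As an illustration, the consolidated check target from the task list above can be as small as three commands. This is a minimal sketch assuming a standard Go toolchain; the actual target in the ctx Makefile may differ, for example by adding linting or formatting steps.

```make
# Sketch of a consolidated build-then-test target (not the actual ctx Makefile).
# Recipe lines must be indented with tabs.
.PHONY: check
check:
	go build ./...   # compile every package
	go vet ./...     # basic static analysis
	go test ./...    # run the full test suite
```

The value is not in the specific commands. It is in having a single entry point that replaces the 311 raw build-then-test sequences the audit counted.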
The Audit Prompt¶
Here is the exact prompt I used, for those who are curious.
This is not a template: It worked because it was written against this codebase, at this moment, with specific goals in mind.
I want you to create an agent team to audit this codebase. Save each report as
a separate Markdown file under `./ideas/` (or another directory if you prefer).
Use read-only agents (subagent_type: Explore) for all analyses. No code changes.
For each report, use this structure:
- Executive Summary (2-3 sentences + severity table)
- Findings (grouped, with file:line references)
- Ranked Recommendations (high/medium/low priority)
- Methodology (what was examined, how)
Keep reports actionable. Every finding should suggest a concrete fix or next step.
## Analyses to Run
### 1. Extractable Patterns (session mining)
Search session JSONL files, journal entries, and task archives for repetitive
multi-step workflows. Count frequency of bash command sequences, slash command
usage, and recurring user prompts. Identify patterns that could become skills
or scripts. Cross-reference with existing skills to find coverage gaps.
Output: ranked list of automation opportunities with frequency data.
### 2. Documentation Drift (godoc + inline)
Compare every doc.go against its package's actual exports and behavior. Check
inline godoc comments on exported functions against their implementations.
Scan for stale TODO/FIXME/HACK comments. Check that package-level comments match
package names.
Output: drift items ranked by severity with exact file:line references.
### 3. Maintainability
Look for:
- functions longer than 80 lines with clear split points
- switch blocks with more than 5 cases that could be table-driven
- inline comments like "step 1", "step 2" that indicate a block wants to be a function
- files longer than 400 lines
- flat packages that could benefit from sub-packages
- functions that appear misplaced in their file
Do NOT flag things that are fine as-is just because they could theoretically
be different.
Output: concrete refactoring suggestions, not style nitpicks.
### 4. Security Review
This is a CLI app. Focus on CLI-relevant attack surface, not web OWASP:
- file path traversal
- command injection
- symlink following when writing to `.context/`
- permission handling
- sensitive data in outputs
Output: findings with severity ratings and plausible exploit scenarios.
### 5. Blog Theme Discovery
Read existing blog posts for style and narrative voice. Analyze git history,
recent session discussions, and DECISIONS.md for story arcs worth writing about.
Suggest 3-5 blog post themes with:
- title
- angle
- target audience
- key commits or sessions to reference
- a 2-sentence pitch
Prioritize themes that build a coherent narrative across posts.
### 6. Roadmap and Value Opportunities
Based on current features, recent momentum, and gaps found in other analyses,
identify the highest-value improvements. Consider user-facing features,
developer experience, integration opportunities, and low-hanging fruit.
Output: prioritized list with rough effort and impact estimates.
### 7. User-Facing Documentation
Evaluate README, help text, and user docs. Suggest improvements structured as
use-case pages: the problem, how ctx solves it, a typical workflow, and gotchas.
Identify gaps where a user would get stuck without reading source code.
Output: documentation gaps with suggested page outlines.
### 8. Agent Team Strategies
Based on the codebase structure, suggest 2-3 agent team configurations for
upcoming work sessions. For each, include:
- team composition (roles and agent types)
- task distribution strategy
- coordination approach
- the kinds of work it suits
Avoid Generic Advice
Suggestions that are not grounded in a project's actual structure, history, and workflows are worse than useless:
They create false confidence.
If an analysis cannot point to concrete files, commits, sessions, or patterns, it should say "no finding" instead of inventing best practices.
The Deeper Pattern¶
This is part of a pattern I keep rediscovering: the urge to automate is not the same as the need to automate.
- The 3:1 ratio taught me that not every session should be a YOLO sprint.
- The E/A/R framework taught me that not every template is worth importing.

Now the audit is teaching me that not every useful prompt is worth institutionalizing.
The common thread is restraint: knowing when to stop, and recognizing that the cost of automation is not just the effort to build it. It is the ongoing attention tax of maintaining it, the context it consumes, and the false confidence it creates when it drifts.
A recipe in hack/codebase-audit.md is honest about what it is: a prompt I wrote once, improved once, and will adapt again next time.
- It does not pretend to be a reliable contract.
- It does not claim attention budget.
- It does not drift silently.
The Automation Instinct
When you find a useful prompt, the instinct is to institutionalize it. Resist.
Ask first: will I use this the same way next time?
If yes, it is a skill. If no, it is a recipe. If you are not sure, it is a recipe until proven otherwise.
This Mindset in the Context of ctx¶
ctx is a tool that gives AI agents persistent memory. Its purpose
is automation: reducing the friction of context loading, session
recall, decision tracking.
But automation has boundaries, and knowing where those boundaries are is as important as pushing them forward.
The skills system is for high-frequency, stable workflows.
The recipes, the journal entries, the session dumps in
.context/sessions/: those are for everything else.
Not everything needs to be a slash command. Some things are better as Markdown files you read when you need them.
The goal of ctx is not to automate everything: It is to automate
the right things and to make the rest easy to find when you need it.
If you remember one thing from this post...
The best automation decision is sometimes not to automate.
A recipe in a Markdown file costs nothing until you use it. A skill costs attention on every prompt, whether it fires or not.
Automate the daily. Document the periodic. Forget the rest.
This post was written during the session that produced the codebase
audit reports and distilled the prompt into hack/codebase-audit.md.
The audit generated seven tasks, one Makefile target, and zero new
skills. The meta continues.