Agent-skills: The 21 Skills That Teach Your AI to Code Like a Senior

By Devy — yoDEV.dev


Your AI agent doesn’t write tests. It doesn’t create specs. It doesn’t review its own code. Not because it can’t, but because nobody told it that it had to.

That’s the problem Addy Osmani, engineering lead at Google Chrome, decided to solve. The result is called agent-skills, an open-source framework that already has 33K stars on GitHub and is changing how teams work with AI agents in production.


The problem: agents take the shortcut

When you ask an AI agent to implement a feature, it does exactly that: implements the feature. It doesn’t ask you if you have a spec. It doesn’t write the test before the code. It doesn’t check if the change crosses a security boundary. It doesn’t consider how readable the PR will be to your team.

It takes the shortest path to “done” and declares victory.

This isn’t a bug in the model — it’s a problem of absent process. A senior engineer doesn’t just write code: they explore assumptions, break work into reviewable chunks, choose the boring solution over the clever one, and leave evidence that the result is correct. Those steps don’t appear in the diff, but they’re the difference between reliable software and software that breaks in production.

agent-skills bakes that scaffolding into the agent explicitly.


What is a “skill”

A skill is a Markdown file with frontmatter that gets injected into the agent’s context when the situation calls for it. But pay attention to the key distinction Osmani makes: a skill is not reference documentation. It’s a workflow.

It’s not “everything you should know about testing.” It’s a sequence of steps the agent follows, with checkpoints that produce evidence and defined exit criteria. The difference matters: if you dump a 2,000-word essay on best practices into the context, the agent generates plausible text and skips testing. If you dump a workflow (write the failing test, run it, see how it fails, write the minimum code to pass it, refactor), the agent has something concrete to execute — and you have something concrete to verify.

Process over prose. Workflows over reference. Steps with exit criteria over essays without them.
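
To make the "workflow, not essay" distinction concrete, here is a minimal Python sketch that reads a skill-like file and pulls out its steps and exit criteria. The file content, the frontmatter fields (name, description) and the "| exit:" step syntax are assumptions made up for illustration; they are not copied from the agent-skills repository.

```python
from dataclasses import dataclass, field

# Hypothetical skill file. Field names, steps, and the "| exit:" convention
# are illustrative assumptions, not the repository's actual format.
EXAMPLE_SKILL = """\
---
name: test-driven-development
description: Use when adding or changing behavior
---
1. Write a failing test | exit: test exists and targets the new behavior
2. Run it and watch it fail | exit: failure output captured
3. Write the minimum code to pass | exit: test passes
4. Refactor | exit: full suite still passes
"""


@dataclass
class Step:
    action: str
    exit_criterion: str


@dataclass
class Skill:
    name: str
    description: str
    steps: list[Step] = field(default_factory=list)


def parse_skill(text: str) -> Skill:
    """Split the frontmatter from the body and read each numbered step."""
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    steps = []
    for line in body.strip().splitlines():
        numbered, _, criterion = line.partition("| exit:")
        action = numbered.split(".", 1)[1].strip()  # drop the "1." prefix
        steps.append(Step(action, criterion.strip()))
    return Skill(meta["name"], meta["description"], steps)


skill = parse_skill(EXAMPLE_SKILL)
for number, step in enumerate(skill.steps, start=1):
    print(f"{number}. {step.action} (done when: {step.exit_criterion})")
```

The point of the structure is that every step carries its own exit criterion, so there is always something specific to check before moving on.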


The 21 skills in the pack

The repository includes 21 skills organized around six phases of the development cycle, with seven slash commands as entry points:

Phase  | Command | What it does
Define | /spec   | Creates a SPEC.md before touching a line of code
Plan   | /plan   | Breaks work down into atomic tasks
Build  | /build  | Implements in vertical slices with feature flags
Verify | /test   | Strict TDD: Red → Green → Refactor
Review | /review | Code review on five axes with severity tags
Ship   | /ship   | Safe deployment with launch checklist
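
As an illustration of the Red → Green → Refactor loop behind /test, here is a small Python sketch. The slugify function and its test are hypothetical examples chosen for brevity; the command's actual behavior is defined by the skill itself, not by this code.

```python
def slugify_stub(title: str) -> str:
    """Red phase: the function exists but does nothing useful yet."""
    return ""


def slugify(title: str) -> str:
    """Green phase: the minimum code that makes the test pass."""
    return "-".join(title.lower().split())


def check_slugify(impl) -> bool:
    """The test is written first and run against whatever implementation exists."""
    try:
        assert impl("Hello World") == "hello-world"
        return True
    except AssertionError:
        return False


# Red: the test must fail before any real code is written.
print("red passes?  ", check_slugify(slugify_stub))
# Green: the minimal implementation makes the same test pass.
print("green passes?", check_slugify(slugify))
```

Seeing the test fail first is what proves the test can actually detect the missing behavior. Refactoring then happens with the passing test as a safety net.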

Also, /code-simplify spans the entire cycle to clean up code generated by AI before it reaches a PR.

Skills activate automatically based on context: if you’re designing an API, api-and-interface-design activates. If you’re building UI, frontend-ui-engineering activates. You don’t need to invoke them manually.


The five core ideas worth stealing

1. Process over prose

We mentioned it already, but it’s the most important idea in the project. If your team has a 200-page handbook, nobody reads it under time pressure. A small set of workflows with checkpoints is something people, and agents, actually follow.

2. Anti-rationalization tables

This is the repo’s most original design decision. Each skill includes a table with common excuses an agent (or a tired engineer) uses to skip the workflow, along with a written rebuttal. For example:

  • “This task is too simple to need a spec.” → Acceptance criteria apply either way. Five lines is fine. Zero lines, no.
  • “I’ll write the tests later.” → “Later” is the keyword. Later doesn’t exist. Write the failing test first.
  • “The tests pass, let’s ship it.” → Passing tests are evidence, not proof. Did you check the runtime? Did a human read the diff?

LLMs are great at rationalizing. These tables are pre-written rebuttals to lies the agent hasn’t told yet.
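
The shape of such a table is easy to picture as data. The Python sketch below is only an analogy: in the repo the table is text that the model reads inside the skill, and nothing here claims that agent-skills does string matching. The function name and the matching phrases are hypothetical.

```python
# Excuse fragments mapped to rebuttals, paraphrasing the examples above.
# The keys are hypothetical matching phrases chosen for this sketch.
REBUTTALS = {
    "too simple to need a spec": (
        "Acceptance criteria apply either way. Five lines is fine. Zero lines, no."
    ),
    "write the tests later": (
        "Later doesn't exist. Write the failing test first."
    ),
    "tests pass, let's ship": (
        "Passing tests are evidence, not proof. "
        "Check the runtime and have a human read the diff."
    ),
}


def find_rebuttals(agent_message: str) -> list[str]:
    """Return the rebuttal for every known excuse found in the message."""
    text = agent_message.lower()
    return [rebuttal for excuse, rebuttal in REBUTTALS.items() if excuse in text]


for reply in find_rebuttals("This is small, I'll write the tests later."):
    print(reply)
```

What matters is the pairing: each excuse has an answer written down in advance, so the argument is already settled before the agent makes it.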

3. Verification is non-negotiable

Each skill ends with concrete evidence: passing tests, clean build, a reviewer who signs off. “Looks right” is never enough. Without evidence, the task isn’t done.
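
One way to picture this rule is as a gate that refuses to mark a task done while evidence is missing. This is a minimal Python sketch; the Evidence fields mirror the three examples above, and the class and function names are hypothetical, not part of agent-skills.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    tests_passed: bool = False
    build_clean: bool = False
    reviewer: Optional[str] = None  # who signed off, if anyone


def missing_evidence(evidence: Evidence) -> list[str]:
    """List every piece of evidence that has not been produced yet."""
    missing = []
    if not evidence.tests_passed:
        missing.append("passing tests")
    if not evidence.build_clean:
        missing.append("clean build")
    if not evidence.reviewer:
        missing.append("reviewer sign-off")
    return missing


def is_done(evidence: Evidence) -> bool:
    """A task counts as done only when nothing is missing."""
    return not missing_evidence(evidence)


partial = Evidence(tests_passed=True, build_clean=True)
print(is_done(partial), missing_evidence(partial))
```

Here "looks right" has no field to fill in, which is the whole idea: only evidence that can be checked counts.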

4. Progressive disclosure

Don’t load all 21 skills into context at session start. A meta-skill (using-agent-skills) acts as a router and decides which skill applies to the current task. That way you can have a library of 21 skills in a 5K token slot without saturating the context.
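
The routing idea can be sketched in a few lines of Python. Keep in mind that the real router is a Markdown meta-skill interpreted by the model, not code. The trigger words and token costs below are invented for illustration, and treating the 5K figure as a hard budget is an assumption of this sketch.

```python
import re
from dataclasses import dataclass


@dataclass
class SkillEntry:
    name: str
    triggers: tuple[str, ...]  # hypothetical keywords that suggest the skill
    token_cost: int            # made-up estimate of the full skill's size


CATALOG = [
    SkillEntry("api-and-interface-design", ("api", "endpoint", "interface"), 1800),
    SkillEntry("frontend-ui-engineering", ("ui", "component", "css"), 2000),
    SkillEntry("test-driven-development", ("test", "bug", "regression"), 1500),
]


def route(task: str, budget: int = 5000) -> list[str]:
    """Pick the skills whose triggers appear in the task, within the budget."""
    words = set(re.findall(r"[a-z]+", task.lower()))
    selected, used = [], 0
    for entry in CATALOG:
        matches = words & set(entry.triggers)
        if matches and used + entry.token_cost <= budget:
            selected.append(entry.name)
            used += entry.token_cost
    return selected


print(route("Design an API endpoint for invoices"))
```

Only the small catalog lives in context all the time. The full text of a skill is loaded when, and only when, the task calls for it.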

5. Scope discipline

The meta-skill has one non-negotiable rule: touch only what you were asked to touch. Don’t refactor adjacent systems. Don’t delete code you don’t fully understand. Don’t start rewriting a file because you found a TODO.

This seems obvious until you watch an agent decide that fixing a bug requires modernizing three unrelated files. Scope discipline is the single biggest factor in whether an agent’s PR is mergeable or needs to be undone.
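
If you want to enforce this rule outside the prompt as well, a simple check on the changed files goes a long way. The following Python sketch is not part of agent-skills; the function name and the glob patterns are hypothetical, and it assumes you already have the list of changed paths (for example from your version control tool).

```python
from fnmatch import fnmatch


def out_of_scope(changed_files: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return every changed file that matches none of the allowed patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in allowed_patterns)
    ]


changed = ["src/billing/invoice.py", "src/auth/login.py", "README.md"]
violations = out_of_scope(changed, allowed_patterns=["src/billing/*"])
if violations:
    print("Touched files outside the task:", violations)
```

A non-empty result does not have to block the change automatically. It is a prompt to ask why a billing fix needed to touch the login code.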


The DNA of Google Engineering

The skills are saturated with practices from Software Engineering at Google and Google’s public engineering culture. Some concrete references:

  • Hyrum’s Law in api-and-interface-design — every observable behavior of your API will eventually be depended on
  • The Beyoncé Rule (if you liked it, you should have put a test on it) and the test pyramid (80/15/5) in test-driven-development
  • DAMP over DRY in tests: Google’s guidance explicitly favors descriptive, readable tests over deduplicated ones
  • PRs of ~100 lines with Critical/Nit/Optional/FYI tags in code-review-and-quality
  • Chesterton’s Fence in code-simplification — don’t delete something until you understand why it’s there
  • Trunk-based development and atomic commits in git-workflow-and-versioning

None of these are new ideas. The point is that none of them are turned on by default in the agent.
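
For readers who haven’t met DAMP ("descriptive and meaningful phrases") in practice, here is a small Python sketch. The apply_discount function and its tests are hypothetical examples, not taken from the repo. Each test repeats its own setup and states its values inline, so it can be read without jumping to shared helpers.

```python
def apply_discount(total: float, percent: float) -> float:
    """Return the total after the discount, rounded to cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(total * (1 - percent / 100), 2)


def test_ten_percent_discount_reduces_total():
    total = 200.0
    percent = 10
    assert apply_discount(total, percent) == 180.0


def test_zero_percent_discount_leaves_total_unchanged():
    total = 200.0
    percent = 0
    assert apply_discount(total, percent) == 200.0


def test_discount_above_100_percent_is_rejected():
    total = 200.0
    percent = 150
    try:
        apply_discount(total, percent)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")


# Run the tests directly; a runner such as pytest would also collect them.
test_ten_percent_discount_reduces_total()
test_zero_percent_discount_leaves_total_unchanged()
test_discount_above_100_percent_is_rejected()
print("all tests passed")
```

A DRY version would fold these into one parameterized helper. It is shorter, but when it fails the reader has to reconstruct which case broke and why.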


How to install it on Claude Code

The most direct way if you use Claude Code:

/plugin marketplace add addyosmani/agent-skills
/plugin install agent-skills@addy-agent-skills

With that you have the slash commands (/spec, /plan, /build, /test, /review, /ship, /code-simplify) and skills activate automatically based on context.

For Cursor: copy the contents of any SKILL.md to .cursor/rules/.

For Windsurf: the repository includes integration with Windsurf’s Rules Engine — there’s an active PR (#134) that modernizes that integration.

For Gemini CLI: it has its own documented install path in the README.

Without installing anything: clone the repo, navigate to skills/ and copy the relevant SKILL.md content directly into your conversation. Skills are plain Markdown — they work in any tool that accepts a system prompt.
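
If you go the manual route often, the copy-and-paste step can be scripted. This is a minimal Python sketch that assumes a layout of skills/<skill-name>/SKILL.md inside your local clone; check the actual directory structure before relying on it. The function names are hypothetical.

```python
from pathlib import Path


def load_skill(repo_root: Path, skill_name: str) -> str:
    """Read one skill file. Assumed layout: <repo_root>/skills/<skill_name>/SKILL.md"""
    path = repo_root / "skills" / skill_name / "SKILL.md"
    return path.read_text(encoding="utf-8")


def build_system_prompt(
    repo_root: Path, skill_names: list[str], base_prompt: str = ""
) -> str:
    """Concatenate the chosen skills under an optional base prompt."""
    sections = [base_prompt] if base_prompt else []
    for name in skill_names:
        sections.append(f"## Skill: {name}\n\n{load_skill(repo_root, name)}")
    return "\n\n".join(sections)
```

Usage would look like build_system_prompt(Path("agent-skills"), ["test-driven-development"]), with the result pasted or passed as the system prompt of whatever tool you use. Loading only the skills relevant to the task keeps the prompt small.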


The “steal the idea without installing anything” mode

Osmani describes three usage modes. The third — and the one he most recommends to start with — is using skills as a specification of what it means to code well with AI, without installing anything.

Read code-review-and-quality.md and apply the five-axis framework to your team’s review process. Read test-driven-development.md and use it to settle the next “do we need to write the test first?” debate with a junior engineer. Read the meta-skill and steal these five points for your own CLAUDE.md:

  1. Explore assumptions before building
  2. Stop and ask when requirements contradict
  3. Push back when appropriate — the agent isn’t a yes machine
  4. Prefer the boring and obvious solution
  5. Touch only what you were asked to touch

That’s a useful engineering culture in five lines.


The sister project: awesome-agent-skills

The community has already adopted the format and extended it. The repository VoltAgent/awesome-agent-skills adds over 1,200 skills contributed by development teams from different tools and frameworks — and it’s also trending this week.

If the official pack of 21 skills doesn’t cover your specific stack (say, skills for working with Supabase, or for Django projects, or for data pipelines with dbt), it’s likely someone has already contributed something in that repo.


Why this matters now

The challenge right now isn’t getting the agent to generate code — that already works. The challenge is getting the generated code to be auditable, reproducible, and mergeable without a senior dev having to rewrite half of it.

agent-skills tackles exactly that problem. Not with magical prompts or a bigger model, but with the simplest thing: giving the agent the same workflows that an experienced engineer forces themselves to follow because they learned from the incidents caused by skipping them.

33K stars in just a few weeks since launch suggests a lot of people were waiting for exactly this.


Are you already using some kind of CLAUDE.md or rules file to guide your agent? What workflows were hardest for it to learn to follow? Let us know in the comments.