DietrichGebert/ponytail: a YAGNI ruleset that makes your coding agent write less code

Ponytail is a behavior patch for AI coding agents. It does not add a capability, connect a tool, or run a model. It installs one rule: before writing code, stop at the first option that already solves the problem, and write the least code that works. The mascot is the senior developer who reads your fifty lines, says nothing, and replaces them with one.

The problem it targets

Left alone, a coding agent over-builds. Ask for a date picker and it reaches for a library, writes a wrapper component, adds a stylesheet, and opens a discussion about time zones. The native <input type="date"> was right there. This is the failure mode ponytail is built around: agents optimise for looking thorough, not for the smallest correct change, and the cost lands on you as tokens, latency, and code you now have to maintain.

The project launched in mid-June 2026 and climbed GitHub trending almost immediately, passing eleven thousand stars within days of creation (as of 2026-06). That spike shows the over-engineering problem is widely felt, not that the fix is proven. Weigh it accordingly: a project this new has no track record yet.

How it works

The whole mechanism is a short decision ladder the agent runs before generating code:

1. Does this need to exist?   → no: skip it (YAGNI)
2. Stdlib does it?            → use it
3. Native platform feature?   → use it
4. Installed dependency?      → use it
5. One line?                  → one line
6. Only then: the minimum that works
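
The ladder is a first-match rule: the agent walks the rungs in order and stops at the first one that answers yes. The following is a minimal TypeScript sketch of that control flow. It assumes a hypothetical Task record whose boolean fields stand in for judgments the agent makes in prose. Ponytail ships the ladder as instructions, not as code, so chooseRung and every field name here are illustrative only.

```typescript
// Hypothetical model of ponytail's decision ladder. The real skill is a
// prompt the agent follows; these names are illustrative, not ponytail's API.
type Task = {
  needed: boolean;             // 1. does this need to exist at all?
  stdlibCovers: boolean;       // 2. does the standard library already do it?
  platformCovers: boolean;     // 3. is there a native platform feature?
  installedDepCovers: boolean; // 4. does an already-installed dependency do it?
  fitsOneLine: boolean;        // 5. can it be written as one line?
};

type Rung = "skip" | "stdlib" | "platform" | "installed-dep" | "one-line" | "minimum";

// Walk the rungs in order and return the first one that applies.
function chooseRung(t: Task): Rung {
  if (!t.needed) return "skip";
  if (t.stdlibCovers) return "stdlib";
  if (t.platformCovers) return "platform";
  if (t.installedDepCovers) return "installed-dep";
  if (t.fitsOneLine) return "one-line";
  return "minimum"; // 6. only then: the minimum that works
}

// The date-picker request: a native <input type="date"> exists, so the walk
// stops at rung 3 before any library or wrapper component is considered.
const datePicker: Task = {
  needed: true,
  stdlibCovers: false,
  platformCovers: true,
  installedDepCovers: false,
  fitsOneLine: true,
};
console.log(chooseRung(datePicker)); // prints: platform
```

Order is the whole point: the date picker also fits on one line, but the earlier rung wins, so the answer is the platform feature rather than hand-written code.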

The interesting design choice is the carve-out. Ponytail is lazy, not negligent. Trust-boundary input validation, error handling that prevents data loss, security, and accessibility are explicitly never on the chopping block. When it does take a shortcut with a known ceiling, like a global lock or an O(n²) scan, it leaves a ponytail: comment naming the ceiling and the upgrade path, so the deferred work is visible rather than silently lost.
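
To make the marker concrete, here is a hedged sketch of a shortcut with a known ceiling, annotated the way the paragraph describes. The exact wording ponytail emits is not documented here. The intersect function, the "few hundred items" figure, and the comment text are illustrative assumptions.

```typescript
// Items of `wanted` that also appear in `available`, in `wanted` order.
// ponytail: O(n*m) scan via Array.includes; fine for the few hundred items
// this sees today. Upgrade path: build `new Set(available)` once and use
// set.has() for O(n + m) if either list grows large.
function intersect<T>(wanted: T[], available: T[]): T[] {
  return wanted.filter((item) => available.includes(item));
}

console.log(intersect(["a", "b", "c"], ["c", "a"])); // returns ["a", "c"]
```

The shortcut stays a one-liner, and the comment records both the ceiling and the fix, so a reviewer can see exactly what was deferred.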

What you actually install

On a skill-capable host the install is a plugin: a compact always-on ruleset injected each turn, a set of skills, two small Node.js lifecycle hooks, and a handful of slash commands. The commands are the part the README buries, and the part worth knowing:

/ponytail [lite | full | ultra | off] sets intensity. ultra is the maximum-restraint mode; off disables it without uninstalling.
/ponytail-review scans the current diff for over-engineering and hands back a delete-list.
/ponytail-audit runs the same pass over the whole repo, not just the diff.
/ponytail-debt collects the deferred ponytail: shortcuts into a ledger so “later” does not quietly become “never”.

That debt ledger matters for the safety argument: a skill that encourages shortcuts needs a way to track them, and this is it.
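
How /ponytail-debt works internally is not described here, so the following is only a sketch of the idea: scan source text for ponytail: comments and list each one with its file and line, so deferred shortcuts stay visible. It assumes // line comments and single-line notes, and it takes file contents as an in-memory map rather than reading the disk. collectDebt and DebtEntry are hypothetical names.

```typescript
// Hypothetical sketch of a debt ledger; not the plugin's implementation.
type DebtEntry = { file: string; line: number; note: string };

// Matches a `//` comment whose body starts with "ponytail:" and captures the
// rest of that line. Multi-line notes would only contribute their first line.
const MARKER = /\/\/\s*ponytail:\s*(.*)$/;

function collectDebt(files: Record<string, string>): DebtEntry[] {
  const ledger: DebtEntry[] = [];
  for (const [file, source] of Object.entries(files)) {
    source.split("\n").forEach((text, i) => {
      const match = MARKER.exec(text);
      if (match) ledger.push({ file, line: i + 1, note: match[1].trim() });
    });
  }
  return ledger;
}

const ledger = collectDebt({
  "src/queue.ts": "const lock = { held: false };\n// ponytail: single global lock; shard per key if contention shows up",
  "src/util.ts": "export const id = (x: unknown) => x;",
});
for (const entry of ledger) console.log(`${entry.file}:${entry.line} ${entry.note}`);
// prints: src/queue.ts:2 single global lock; shard per key if contention shows up
```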

Install

Find your host below. The plugin form (hooks, mode switches, and commands) works on Claude Code, Codex, GitHub Copilot CLI, the Pi harness, OpenCode, and Gemini CLI. The Claude Code and Codex hooks run on Node.js, so node must be on your PATH.

Claude Code:

/plugin marketplace add DietrichGebert/ponytail
/plugin install ponytail@ponytail

GitHub Copilot CLI:

copilot plugin marketplace add DietrichGebert/ponytail
copilot plugin install ponytail@ponytail

Pi agent harness:

pi install git:github.com/DietrichGebert/ponytail

Gemini CLI:

gemini extensions install https://github.com/DietrichGebert/ponytail

Codex:

codex plugin marketplace add DietrichGebert/ponytail

Then trust its two lifecycle hooks under /hooks.

Editors that do not run skills (Cursor, Windsurf, Cline, Kiro, Antigravity, Aider, the VS Code Codex extension) use the instruction-only path: copy the matching rules file from the repo, such as .cursor/rules/, .windsurf/rules/, .clinerules/, or the shipped AGENTS.md. That path keeps the always-on ruleset but gives up the commands, hooks, and mode switches. The portability map in docs/agent-portability.md lists which file feeds which agent.

The benchmark, read honestly

Ponytail ships its own benchmark, which is more than most skills in this category do. Five everyday tasks (email validator, debounce, CSV sum, countdown timer, rate limiter) run against three models (Haiku, Sonnet, Opus) and three arms: no skill, the caveman skill, and ponytail. Ten runs per cell, median reported. The headline: 80% to 94% less code, 47% to 77% less cost, and three to six times faster than the no-skill baseline, on every model. You can reproduce it with npx promptfoo eval -c benchmarks/promptfooconfig.yaml.
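
The headline ratios are relative reductions against the no-skill arm, computed on medians. With ten runs per cell, the median of an even-sized sample is conventionally the mean of the 5th and 6th sorted values. Whether the harness uses exactly this convention is an assumption. The sketch below shows that arithmetic on made-up line counts. The numbers are placeholders, not results from the repo.

```typescript
// Median of a sample; copies before sorting so the input is not mutated.
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

// Percent reduction of `arm` relative to `baseline` (positive means smaller).
function reductionPct(baseline: number, arm: number): number {
  return (1 - arm / baseline) * 100;
}

// Hypothetical line counts for one cell (one task x one model), ten runs each.
const noSkill = [48, 52, 50, 61, 47, 55, 49, 53, 50, 58];
const withPonytail = [6, 5, 7, 6, 5, 6, 8, 5, 6, 7];

const base = median(noSkill);
const arm = median(withPonytail);
console.log(base, arm, reductionPct(base, arm).toFixed(1)); // prints: 51 6 88.2
```

A speed claim such as "three to six times faster" is a different ratio, baseline divided by arm, so it is not directly comparable to the percentage figures.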

The honest caveats. These are the author’s own numbers, and the five core tasks are small by design, so the dramatic ratios reflect cases where an unconstrained agent bloats the most. The repo includes production-grade write-ups under benchmarks/results/, which are the more useful read. Reproducibility is the real strength here: the harness is published, so the claim is checkable rather than asserted.

Where it fits and where it does not

Reach for ponytail when your agent reflexively adds dependencies and abstractions, when token cost or review burden is the pain, or when you maintain the output and want less of it. The mode switch helps: run lite on greenfield work and ultra on a codebase you are trying to shrink.

Be cautious when underspecified requests are common. A YAGNI-first agent can read “I might need caching later” as “skip the cache”; the ponytail: comment and the debt ledger make that risk visible, but they do not remove it. Review still matters. The skill biases toward less code, so on a task where you genuinely need the longer version, you have to say so. The carve-outs protect correctness and security, but they do not protect a requirement you left implicit.

How it compares

Ponytail sits in the new category of agent behavior skills. Star counts are as of 2026-06.

Repo	Stars	Language	Angle
ponytail	~11.7k	JavaScript	YAGNI ladder, multi-host plugin, published benchmark
caveman	~73k	JavaScript	Minimalism skill, ponytail’s direct benchmark rival
taste-skill	~44k	Shell	Stops generic output, aims at quality not brevity
agent-skills	~60k	Shell	Broad production engineering skill set

The table holds the most useful context the README leaves out: ponytail is the newcomer here. It tops a single week’s trending while caveman, the skill it benchmarks against, carries several times its total stars. Trending rank measures this week’s attention; the star totals measure who has been trusted over time.

Star history

The curve is a near-vertical launch in June 2026 with no prior history: the shape of a viral skill release, not of steady adoption. It will say more once there is a second month to compare against.

Related repositories

JuliusBrussee/caveman, the minimalism skill ponytail measures itself against.
Leonxlnx/taste-skill, a sibling quality-focused behavior skill.
addyosmani/agent-skills and obra/superpowers for broader skill collections.
multica-ai/andrej-karpathy-skills, another single-ruleset approach to agent behavior.

FAQ

Is the ponytail benchmark real and reproducible? Yes. The harness is in benchmarks/, runs through npx promptfoo eval, and reports the median of ten runs per case across three models. The numbers are the author’s own, so reproduce them yourself rather than taking the headline ratios as independent results.

Does ponytail work with Cursor and Windsurf? Yes, but in instruction-only mode. You copy the rules file (.cursor/rules/, .windsurf/rules/) into the project, which keeps the always-on ruleset but not the slash commands, hooks, or mode switches. Full plugin features need a skill-capable host like Claude Code, Codex, OpenCode, Gemini CLI, or Pi.

Will it skip code I actually need? It can, if you leave the need unstated. Ponytail biases toward the smallest working solution and treats unrequested abstractions as out of scope. It protects validation, data-loss handling, security, and accessibility by design, and it marks shortcuts with ponytail: comments plus a /ponytail-debt ledger, but it does not read your mind. State hard requirements explicitly.

How is it different from caveman or taste-skill? All three are agent behavior skills. Caveman is ponytail’s direct benchmark comparison and shares the minimalism goal; taste-skill targets generic, boring output rather than line count. Ponytail’s distinguishing features are the explicit YAGNI ladder, the lite/full/ultra/off modes, multi-host portability across 13 agents, and a published reproducible benchmark.

Is it free? Yes, MIT licensed. It runs inside the agent you already pay for and adds no service of its own.
