CAVEMAN: The Claude Code Skill That Went Viral by Making Your AI Talk Like a Caveman
Julius Brussee is a Dutch university student. His father is Arjan Brussee, co-creator of Jazz Jackrabbit and a veteran of video game development. The son inherited the instinct. Last week, Julius published a GitHub repo with the one-line tagline “why use many token when few token do trick” and, a few days later, woke up to find it trending #1 on GitHub and leading the discussion on Hacker News.
The repo is called caveman. At the time of writing, it has passed 14,000 stars and is still climbing.
The premise is almost absurdly simple: Claude Code, like most LLM-based coding agents, is trained to be verbose and courteous. It says “Sure! I’d be happy to help you with that. The problem you’re experiencing is probably due to…” before getting to the actual answer. Every one of those words costs tokens. Caveman eliminates all of them.
What Caveman Does Exactly
Caveman is a Claude Code skill — a plain text instruction file that changes how Claude formats its prose responses. The core logic is straightforward:
Out: articles (a/an/the), filler words (just, really, basically, actually, simply), polite phrases (sure, certainly, of course, happy to help), hedging language.
In: direct responses. Pattern: [thing] [action] [reason]. [next step].
The critical design decision: technical content passes through unchanged. Code blocks, file paths, URLs, commands, version numbers, error messages — none of that gets compressed. Only the prose explanation around it receives the caveman treatment.
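To make the "prose only, code untouched" rule concrete, here is a minimal Python sketch. It is not how caveman works (the skill is a text instruction that Claude follows, not a post-processing filter), and the word lists and function names are hypothetical, picked from the categories listed above. It only illustrates the split between compressible prose and protected technical content.

```python
import re

# Hypothetical word lists taken from the categories named above. The real
# skill is a prompt instruction to the model, not a text filter like this.
DROP_WORDS = ["a", "an", "the", "just", "really", "basically", "actually", "simply"]
OPENERS = ["sure", "certainly", "of course"]

# Fenced blocks and inline code spans are captured so they can pass through.
CODE = re.compile(r"(```.*?```|`[^`\n]+`)", re.DOTALL)
WORD_RE = re.compile(r"\b(?:" + "|".join(DROP_WORDS) + r")\b[ \t]*", re.IGNORECASE)
OPENER_RE = re.compile(
    r"^(?:" + "|".join(OPENERS) + r")[!,.][ \t]*", re.IGNORECASE | re.MULTILINE
)


def compress_prose(text: str) -> str:
    """Strip polite openers, articles and filler words from plain prose."""
    text = OPENER_RE.sub("", text)
    return WORD_RE.sub("", text)


def caveman_filter(reply: str) -> str:
    """Compress prose segments; leave code segments byte-for-byte intact."""
    parts = CODE.split(reply)
    # With one capture group, re.split puts the code segments at odd indices.
    return "".join(p if i % 2 else compress_prose(p) for i, p in enumerate(parts))


print(caveman_filter("Sure! The bug is just a typo. Run `the_fix --a` now."))
```

A word filter like this is far cruder than what a model does when told to "talk like caveman" (it cannot pick shorter synonyms or restructure sentences), but it shows why the code spans survive: they are never handed to the compression step at all.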
The canonical example from the README:
Before (normal Claude):
“The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle. When you pass an inline object as a prop, React’s shallow comparison sees it as a different object every time, which triggers a re-render. I’d recommend using useMemo to memoize the object.”
After (caveman full):
“New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”
Same fix. Same information. Fraction of the tokens.
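For a rough sense of scale, the two versions can be compared directly. The sketch below counts whitespace-separated words, which is only a crude stand-in for real tokenizer output (an assumption made to keep the example dependency-free); the helper name is hypothetical.

```python
def rough_token_count(text: str) -> int:
    """Whitespace-separated word count, a crude stand-in for real tokens."""
    return len(text.split())


# The two README examples quoted above, with straight apostrophes.
before = (
    "The reason your React component is re-rendering is likely because you're "
    "creating a new object reference on each render cycle. When you pass an "
    "inline object as a prop, React's shallow comparison sees it as a different "
    "object every time, which triggers a re-render. I'd recommend using useMemo "
    "to memoize the object."
)
after = "New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

n_before = rough_token_count(before)
n_after = rough_token_count(after)
reduction = 1 - n_after / n_before
print(f"{n_before} -> {n_after} words ({reduction:.0%} fewer)")
```

Actual token counts depend on the model's tokenizer, so treat the printed figure as indicative only.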
Three Modes (Plus Classical Chinese)
Caveman includes three compression levels for English:
| Mode | What It Does |
|---|---|
| /caveman lite | Removes filler and politeness. Complete sentences, articles preserved. |
| /caveman full | Full caveman. No articles, fragments OK, short synonyms. |
| /caveman ultra | Maximum compression. Abbreviations (DB, auth, fn, impl), arrows for causality (X → Y), one word when one word suffices. |
And yes — the repo also includes three wenyan modes (compressed classical Chinese writing), because apparently caveman speech and 2,500-year-old written Chinese are both extremely token-efficient. wenyan-ultra reports 80-90% character reduction while maintaining all technical meaning.
The Input Side: caveman-compress
The skill handles output compression. But caveman also includes a separate tool — caveman-compress — that handles input compression.
The idea: your CLAUDE.md loads at the start of each Claude Code session. If it’s 1,500 words of verbose instructions, you’re paying token overhead on every session. Caveman-compress rewrites it in compressed caveman-speak, preserving the original as CLAUDE.md.original.md so you can read and edit it without issues.
The repo claims ~45% input token reduction per session with this approach. Your CLAUDE.md becomes cheaper for Claude to load, while you keep the human-readable version.
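The backup-then-rewrite workflow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not caveman-compress itself: the function names are hypothetical, and `drop_articles` is a trivial stand-in for whatever compression the real tool applies. Only the `CLAUDE.md.original.md` naming comes from the description above.

```python
import tempfile
from pathlib import Path
from typing import Callable


def drop_articles(text: str) -> str:
    """Trivial stand-in compressor (assumption): removes a/an/the only."""
    return " ".join(w for w in text.split() if w.lower() not in {"a", "an", "the"})


def compress_with_backup(path: Path, compress: Callable[[str], str]) -> Path:
    """Rewrite `path` with compressed text, keeping the original beside it."""
    backup = path.with_name(path.name + ".original.md")  # CLAUDE.md.original.md
    if not backup.exists():
        # Never overwrite an existing backup: it holds the human-written text.
        backup.write_text(path.read_text(encoding="utf-8"), encoding="utf-8")
    # Always compress from the backup so repeated runs do not compound.
    original = backup.read_text(encoding="utf-8")
    path.write_text(compress(original), encoding="utf-8")
    return backup


# Usage on a throwaway file so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    claude_md = Path(tmp) / "CLAUDE.md"
    claude_md.write_text("Run the tests before a commit.", encoding="utf-8")
    backup_path = compress_with_backup(claude_md, drop_articles)
    print(backup_path.name)
    print(claude_md.read_text(encoding="utf-8"))
```

Reading from the backup on every run is a design choice in this sketch: you edit the readable file, rerun the compressor, and the compressed copy is regenerated from it rather than from its own earlier output.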
The Token Numbers: What to Believe
The repo claims 65-75% output token reduction. Independent benchmarks on real coding tasks measured more modest but genuine savings — around 14-21%. The project includes its own evaluation suite (uv run python evals/llm_run.py) so you can measure with your specific workload.
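One possible reading of the gap between the two figures (an assumption, not something the repo or the benchmarks state): because code, paths, and commands pass through unchanged, only the prose share of a reply shrinks. The sketch below works through that arithmetic with purely hypothetical inputs; the function name and both numbers are illustrative, not measurements.

```python
def overall_savings(prose_fraction: float, prose_savings: float) -> float:
    """Share of total output tokens saved when only the prose part shrinks.

    prose_fraction: fraction of output tokens that are prose (0..1).
    prose_savings:  fraction of those prose tokens removed (0..1).
    """
    if not (0.0 <= prose_fraction <= 1.0 and 0.0 <= prose_savings <= 1.0):
        raise ValueError("both arguments must be between 0 and 1")
    return prose_fraction * prose_savings


# Hypothetical inputs: 30% of a reply is prose, and that prose shrinks by 70%.
print(f"{overall_savings(0.30, 0.70):.0%} of total output tokens saved")
```

Under those made-up inputs, a large cut to prose becomes a much smaller cut overall, which is why measuring on your own workload matters more than either headline number.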
The honest framing from the README itself: “Caveman no make brain smaller. Caveman make mouth smaller. Biggest win is readability and speed, cost savings are a bonus.”
There’s also actual research backing this. A March 2026 paper titled “Brevity Constraints Reverse Performance Hierarchies in Language Models” found that constraining large models to brief responses improved accuracy by 26 percentage points on certain benchmarks and completely reversed performance rankings between models. In other words: making Claude shut up can make it smarter, not just cheaper.
Installation
As a Claude Code plugin (recommended — includes hooks that activate caveman automatically on session start):
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman
Or just as a skill (without auto-load hooks):
npx skills add JuliusBrussee/caveman
Once installed, activate it in your session with /caveman, /caveman lite, /caveman full, or /caveman ultra. The hooked version activates automatically.
For Codex users: clone the repo → open Codex in the project → /plugins → search for Caveman → Install.
Why It Went Viral
The mechanics matter less than the broader point caveman exposed: AI verbosity has a real cost, and that cost has been silently absorbed. Most devs don’t monitor token usage per session — it just shows up on the bill. Caveman made it visible by making compression so aggressive it becomes funny.
That it crossed 14,000 stars in less than a week suggests the pain was real and widely shared. And that a university student published it as a plain text .skill file (not a SaaS product, not a browser extension, not a paid tool) is exactly the kind of thing that makes a good Hacker News post.
The deeper question the discussion opened: are LLMs trained to be verbose because verbosity signals effort to human evaluators during RLHF? If so, caveman is a workaround for a training artifact, not a bug fix. That’s a more interesting conversation than the token savings themselves.
Have you tried caveman in your Claude Code sessions yet? How many tokens do you save with your real use case? Share in the comments.