It’s already happened to you. Session one, you explain your entire authentication architecture to Claude Code. Session two, you’re explaining it again. Your CLAUDE.md started with 20 lines and now has 300 — a monster that dumps itself into context whether you need it or not. After a month of serious use, that file is costing you 22,000+ tokens per session, and most of it is outdated information.
agentmemory is the solution. It’s a local memory layer that runs as an MCP server alongside your agent, silently captures what the agent does, compresses those observations into structured, searchable memories, and injects only the relevant context at the start of each new session.
The benchmark that made it trend this week: with 240 observations, a CLAUDE.md dumps 22,000+ tokens into context. agentmemory uses ~1,900 — a 92% reduction. Same observations. Same history. Just smarter retrieval.
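As a quick sanity check, the reduction can be recomputed from the two rounded figures quoted above. This is only a sketch of the arithmetic; the function name is illustrative and not part of agentmemory.

```python
def token_reduction(baseline_tokens: int, injected_tokens: int) -> float:
    """Return the fraction of tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1 - injected_tokens / baseline_tokens


# Rounded figures quoted in the article: 22,000+ tokens vs. ~1,900 tokens.
reduction = token_reduction(22_000, 1_900)
print(f"{reduction:.1%}")  # prints 91.4%
```

With the rounded inputs the result is about 91.4%. Both "22,000+" and "~1,900" are approximations, so the quoted 92% presumably comes from the unrounded measurements.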
The problem it solves
LLMs are stateless by design. Each session starts from scratch. The workarounds developers built — CLAUDE.md files, .cursorrules, the ritual of paste-your-context-here — share the same flaw: they’re manual, grow without limit, and load everything regardless of relevance.
agentmemory takes a different path. Instead of a flat file, it builds an indexed memory store. It captures decisions, incidents, touched files, failed approaches, and tool results. When a new session starts, it retrieves only what’s relevant to the current task — not everything you’ve ever done.
The difference in practice: if you set up JWT auth on Monday and chose jose over jsonwebtoken for Edge compatibility, your agent knows it by Thursday. You don’t re-explain. You don’t paste. It just knows.
How it works
agentmemory runs as a local server (default: localhost:3111) with a viewer at localhost:3113. Setup is a single command:
npx @agentmemory/agentmemory
For Claude Code specifically, it integrates via hooks, the same mechanism Claude Code uses for pre/post-execution callbacks on tools. Once installed, it automatically captures session lifecycle events, tool calls, messages, and task results without any changes to your workflow.
The memory engine uses hybrid retrieval: BM25 keyword search + vector embeddings + a knowledge graph layer for relationships between entities. An hourly compression cycle merges duplicate observations, decays stale entries using retention scoring, and generates a clean audit trail. On the LongMemEval-S benchmark, this pipeline achieves 95.2% R@5: the correct memory appears in the top 5 results about 95% of the time.
Vector embeddings run locally using @xenova/transformers, so memory operations make zero external API calls and local inference is free. If you use cloud models instead, the estimated cost is ~$10/year.
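To make the idea of hybrid retrieval concrete, here is a minimal sketch of how ranked results from several retrievers can be merged into one list. It uses reciprocal rank fusion, a common fusion technique; the article does not say which method agentmemory actually uses, so treat the algorithm choice, the function name, and the memory IDs as assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of memory IDs into one ranking.

    Each memory scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is the conventional RRF default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical ranked results from the three retrievers (IDs are made up).
bm25_hits = ["jwt-jose-decision", "edge-runtime-limit", "login-bug-fix"]
vector_hits = ["edge-runtime-limit", "jwt-jose-decision", "session-cookie-note"]
graph_hits = ["jwt-jose-decision", "session-cookie-note"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
print(fused[0])  # prints jwt-jose-decision
```

A memory that ranks well in several retrievers rises to the top even if no single retriever is certain about it, which is the usual motivation for combining keyword, vector, and graph signals.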
Agent compatibility
This is where agentmemory differs from Claude-specific solutions. It works with any agent that speaks MCP or HTTP — which in 2026 means practically everything:
Claude Code, Cursor, Codex CLI, Gemini CLI, Windsurf, Kilo Code, OpenCode, Cline, Roo Code, Goose, Aider, Hermes, and OpenClaw have confirmed integrations. One server, shared memories across all of them. If you switch tools mid-project, your memory comes with you.
The REST API (/agentmemory/*) allows any agent without native MCP support to still query the memory store via HTTP.
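A minimal sketch of an HTTP client for that API, using only the Python standard library. It assumes the REST API is served from the default localhost:3111 address, that endpoints answer GET requests with JSON, and that no authentication header is required; none of these is confirmed here, and the helper names are illustrative. The only concrete path used is /agentmemory/graph, which this article names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3111"  # default server address from the article


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and a path under the /agentmemory/ prefix."""
    return f"{base_url.rstrip('/')}/agentmemory/{path.lstrip('/')}"


def fetch_json(path: str, timeout: float = 5.0):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(endpoint_url(path), timeout=timeout) as response:
        return json.load(response)


print(endpoint_url("graph"))  # prints http://localhost:3111/agentmemory/graph
```

With a server running, calling fetch_json("graph") would return whatever the graph endpoint serves. If your setup requires a token, you would need to add it to the request yourself.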
What gets remembered
agentmemory is designed specifically for operational memory — not chat history. The distinction matters.
It captures: architecture decisions, dependency choices and their justification, known bugs and their fixes, deployment constraints, test coverage status, locations of key components, and failed approaches (so your agent doesn’t try the same broken path twice).
The knowledge graph layer extracts entities and relationships from these observations. You can query the graph at /agentmemory/graph or visualize it in the viewer. Temporal edges are supported, so the system doesn’t just know what was decided, but when — useful for tracking how your architecture evolved.
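To illustrate what a temporal edge adds, here is a small sketch of a graph edge with a validity interval and a query for "what was true on this date". The schema, field names, entity names, and dates are all hypothetical; agentmemory's internal representation is not described in this article.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalEdge:
    source: str
    relation: str
    target: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the edge is still current


def edges_at(edges, when):
    """Return the edges that were valid on the given date."""
    return [
        e for e in edges
        if e.valid_from <= when and (e.valid_to is None or when < e.valid_to)
    ]


# Hypothetical history: a library choice that was later replaced.
edges = [
    TemporalEdge("auth-service", "uses", "jsonwebtoken", date(2026, 1, 5), date(2026, 2, 2)),
    TemporalEdge("auth-service", "uses", "jose", date(2026, 2, 2)),
]

print([e.target for e in edges_at(edges, date(2026, 1, 20))])  # prints ['jsonwebtoken']
print([e.target for e in edges_at(edges, date(2026, 3, 1))])   # prints ['jose']
```

Keeping the superseded edge instead of overwriting it is what lets a query reconstruct how the architecture looked at an earlier point.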
For existing Claude Code users, there’s JSONL import: point agentmemory at your Claude Code transcript files and rehydrate your complete session history — observations, tool usage, timeline — into the memory store.
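JSONL simply means one JSON object per line, so the first step of any import is a tolerant line-by-line parse. The sketch below shows that step only. The record fields ("type", "tool", and so on) are invented for illustration and are not the real Claude Code transcript schema or agentmemory's importer.

```python
import json
from collections import Counter


def parse_jsonl(lines):
    """Yield one dict per non-empty line, skipping anything that is not a JSON object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or corrupt lines
        if isinstance(record, dict):
            yield record


# Hypothetical transcript lines, including a blank line and a corrupt one.
sample = [
    '{"type": "user", "text": "Set up JWT auth"}',
    '{"type": "tool_use", "tool": "Edit", "file": "src/auth.ts"}',
    "",
    "not json",
    '{"type": "tool_use", "tool": "Bash", "command": "npm test"}',
]

records = list(parse_jsonl(sample))
counts = Counter(r["type"] for r in records)
print(len(records), counts["tool_use"])  # prints 3 2
```

For a real file, the same function works on an open file handle, since iterating a text file yields its lines.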
An honest caveat
One known limitation worth mentioning: the plugin currently sends authentication tokens over plain HTTP (issue #275, open). The default binding to localhost limits this exposure for most use cases, but if you’re exposing the REST API beyond a single machine (for example, in a team setup or behind a reverse proxy), you’ll want TLS and an authentication review before going that route. The project is active and this is on the roadmap.
The highlighted benchmarks (92% token reduction, 95.2% R@5) are measured against agentmemory’s own pipeline and test suite (LongMemEval-S). Competitors like Mem0 publish numbers on different evaluation sets, so direct comparisons aren’t apples-to-apples. Take the numbers as directional, not as a leaderboard between tools.
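For readers unfamiliar with the metric, R@5 as described in this article counts a query as a success when a correct memory appears among the top 5 results. The sketch below computes it on toy data; the query and memory IDs are made up, and this is not the LongMemEval-S harness.

```python
def recall_at_k(ranked_results, relevant_ids, k=5):
    """Fraction of queries whose top-k results contain at least one relevant memory."""
    if not ranked_results:
        raise ValueError("ranked_results must not be empty")
    hits = 0
    for query_id, ranked in ranked_results.items():
        if set(ranked[:k]) & relevant_ids[query_id]:
            hits += 1
    return hits / len(ranked_results)


# Toy data: ranked memory IDs per query, and the relevant memory for each query.
ranked_results = {
    "q1": ["m3", "m7", "m1", "m9", "m2", "m4"],
    "q2": ["m5", "m6", "m8", "m0", "m3", "m2"],
    "q3": ["m2", "m4"],
    "q4": ["m9", "m8", "m7", "m6", "m5", "m1"],
}
relevant_ids = {"q1": {"m1"}, "q2": {"m2"}, "q3": {"m4"}, "q4": {"m1"}}

print(recall_at_k(ranked_results, relevant_ids, k=5))  # prints 0.5
```

Two of the four toy queries have their relevant memory at position 6, so they count as misses at k=5. This is why scores at different values of k, or on different evaluation sets, cannot be compared directly.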
Getting started
# Start the memory server
npx @agentmemory/agentmemory
# Viewer available at http://localhost:3113
# MCP server available at http://localhost:3111
For Claude Code, the installer handles hook registration automatically, writing the necessary commands to ~/.claude/settings.json. For other agents, check the repo for tool-specific integration guides — most have dedicated plugin folders with setup READMEs.
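If you want to see what ended up in that file, a short script can list the registered hook commands. The sketch below assumes the Claude Code settings layout of event name, then matcher groups, then command hooks. The example command is a placeholder, not the command agentmemory actually writes.

```python
import json
from pathlib import Path


def load_settings(path=None):
    """Read a Claude Code settings file; defaults to ~/.claude/settings.json."""
    path = path or Path.home() / ".claude" / "settings.json"
    return json.loads(Path(path).read_text(encoding="utf-8"))


def list_hook_commands(settings: dict):
    """Return (event, matcher, command) tuples for every command hook found."""
    found = []
    for event, groups in settings.get("hooks", {}).items():
        for group in groups:
            for hook in group.get("hooks", []):
                if hook.get("type") == "command":
                    found.append((event, group.get("matcher", ""), hook.get("command", "")))
    return found


# Hypothetical settings content; the command string is a placeholder.
example = {
    "hooks": {
        "PostToolUse": [
            {
                "matcher": "*",
                "hooks": [{"type": "command", "command": "example-memory-hook post-tool-use"}],
            }
        ]
    }
}

print(list_hook_commands(example))
```

On a machine with Claude Code installed, list_hook_commands(load_settings()) would show the real entries, which is a quick way to confirm that the installer registered something.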
The project is 100% open source, hit 5,000 GitHub stars in its first two weeks, and has an active contributor community, with 654 tests passing at the time of publication.
→ GitHub: github.com/rohitg00/agentmemory
→ Site: agent-memory.dev
