
Building Persistent AI Agent Memory: What I Replaced and Why It's Better
March 25, 2026 · 9 min read
So, I'll be honest: when I started building my AI agent team on OpenClaw, persistent memory was probably the thing I got most wrong. I went through this whole phase of building a structured knowledge base, convinced that was the answer. It made sense on paper. And then... it just kinda didn't work.
(If you're not familiar with my agent setup, I wrote about building the team here — this post assumes that context.)
- Before: auto-injected context on every message. Manually curated. Went stale. More maintenance than it saved.
- The insight: the knowledge already exists in session logs, notes, and conversations. Stop re-filing it. Start searching it.
- After: context on demand. Full conversation history, compressed but never lost. Zero manual upkeep.
The First Try: A Structured Push-Based Memory
So initially, I added a persistent memory layer called ByteRover, a local tool that maintains a structured context tree of curated markdown files. Projects, decisions, architectural rules, preferences... basically a wiki your agent can reference.
The integration point was an auto-enrichment hook. Every inbound message to Kody would get a ## ByteRover Context (Auto-Enriched) block prepended — like 5 "relevant" snippets retrieved from the context tree, injected automatically before he even saw the message.
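To make the push model concrete, here's a rough sketch of what an auto-enrichment hook like that boils down to. This is not ByteRover's actual code: `Snippet`, `retrieve_snippets`, and `enrich` are names I made up for illustration, and the word-overlap ranking is just a stand-in for whatever retrieval the real tool does. Only the heading text comes from my setup.

```python
from dataclasses import dataclass

# Heading text as it appeared in the injected block.
HEADER = "## ByteRover Context (Auto-Enriched)"


@dataclass
class Snippet:
    source: str  # hypothetical: file the snippet came from
    text: str


def retrieve_snippets(message: str, tree: list[Snippet], k: int = 5) -> list[Snippet]:
    """Hypothetical retriever: rank snippets by word overlap with the message."""
    words = set(message.lower().split())
    ranked = sorted(
        tree,
        key=lambda s: len(words & set(s.text.lower().split())),
        reverse=True,
    )
    # Always returns up to k snippets, even if none of them overlap at all.
    return ranked[:k]


def enrich(message: str, tree: list[Snippet], k: int = 5) -> str:
    """Push model: prepend a context block to every inbound message."""
    snippets = retrieve_snippets(message, tree, k)
    block = "\n".join(f"- ({s.source}) {s.text}" for s in snippets)
    return f"{HEADER}\n{block}\n\n{message}"
```

The thing to notice is that the hook has no "nothing relevant" path: it returns k snippets whether or not any of them match, so even an unrelated question gets a full block prepended.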
On paper, this is exactly what you'd want. In practice... well, it was a learning experience.
What I learned from that approach:
- Noise is real. Every single message — even "what's my meeting tomorrow?" — got a fat block of prepended context. Most of it irrelevant. The retrieval couldn't tell that a calendar question doesn't need three paragraphs about my Git branching preferences. Pretty annoying.
- Maintenance is a tax. The context tree had to be manually curated. New project? Write a doc. Changed a decision? Find the old doc, update it. Forgot? Now your agent is confidently using stale information — which is sort of worse than having none at all.
- It's disconnected from actual conversations. The richest context — what we actually discussed — wasn't searchable. I was maintaining a parallel, lossy copy of knowledge that already existed in our chats. Kinda backwards when you think about it.
- Context windows bite. I'd been running Sonnet 4.6 (1M context), so the injection blocks were just noise I could mostly ignore. The moment I wanted to test a tighter model, everything fell apart. Those prepended tokens burn through your context budget before you even get to the real conversation.
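To put rough numbers on that last point, here's a back-of-the-envelope sketch. The figures are made up for illustration (I never measured the real block size), and it assumes each injected block stays in the conversation history for the rest of the session.

```python
def injected_tokens(turns: int, tokens_per_block: int) -> int:
    """Total tokens spent on injected blocks after a number of turns."""
    return turns * tokens_per_block


def budget_share(turns: int, tokens_per_block: int, context_window: int) -> float:
    """Fraction of the context window consumed by injected blocks alone."""
    return injected_tokens(turns, tokens_per_block) / context_window


# Hypothetical numbers: 50 turns, 600 tokens per injected block.
share_large = budget_share(50, 600, 1_000_000)  # 1M-token window
share_small = budget_share(50, 600, 32_000)     # a tighter 32k window
print(f"1M window: {share_large:.1%}, 32k window: {share_small:.1%}")
```

With those assumed numbers the injection is a rounding error on a 1M window and most of the budget on a 32k one, which matches what I saw when I tried a tighter model.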
After a few weeks I realised I was spending more time maintaining ByteRover than it was saving me. The whole push-based structured KB idea was just... wrong for what I needed.
The Shift: From Push to Pull
Well, the insight sort of hit me eventually: the knowledge already exists. It's in session logs, memory files, project docs, past conversations. I didn't need to re-file everything into a structured tree. I just needed to search it when needed.
And for conversation continuity — I didn't need to extract and curate. I needed to compress without losing.
So I moved to a pull-based architecture with two pieces: semantic search over real data (QMD) plus lossless conversation compaction (lossless-claw). Way less hand-wavy than it sounds.
QMD: Search What Already Exists
QMD (Query Markdown Documents) is a local CLI by Tobi Lütke (Shopify's CEO, which is a fun detail) that does BM25 + vector search + rerank over indexed local files. I installed it as an OpenClaw skill and pointed it at basically everything that matters:
- Workspace memory files (~/.openclaw/workspace/memory/)
- Agent knowledge files
- Session history exports
- My Obsidian vault: all notes, project docs, blog drafts, everything
- Project files across ~/repos/
The key difference from a push-based KB: QMD is pull-based. It doesn't inject context into every message. Kody searches when Kody needs context — and only then. No more prepended blocks of noise. Just a targeted query when the situation actually calls for it.
And it searches real data — actual session logs, actual markdown files generated from actual work. Not some curated parallel copy that's always slightly out of date.
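If BM25 is new to you, here's a tiny pure-Python sketch of the lexical half of that pipeline. To be clear, this is not QMD's code and it skips the vector search and rerank stages entirely. It's the textbook Okapi BM25 formula with common default parameters (`k1=1.5`, `b=0.75`), and the function names are mine.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenizer, lowercased. Real tools do much more."""
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Average document length; fall back to 1.0 if every doc is empty.
    avgdl = sum(len(t) for t in tokenized) / n or 1.0
    # Document frequency: in how many docs each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "ByteRover context window issue with tighter model",
    "git branching preferences for repos",
    "meeting notes calendar tomorrow",
]
scores = bm25_scores("context window issue", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

A document that shares no terms with the query scores zero, so a lexical ranker can at least say "nothing here matches" instead of always handing back five snippets.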
In practice, Kody might run something like:
qmd search "ByteRover context window issue"
# → returns the exact session log from when I hit the problem, with line references
Or before drafting this post, Weaver ran:
qmd query "ai agent memory system technical blog"
# → surfaces relevant session notes, past decisions, related content
That's sort of the whole thing: search what already exists, instead of maintaining a separate system that tries to summarise it.
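If you wanted to expose that to an agent as an on-demand tool, a thin wrapper is enough. A minimal sketch, assuming `qmd` is on your PATH and using only the two subcommands shown above. `qmd_command` and `pull_context` are hypothetical names, and this isn't how the OpenClaw skill is actually wired up.

```python
import shutil
import subprocess


def qmd_command(query: str, mode: str = "search") -> list[str]:
    """Build the argv for a qmd call. Only the two subcommands from this post."""
    if mode not in ("search", "query"):
        raise ValueError(f"unsupported mode: {mode!r}")
    return ["qmd", mode, query]


def pull_context(query: str, mode: str = "search") -> str:
    """Pull model: run a search only when the agent asks for one."""
    if shutil.which("qmd") is None:
        # qmd is not installed; return no context rather than failing.
        return ""
    result = subprocess.run(
        qmd_command(query, mode),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

The point of the shape is that nothing runs unless something calls `pull_context`, which is the whole difference from a hook that fires on every message.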
Lossless Claw: Conversation History That Doesn't Disappear
The second piece is lossless-claw — an OpenClaw plugin that replaces the default sliding-window context compaction with a proper DAG-based summarisation system.
So here's the problem it solves: as sessions grow long, most AI agents just truncate older messages. You lose context. Things said three sessions ago are just... gone. lossless-claw instead persists every message to a SQLite database, summarises chunks of older messages into nodes, and condenses those summaries into higher-level nodes as they accumulate, forming a directed acyclic graph of your entire conversation history.
The summaries are compact enough to fit in the context window. But the original exchanges are always retrievable. Kody can lcm_grep to search across compacted history, lcm_expand to drill back into any summarised exchange, and lcm_expand_query to answer questions against the full conversation DAG. Pretty rad honestly.
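Here's a toy version of that idea so the shape is easier to picture. It's a sketch under my own assumptions, not lossless-claw's schema or code: the `nodes` and `edges` tables, the function names, and the `placeholder_summary` helper (which stands in for a real LLM summarisation call) are all hypothetical.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a tiny store: every message and summary is a node, edges link them."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS nodes (
            id INTEGER PRIMARY KEY,
            kind TEXT NOT NULL CHECK (kind IN ('message', 'summary')),
            content TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS edges (
            parent_id INTEGER NOT NULL REFERENCES nodes(id),
            child_id INTEGER NOT NULL REFERENCES nodes(id),
            position INTEGER NOT NULL
        );
    """)
    return db


def add_message(db: sqlite3.Connection, content: str) -> int:
    """Persist a raw message. It is never deleted afterwards."""
    cur = db.execute("INSERT INTO nodes (kind, content) VALUES ('message', ?)", (content,))
    return cur.lastrowid


def placeholder_summary(texts: list[str]) -> str:
    """Stand-in for an LLM call: keep the first 30 characters of each child."""
    return " | ".join(t[:30] for t in texts)


def summarise(db: sqlite3.Connection, child_ids: list[int]) -> int:
    """Condense messages or lower-level summaries into one parent summary node."""
    texts = [
        db.execute("SELECT content FROM nodes WHERE id = ?", (cid,)).fetchone()[0]
        for cid in child_ids
    ]
    cur = db.execute(
        "INSERT INTO nodes (kind, content) VALUES ('summary', ?)",
        (placeholder_summary(texts),),
    )
    parent_id = cur.lastrowid
    db.executemany(
        "INSERT INTO edges (parent_id, child_id, position) VALUES (?, ?, ?)",
        [(parent_id, cid, i) for i, cid in enumerate(child_ids)],
    )
    return parent_id


def expand(db: sqlite3.Connection, node_id: int) -> list[str]:
    """Walk down from any node to the original messages beneath it, in order."""
    kind, content = db.execute(
        "SELECT kind, content FROM nodes WHERE id = ?", (node_id,)
    ).fetchone()
    if kind == "message":
        return [content]
    children = db.execute(
        "SELECT child_id FROM edges WHERE parent_id = ? ORDER BY position", (node_id,)
    ).fetchall()
    originals: list[str] = []
    for (child_id,) in children:
        originals.extend(expand(db, child_id))
    return originals
```

Only the top-level summaries need to sit in the context window. Because compaction adds nodes and edges instead of deleting rows, expanding any summary walks back down to the exact original messages.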
The New Layered Memory
Anyway... post-ByteRover, Kody's memory sort of looks like this:
1. Curated long-term memory. Distilled wisdom, key decisions, preferences: the pinned knowledge that persists across everything.
2. Raw daily session logs. What actually happened, unfiltered. Written automatically as sessions run.
3. Searchable corpus (QMD) across all memory files, session history, and project docs. Pull-based: searched on demand, not injected on every message.
4. Conversation history compression with full retrieval (lossless-claw). DAG-based summarisation: nothing is truncated, everything is reachable.
5. Per-agent memory. Each sub-agent (Weaver for content, Arch for code, Oracle for research) maintains its own MEMORY.md with domain-specific context.
The only manual piece is layer 1. Layers 2–4 are basically automatic — daily logs get written as sessions happen, QMD indexes whatever's on disk, lossless-claw compresses as conversations grow. I think that's the right balance.
Compare that to ByteRover, where everything was manual. Every piece of knowledge had to be deliberately filed into a structured tree. The new system just uses data that's already being generated through normal work. Way less overhead.
Push vs Pull: Side by Side
What I'm Still Figuring Out
I don't think this is the final form, if I'm being honest. The QMD indexing could probably be smarter about what it prioritises. The line between "agent should search for this" and "agent should just know this" is still super fuzzy to me. And I'm not sure the 5-layer architecture won't need a 6th at some point (or maybe I'll collapse two layers... who knows).
But the core shift — from push to pull, from curated to organic, from maintaining a parallel knowledge base to just searching the real one — that part I'm pretty confident about. The push-based approach felt productive. All that organising and curating and structuring. But it was really just creating a maintenance burden that went stale the moment I stopped tending it.
I guess the best knowledge management system is the one you don't have to manage.