Building an AI Agent Memory Stack, Part 2: From Recall to Knowledge

The Updated Stack

QMD: Semantic search across real files. Still the backbone for finding things again.
lossless-claw (Context Management): Compacted but expandable conversation history. Old context gets compressed, not thrown away.
Active Memory (new, runtime): Runtime recall during live chat, so useful context can show up before the main reply.
memory-wiki (new, knowledge): Curated durable knowledge, maintained over time instead of re-derived from scratch.
Dreaming (new, maintenance): Background consolidation, promotion, and cleanup between conversations.

Same recall base, three new layers on top.

In part 1 of building the memory stack, I moved away from the wrong kind of AI memory.

The old version was basically “shove more stuff into context and hope the model remembers what matters”, which is not memory. It is just a bigger backpack full of receipts.

So I moved the stack toward pull-based recall instead. QMD handled semantic search across real files. lossless-claw handled long conversation history without throwing old context into the void. That solved retrieval, which was a big step. But it did not really solve memory.

Kody could find old things again, but we were still rediscovering some of the same project truths like goldfish with shell access. Decisions, recurring patterns, operating rules, weird little lessons from debugging sessions. Search could recover them, sure. But they were not really sticking anywhere.

That is what this second layer of work was about. Not a grand reinvention but more like admitting that search is only one part of memory, then adding the missing pieces around it.

Active Memory: Recall That Shows Up On Time

Active Memory is OpenClaw's runtime recall pass. The important bit is when it runs: before the main reply.

That sounds obvious, but it matters a lot. Memory that shows up after the agent has already answered is basically a very smug historian. Useful, maybe, but late.

In my original OpenClaw setup, I had, embarrassingly, configured most of this, but nothing was firing. I found out that the plugin was scoped to the wrong agent ID, so it was sitting there looking very official and doing absolutely nothing. A classic modern software experience. Everything is configured. Nothing works. Beautiful.

Once I pointed it at Kody, it started behaving like an actual memory layer instead of decorative JSON. The change was immediately noticeable. I did not have to keep nudging Kody with “go check memory first” or “did we already decide this?”. Relevant context had a chance to appear during the normal reply flow, before the answer was written.

Under the hood, this still sits on top of the same recall base. QMD is still the search engine. Active Memory is the part that remembers to use search at the right time.
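To make the ordering concrete, here is a minimal Python sketch of a reply flow with a recall pass in front of it. Everything in it is hypothetical: `search_memory`, `reply_with_recall`, `echo_model`, and the tiny in-memory `NOTES` list are stand-ins for a real search backend and a real model, and none of these names come from OpenClaw or QMD.

```python
from typing import Callable

# Hypothetical stand-in for a real search backend such as QMD.
NOTES = [
    "Decision: deploy previews from the staging branch only.",
    "Lesson: the plugin must be scoped to the right agent ID.",
]


def search_memory(query: str, limit: int = 3) -> list[str]:
    """Naive keyword match; a real backend would do semantic search."""
    cleaned = (raw.lower().strip("?.,!") for raw in query.split())
    words = {word for word in cleaned if len(word) > 3}
    hits = [note for note in NOTES if any(word in note.lower() for word in words)]
    return hits[:limit]


def reply_with_recall(message: str, generate_reply: Callable[[str], str]) -> str:
    """Run the recall pass first, then hand the enriched prompt to the model."""
    recalled = search_memory(message)
    if not recalled:
        return generate_reply(f"User: {message}")
    context = "\n".join(f"- {hit}" for hit in recalled)
    return generate_reply(f"Relevant memory:\n{context}\n\nUser: {message}")


def echo_model(prompt: str) -> str:
    """Placeholder model that returns its prompt, so the flow is visible."""
    return prompt
```

Calling `reply_with_recall("Which branch do previews deploy from?", echo_model)` returns a prompt that already contains the staging-branch decision. The point is only the order of operations: the recalled context is in the prompt before the reply is generated, not bolted on afterwards.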

This is the shape that mattered inside my openclaw.json config file:

{
  plugins: {
    entries: {
      "active-memory": {
        enabled: true,
        config: {
          enabled: true,
          agents: ["kody"],
          allowedChatTypes: ["direct"],
          queryMode: "recent",
          promptStyle: "balanced"
        }
      }
    }
  }
}
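A wrong-agent scoping mistake like the one described earlier is easy to catch mechanically. The following is a small Python sketch, assuming the config has already been loaded into a dict with the same shape as the snippet above. `plugin_targets_agent` is a hypothetical helper, not an OpenClaw command or API, and it only encodes the assumption that a plugin does nothing for an agent missing from its `agents` list.

```python
def plugin_targets_agent(config: dict, plugin: str, agent_id: str) -> bool:
    """Return True if the plugin entry is enabled and lists agent_id under config.agents."""
    entry = config.get("plugins", {}).get("entries", {}).get(plugin, {})
    if not entry.get("enabled", False):
        return False
    return agent_id in entry.get("config", {}).get("agents", [])


# Same shape as the config above, written as a Python dict for illustration.
config = {
    "plugins": {
        "entries": {
            "active-memory": {
                "enabled": True,
                "config": {"enabled": True, "agents": ["kody"]},
            }
        }
    }
}

# True for the agent the plugin is scoped to, False for any other ID.
scoped_to_kody = plugin_targets_agent(config, "active-memory", "kody")
scoped_to_other = plugin_targets_agent(config, "active-memory", "some-other-agent")
```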

While testing it in Discord, these slash commands were useful for checking its status and toggling it:

/active-memory status
/active-memory on
/active-memory off

Small thing, but it changes the feel of the system. Memory becomes less like a tool you manually reach for and more like something that participates in the conversation. Which is kinda the whole point.

memory-wiki: Giving Long-Term Knowledge a Proper Home

Once runtime recall started working, the next problem became more obvious. Good search still means you keep re-finding the same truths.

That is better than forgetting them, obviously, but it still feels a bit dumb. If the system has recovered the same decision five times, maybe the decision deserves a home instead of being treated like a fresh archaeological discovery every week. That is where memory-wiki comes in.

It gives the stack a maintained knowledge layer. Not just raw logs. Not just search results. Actual pages that can hold durable conclusions, operating rules, project context, and the kind of “please do not make me learn this again” knowledge that agents otherwise keep stepping on.

This was also where I stopped pretending my older Karpathy-inspired LLM Wiki experiments and OpenClaw's wiki layer should live as separate systems. Two truth systems sounds powerful until you actually run both. Then it mostly becomes ontology drift, duplicate rules, and a lot of extremely confident markdown. I kept the useful ideas and folded them into one live wiki layer.

The main distinction is simple: Evidence and knowledge are not the same thing.

Raw sources, logs, exports, and receipts are evidence. They need provenance and traceability. Maintained wiki pages are knowledge. They need editing, judgement, and occasionally someone saying, “No, this does not deserve a page.”

If you jam both jobs into the same pile, you usually get one of two outcomes:

a junk drawer
a polished lie

Neither is great.
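One way to picture the split is as two different record types with different rules. This is a minimal Python sketch under my own assumptions: `Evidence` and `WikiPage` are hypothetical names, not memory-wiki's data model, and the fields are only there to show that evidence is immutable and traceable while knowledge is editable but should point back at its sources.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """A raw source. Frozen: it records what was said and where it came from."""
    source_path: str
    captured_on: date
    excerpt: str


@dataclass
class WikiPage:
    """A maintained conclusion. Editable, but it should cite evidence."""
    title: str
    body: str
    sources: list[Evidence] = field(default_factory=list)
    last_reviewed: Optional[date] = None

    def is_grounded(self) -> bool:
        """A page with no sources is at risk of becoming a polished lie."""
        return len(self.sources) > 0

    def revise(self, new_body: str, reviewed_on: date) -> None:
        """Editing changes the page, never the evidence behind it."""
        self.body = new_body
        self.last_reviewed = reviewed_on
```

Keeping the evidence type frozen is the design choice that matters here: a page can be rewritten as often as judgement requires, while the receipts it cites stay exactly as they were captured.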

I added memory-wiki in bridge mode so it could sit on top of the existing recall stack instead of replacing it:

{
  memory: {
    backend: "qmd"
  },
  plugins: {
    entries: {
      "memory-wiki": {
        enabled: true,
        config: {
          vaultMode: "bridge",
          bridge: {
            enabled: true,
            readMemoryArtifacts: true,
            indexDreamReports: true,
            indexDailyNotes: true,
            indexMemoryRoot: true,
            followMemoryEvents: true
          },
          search: {
            backend: "shared",
            corpus: "all"
          },
          context: {
            includeCompiledDigestPrompt: false
          }
        }
      }
    }
  }
}

Once it is live, the useful commands are not flashy. They are the boring maintenance ones:

/wiki-status
/wiki-search "openclaw memory architecture"
/wiki-lint

And honestly, that is probably a good sign. The useful parts of a memory system should feel a little boring. If every interaction feels magical, there is a decent chance the system is making stuff up.

Dreaming: Letting OpenClaw Do Background Consolidation

Dreaming is the part people are most likely to misread as me inventing some spooky custom memory ritual.

I did not.

OpenClaw already has this mechanism in memory-core. I mostly enabled it and then stopped trying to make live chat do every job at once.

The implementation is OpenClaw's. It runs memory consolidation in phases: light, REM, and deep. Recent material gets staged, patterns get pulled out, and stronger material can be promoted into longer-term memory. Human-readable reports land in DREAMS.md, while the machine state lives under memory/.dreams/.
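To make the stage, extract, promote flow a little more concrete, here is a toy Python sketch. It is emphatically not OpenClaw's implementation: the function names, the counting heuristic, and the `min_count` threshold are all assumptions made for illustration, and the mapping of each step onto a phase name is only a loose reading of the description above.

```python
from collections import Counter


def light_phase(recent_notes: list[str]) -> list[str]:
    """Stage recent material: normalize whitespace and case, drop empties."""
    staged = (" ".join(note.lower().split()) for note in recent_notes)
    return [note for note in staged if note]


def rem_phase(staged: list[str]) -> Counter:
    """Pull out patterns: here, simply count how often each note recurs."""
    return Counter(staged)


def deep_phase(patterns: Counter, min_count: int = 2) -> list[str]:
    """Promote only material that recurred at least min_count times."""
    return sorted(note for note, count in patterns.items() if count >= min_count)


def dream(recent_notes: list[str], min_count: int = 2) -> list[str]:
    """Run the three phases in order and return the promoted notes."""
    return deep_phase(rem_phase(light_phase(recent_notes)), min_count)
```

With the default threshold, a note that shows up twice in slightly different formatting gets promoted and a one-off remark does not. A real consolidation pass would use far better signals than exact repetition, but the shape is the same: nothing here needs to run inside the reply path.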

What I like about this is that it is not pretending to be search.

QMD and lossless-claw help recover things. Dreaming helps the system do slower background cleanup and promotion between conversations. Different problem, different tool. Revolutionary concept, apparently.

The config to enable it is straightforward:

{
  "plugins": {
    "entries": {
      "memory-core": {
        "config": {
          "dreaming": {
            "enabled": true
          }
        }
      }
    }
  }
}

And if I want to check or toggle it from chat:

/dreaming status
/dreaming on
/dreaming off

This is the bit I like about the shape of the stack now. Live chat can stay live chat. Search can stay search. Background consolidation can happen in the background where it belongs.

Not everything needs to be crammed into the reply path.

What I'm Still Ironing Out

The wiki layer is the bit that still feels messy. I think the architecture is right. I do not think the curation rules are finished. I'm leaning towards skills as the way to codify the wiki curation workflow, which is probably what I'll write about next.

Once you see a wiki layer working, the temptation is to automate the hell out of it. Auto-create pages. Auto-promote half-useful notes. Auto-link every noun that blinks. Very agentic. Very impressive. Very likely to turn into a spam cannon pointed directly at your own memory.

So for now, I still keep a decent amount of this semi-manual: choosing what gets promoted, what stays as canonical evidence, what should be revised rather than duplicated, and even what to archive. I am slowly hardening the whole workflow, and I am using my own sense of which steps have become repetitive to decide what should be codified and eventually automated or triggered.
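As a sketch of what "semi-manual" can look like in practice, here is a small Python review queue where nothing moves until a human has attached a decision. The four decisions mirror the choices listed above; the names `Decision`, `Candidate`, and `apply_reviewed` are hypothetical and not part of memory-wiki or any OpenClaw skill.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PROMOTE = "promote"          # becomes a maintained wiki page
    KEEP_AS_EVIDENCE = "keep"    # stays a raw, traceable source
    REVISE_EXISTING = "revise"   # folded into a page that already exists
    ARCHIVE = "archive"          # retired from the active set


@dataclass
class Candidate:
    note: str
    decision: Optional[Decision] = None  # None means not yet reviewed by a human


def apply_reviewed(queue: list[Candidate]) -> dict[Decision, list[str]]:
    """Group reviewed candidates by decision; unreviewed ones are left out."""
    outcome: dict[Decision, list[str]] = {decision: [] for decision in Decision}
    for candidate in queue:
        if candidate.decision is not None:
            outcome[candidate.decision].append(candidate.note)
    return outcome
```

The default of `None` is the guard rail: automation can fill the queue as eagerly as it likes, but only reviewed items ever reach the wiki.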

That is the part I am still tightening.

The wiki layer is useful now, but I do not want it running too far ahead of the rules. A bad memory system is worse than no memory system, because at least forgetting is honest. Bad curation gives you confidence with a fake moustache.

So the stack now feels less like one magic memory feature and more like a small crew with proper jobs.

findable: Keeps the useful stuff findable.
Compress (recoverable): Keeps the long tail recoverable without carrying all of it live.
Curate (knowledge): Turns repeated lessons into actual knowledge.
Dream (housekeeping): Does the overnight housekeeping, so live chat does not have to carry the whole damn house on its back.

Not one big memory blob. Four different jobs, each handled by the layer least likely to make a mess.

That is the shape I was trying to get to: not an agent that remembers everything, but a system that knows where each kind of memory belongs.

That feels much closer to actual memory than where I started. Ask me again in a month. I will probably have changed my mind about at least one layer by then.