Most agents fail not because the LLM is dumb, but because memory is an afterthought.

I built observability into my Rust agent daemon, and the biggest revelation wasn't about tracing or metrics—it was about realizing that "memory" isn't one thing. It's four distinct layers, each with different tradeoffs.

The Four Layers

Layer 1: Working Memory (ephemeral)

What the agent is thinking about right now: partial plans, intermediate tool results, constraints extracted from the user's request.

Store this in state, not in the prompt. The prompt is already bloated—don't add "here's everything I've figured out so far" to it.
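A minimal sketch of what "store it in state" can look like. The names (`WorkingMemory`, `context_for_next_step`) are illustrative, not ZeroClaw's actual types; the point is that intermediate results live in typed fields, and only what the next step needs gets rendered into the prompt.

```rust
// Working memory as explicit state, not prompt text.
#[derive(Debug, Default)]
struct WorkingMemory {
    partial_plan: Vec<String>,           // plan steps drafted so far
    tool_results: Vec<(String, String)>, // (tool name, raw output)
    constraints: Vec<String>,            // constraints extracted from the request
}

impl WorkingMemory {
    /// Render only what the next step needs, instead of dumping
    /// "everything figured out so far" into the prompt.
    fn context_for_next_step(&self) -> String {
        let last_result = self
            .tool_results
            .last()
            .map(|(tool, out)| format!("{tool}: {out}"))
            .unwrap_or_default();
        format!(
            "constraints: {}\nlast result: {}",
            self.constraints.join("; "),
            last_result
        )
    }
}

fn main() {
    let mut wm = WorkingMemory::default();
    wm.constraints.push("must use Rust".into());
    wm.tool_results.push(("grep".into(), "3 matches".into()));
    println!("{}", wm.context_for_next_step());
}
```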

Layer 2: Conversation Memory (summaries)

Chat history grows unbounded. What you need is a rolling summary: the last N turns verbatim, plus a compressed representation of everything before that.

Summaries should be lossy by design. You're not preserving everything; you're preserving what's relevant to the current task.
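The rolling-summary shape can be sketched like this. In real code the `summarize` step would be an LLM call; here it is stubbed as simple truncation, and all names are hypothetical.

```rust
// Rolling conversation buffer: last N turns verbatim, older turns
// folded into a lossy summary.
struct ConversationMemory {
    keep_last: usize,
    summary: String,     // lossy compression of everything older
    recent: Vec<String>, // last N turns, verbatim
}

impl ConversationMemory {
    fn push(&mut self, turn: String) {
        self.recent.push(turn);
        while self.recent.len() > self.keep_last {
            let old = self.recent.remove(0);
            // A real implementation would ask the model for a summary.
            self.summary = summarize(&self.summary, &old);
        }
    }
}

// Stub standing in for an LLM summarization call: keep a short prefix.
fn summarize(prev: &str, turn: &str) -> String {
    format!("{prev} | {}", &turn[..turn.len().min(20)])
}

fn main() {
    let mut mem = ConversationMemory {
        keep_last: 2,
        summary: String::new(),
        recent: vec![],
    };
    for t in ["turn one", "turn two", "turn three"] {
        mem.push(t.to_string());
    }
    // Only the last two turns survive verbatim; the first was compressed.
    println!("recent: {:?}, summary: {}", mem.recent, mem.summary);
}
```

The lossiness is the feature: the summary stub throws information away on purpose, exactly as the prose above argues a real summarizer should.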

Layer 3: Task Memory (artifacts)

Everything produced during the task: generated files, decisions made, PR links, commands executed. This is structured data, not embeddings. Store it that way.
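One way to model "structured, not embeddings" is a plain enum of artifact records (the variants below are my illustration, not ZeroClaw's schema). Structured records support exact queries; no similarity search required.

```rust
// Task memory as typed records. In real code you'd likely derive
// serde::Serialize for persistence; plain structs shown here.
#[derive(Debug)]
enum TaskArtifact {
    FileGenerated { path: String },
    Decision { what: String, why: String },
    CommandRun { cmd: String, exit_code: i32 },
    PrLink { url: String },
}

fn main() {
    let log = vec![
        TaskArtifact::Decision {
            what: "use tokio".into(),
            why: "async runtime".into(),
        },
        TaskArtifact::CommandRun {
            cmd: "cargo test".into(),
            exit_code: 0,
        },
    ];
    // An exact query over structured data: did any command fail?
    let any_failed = log.iter().any(|a| {
        matches!(a, TaskArtifact::CommandRun { exit_code, .. } if *exit_code != 0)
    });
    println!("any command failed: {any_failed}");
}
```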

Layer 4: Long-term Memory (preferences)

Stable facts about the user: "prefers technical deep dives", "timezone EST", "uses Rust + TypeScript". This is where vector search does make sense: retrieving relevant context from past sessions.
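A toy sketch of the retrieval shape. Real systems embed facts with a model and use an index; here the embeddings are hand-made three-dimensional vectors purely to show cosine-similarity lookup, so the numbers are invented.

```rust
// Cosine similarity between two vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn main() {
    // (fact, pretend-embedding) pairs; embeddings are hypothetical.
    let prefs = vec![
        ("prefers technical deep dives", vec![0.9, 0.1, 0.0]),
        ("timezone EST", vec![0.0, 0.9, 0.2]),
        ("uses Rust + TypeScript", vec![0.8, 0.0, 0.5]),
    ];
    // A pretend query embedding, e.g. "what stack does the user like?"
    let query = vec![0.85, 0.05, 0.4];
    let best = prefs
        .iter()
        .max_by(|a, b| {
            cosine(&a.1, &query)
                .partial_cmp(&cosine(&b.1, &query))
                .unwrap()
        })
        .unwrap();
    println!("retrieved: {}", best.0);
}
```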

Why This Matters

Memory isn't one thing. Vector retrieval is one tool in the toolbox, not the entire toolbox.

The biggest reliability gain comes from separating LLM decisions (probabilistic) from state transitions (deterministic). Implement a reducer. Make the state machine explicit. Then you can debug, replay, and evaluate.
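The reducer idea can be sketched as a pure state-transition function: the LLM proposes an `Action`, the reducer applies it deterministically, and a recorded action log can be replayed to reproduce any run. `AgentState` and `Action` are illustrative names under this assumption, not ZeroClaw's actual types.

```rust
#[derive(Debug, PartialEq)]
enum AgentState {
    Planning,
    RunningTool { name: String },
    Done,
}

#[derive(Debug)]
enum Action {
    CallTool { name: String },
    ToolFinished,
    Finish,
}

/// Pure function: same state + same action => same next state, always.
/// The LLM only chooses the Action; the transition itself is deterministic.
fn reduce(state: AgentState, action: &Action) -> AgentState {
    match (state, action) {
        (AgentState::Planning, Action::CallTool { name }) => {
            AgentState::RunningTool { name: name.clone() }
        }
        (AgentState::RunningTool { .. }, Action::ToolFinished) => AgentState::Planning,
        (_, Action::Finish) => AgentState::Done,
        (s, _) => s, // invalid action for this state: ignore (or log it)
    }
}

fn main() {
    // Replaying a recorded action log reproduces the run exactly.
    let log = vec![
        Action::CallTool { name: "search".into() },
        Action::ToolFinished,
        Action::Finish,
    ];
    let end = log.iter().fold(AgentState::Planning, reduce);
    println!("final state: {end:?}");
}
```

Because `reduce` is pure, debugging becomes replaying a log, and evaluation becomes asserting on the final state rather than on opaque model output.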

This is what I'm building into ZeroClaw's observability layer—not just tracing that something happened, but tracking what the agent was thinking at each step.

The agent is the easy part. The system is the product.