Context
The first time an AI agent forgot something important — a configuration detail, a client decision, an established workflow — the instinct was to blame the model. It should have remembered.
It didn’t, because it couldn’t. AI agents don’t have persistent memory between sessions by default. Every session starts fresh. If you didn’t write something down in a structured, accessible file, it doesn’t exist the next day.
Once that’s understood, the question becomes: what does a production-grade memory system for an AI agent actually look like? The answer turned out to be simpler than expected — and required more discipline than anticipated.
What We Built
The Three-Layer Memory Model
Layer 1: Daily Notes — append-only log of what happened today. Actions taken, decisions made, things discovered. Written to at the end of every session. Never edited retroactively. Think of it as a ship’s log: factual, timestamped, exhaustive.
Layer 2: Long-Term Memory — a curated file of what’s always true. Infrastructure config, client context, standing rules, established preferences, tool locations. Edited deliberately when something changes. Read at the start of every session. Kept under a strict size limit — if it grows too large, it gets curated, not padded.
Layer 3: Domain-Specific Files — isolated files for each workstream. Client A’s context lives in Client A’s file. Client B’s context lives in Client B’s file. They never touch. The agent loads only the relevant namespace for the task at hand.
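As a concrete sketch, the three layers can map onto a plain directory of markdown files. Everything here is hypothetical (the `memory/` layout, the file names, and the `load_context` / `append_daily_note` helpers), shown in Python:

```python
from datetime import date, datetime
from pathlib import Path

# Hypothetical layout for the three layers:
#   memory/daily/2024-06-01.md     - Layer 1: append-only session log
#   memory/long_term.md            - Layer 2: curated, always-true facts
#   memory/domains/client-a.md     - Layer 3: one file per workstream
MEMORY_ROOT = Path("memory")

def load_context(domain: str) -> str:
    """Assemble the startup context: long-term memory plus ONE domain file."""
    parts = []
    long_term = MEMORY_ROOT / "long_term.md"
    if long_term.exists():
        parts.append(long_term.read_text())
    domain_file = MEMORY_ROOT / "domains" / f"{domain}.md"
    if domain_file.exists():  # only the relevant namespace is loaded
        parts.append(domain_file.read_text())
    return "\n\n".join(parts)

def append_daily_note(entry: str) -> None:
    """Layer 1 is append-only: entries are timestamped and never rewritten."""
    daily_dir = MEMORY_ROOT / "daily"
    daily_dir.mkdir(parents=True, exist_ok=True)
    log = daily_dir / f"{date.today().isoformat()}.md"
    stamp = datetime.now().strftime("%H:%M")
    with log.open("a") as f:  # "a" mode: never edit retroactively, only append
        f.write(f"- {stamp} {entry}\n")
```

The point of the sketch is the access pattern, not the file format: Layer 2 is read on every startup, Layer 1 is written on every shutdown, and Layer 3 is loaded one namespace at a time.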
The Reference-Not-Secret Rule
Memory files store references to credentials, not credentials themselves. The file says: “SSH password in Keychain as `service-name-ssh-password`.” It doesn’t say what the password is. The agent retrieves it at runtime from the secure store.
This rule sounds obvious. It requires active enforcement. The failure mode is subtle: when you’re moving fast and want to capture context quickly, it’s tempting to paste the actual value into a memory file “just for now.” That’s how credentials end up in places they shouldn’t be.
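One way to enforce the rule, sketched in Python for a macOS setup: memory files hold only a reference line, and the agent resolves it at runtime through the `security` CLI. The reference format and the `parse_reference` / `resolve_secret` helper names are illustrative assumptions, not a prescribed API:

```python
import re
import subprocess

# Matches reference lines like: SSH password in Keychain as `service-name-ssh-password`.
REF_PATTERN = re.compile(r"Keychain as `(?P<service>[^`]+)`")

def parse_reference(line: str) -> str:
    """Extract the Keychain service name from a memory-file reference line."""
    match = REF_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a credential reference: {line!r}")
    return match.group("service")

def resolve_secret(service: str) -> str:
    """Fetch the secret at runtime from the macOS Keychain, never from the file."""
    # `security find-generic-password -s <service> -w` prints only the password.
    out = subprocess.run(
        ["security", "find-generic-password", "-s", service, "-w"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()
```

The design choice worth noting: the memory file and the secret store have disjoint contents, so a leaked memory file leaks only lookup keys.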
Milestone and Checkpoint Files
For long-running projects, a milestone file tracks in-flight work, completed work, and next actions. For critical sessions, a checkpoint file captures the full state at a specific moment — what’s been done, what’s open, what decisions were made.
Checkpoints serve a different purpose from daily notes: they’re designed for recovery. If a session ends unexpectedly, or if context needs to be re-established quickly, a checkpoint file contains everything needed to resume without reconstructing from scratch.
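A checkpoint can be as simple as a structured file written at a known path. A minimal Python sketch, with hypothetical field names for the three categories described above:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def write_checkpoint(path: Path, done: list, open_items: list, decisions: list) -> None:
    """Capture full session state at a specific moment, for recovery rather than history."""
    state = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "done": done,            # what's been done
        "open": open_items,      # what's still open
        "decisions": decisions,  # what was decided, so it isn't relitigated
    }
    path.write_text(json.dumps(state, indent=2))

def resume_from(path: Path) -> dict:
    """Everything needed to pick the session back up without reconstructing it."""
    return json.loads(path.read_text())
```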
What Broke (or Almost Broke)
The Agent That Kept Re-Asking
Before the memory system was formalized, the same questions came up in multiple sessions: What’s the SSH port? Which PHP version does the CLI default to? What’s the WordPress install path?
These weren’t things the agent couldn’t figure out — they were things it had already figured out. The problem was that the answers existed in a conversation transcript, not in a structured file the agent could read on startup.
The result: wasted time, inconsistent behavior, and an agent that felt unreliable — not because it was, but because it had no mechanism to remember.
The fix: every non-obvious discovered fact goes into long-term memory before the session ends. Not “I’ll remember this.” Into the file.
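That discipline is easy to mechanize. A sketch in Python, with a hypothetical `commit_fact` helper and file path: a session-end hook calls it once per discovered fact, and duplicates are skipped so the file stays lean:

```python
from pathlib import Path

LONG_TERM = Path("long_term.md")  # hypothetical location for Layer 2

def commit_fact(fact: str) -> bool:
    """Write a discovered fact into long-term memory unless it's already there.

    Returns True if the file was updated. "I'll remember this" is not a write.
    """
    existing = LONG_TERM.read_text() if LONG_TERM.exists() else ""
    if fact in existing:
        return False  # already captured; don't pad the file
    with LONG_TERM.open("a") as f:
        f.write(f"- {fact}\n")
    return True
```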
Memory Files That Grew Without Curation
The opposite failure: writing everything into long-term memory without ever pruning it. Within a month, the memory file had grown to contain stale client details, superseded configuration values, and context from resolved situations that no longer applied.
A memory file that’s too long is as problematic as one that’s too short — the agent loads it all into context, diluting signal with noise, and increasing cost without increasing value.
The fix: a strict size limit on long-term memory, with a weekly curation pass. If it’s not still true and still relevant, it gets archived to a historical file, not kept in the active memory.
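Both halves of that fix can be scripted. A Python sketch with an arbitrary ceiling value and hypothetical helper names: a size check that flags the file for a curation pass, and an archive step that moves a stale entry to a historical file instead of deleting it:

```python
from pathlib import Path

SIZE_CEILING = 8_000  # bytes; an arbitrary illustrative ceiling, not a recommendation

def over_ceiling(memory_file: Path) -> bool:
    """Flag the file for a curation pass once it exceeds the size ceiling."""
    return memory_file.stat().st_size > SIZE_CEILING

def archive_entry(memory_file: Path, archive_file: Path, entry: str) -> None:
    """Curation: stale lines move to a historical file; they are never just deleted."""
    lines = memory_file.read_text().splitlines()
    kept = [l for l in lines if l.strip() != entry.strip()]
    moved = [l for l in lines if l.strip() == entry.strip()]
    memory_file.write_text("\n".join(kept) + "\n")
    with archive_file.open("a") as f:  # "a" creates the archive on first use
        for l in moved:
            f.write(l + "\n")
```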
Sensitive Data That Crept In
When you’re moving fast, it’s easy to capture too much detail. A summary becomes a transcript. A reference becomes a paste. Over time, memory files accumulated:
- Near-complete email bodies that should have been one-line summaries
- Client meeting notes with more detail than necessary for agent context
- Internal system details that shouldn’t be in model-readable files
The fix: a content policy for memory files. Summaries and references only. No raw content. No sensitive data. If something needs to be captured in detail, it goes in a secure document store — and the memory file contains a reference to where it lives.
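A policy like this can be backed by a simple lint. The patterns and line-length threshold below are illustrative guesses, not a complete secret scanner; the useful property is that a proper reference line ("password in Keychain as ...") passes, while a raw credential or a paste-sized line does not:

```python
import re

MAX_LINE = 200  # raw pastes tend to be long; summaries are short
SECRET_PATTERNS = [
    re.compile(r"(?i)(password|secret|token|api[_-]?key)\s*[:=]\s*\S+"),
    re.compile(r"-----BEGIN (RSA|OPENSSH|EC) PRIVATE KEY-----"),
]

def lint_memory_file(text: str) -> list:
    """Return policy violations: suspected raw secrets and paste-sized lines."""
    problems = []
    for n, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            problems.append(f"line {n}: looks like a raw credential; store a reference instead")
        elif len(line) > MAX_LINE:
            problems.append(f"line {n}: {len(line)} chars; summarize and reference the source")
    return problems
```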
The Cross-Domain Leak
In the early weeks, working quickly across multiple clients in the same session, context from one client occasionally surfaced in another’s work. Not maliciously — just through proximity in the memory files and insufficient isolation.
The fix: strict namespace separation. Each client, each workstream, each domain has its own file. The agent loads only the relevant file for the task at hand. Cross-domain contamination becomes structurally impossible when the domains don’t share files.
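Isolation can be enforced in code as well as in file layout. A Python sketch with a hypothetical `DomainContext` guard: it binds a task to exactly one namespace and raises if a second one is requested, so the "proximity" failure above cannot recur silently:

```python
from pathlib import Path

DOMAIN_DIR = Path("domains")  # one file per client/workstream, never shared

class DomainContext:
    """Loads exactly one domain namespace; asking for a second is a structural error."""

    def __init__(self) -> None:
        self._domain = None

    def load(self, domain: str) -> str:
        if self._domain is not None and self._domain != domain:
            raise RuntimeError(
                f"task already bound to {self._domain!r}; "
                f"start a new context for {domain!r}"
            )
        self._domain = domain
        path = DOMAIN_DIR / f"{domain}.md"
        return path.read_text() if path.exists() else ""
```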
What We Learned
Memory is infrastructure. It needs design, maintenance, and governance — just like any other infrastructure component. An afterthought memory system produces an agent that behaves unpredictably across sessions.
The daily notes / long-term memory split is the right architecture. Daily notes capture everything. Long-term memory captures what’s permanently relevant. They serve different purposes and shouldn’t be combined.
Curation is as important as capture. A memory file that grows without pruning becomes less useful over time, not more. The discipline of removing stale content is as important as the discipline of adding new content.
The reference-not-secret rule needs to be a hard rule. “Just for now” becomes “permanent” faster than you expect. Build the rule before the first exception.
Domain isolation prevents a category of failure. When client contexts are physically separated, cross-contamination becomes structurally impossible. Design the separation first.
What We Changed
- Established a strict session-end write discipline: every session ends with a memory file update
- Set a size ceiling on long-term memory with a weekly curation review
- Added a content policy: summaries and references only, no raw sensitive data
- Created domain-isolated files for each client and workstream
- Added checkpoint files for critical sessions and long-running projects
- Built a standing rule: if it’s not in a file, it doesn’t exist
Takeaways
- Design for amnesia. Every session starts fresh unless you built a memory system.
- Daily notes + long-term memory + domain files = the right three-layer architecture.
- Curation is as important as capture. Prune what’s no longer true.
- The reference-not-secret rule is a hard rule. No exceptions for “just for now.”
- Domain isolation makes cross-contamination structurally impossible. Build it first.
- Memory files need a content policy. Summaries and references. Not raw data.
- If it’s not in a file, it doesn’t exist. Write it down before the session ends.
*AI Field Notes is a series documenting real-world AI deployments, governance experiments, and operational lessons from the field. All environments and details are generalized to protect organizational and individual privacy.*