Context
The first time an AI agent forgot something important — a configuration detail, a client decision, an established workflow — the instinct was to blame the model. It should have remembered.
It didn’t, because it couldn’t. AI agents don’t have persistent memory between sessions by default. Every session starts fresh. If you didn’t write something down in a structured, accessible file, it doesn’t exist the next day.
Once that’s understood, the question becomes: what does a production-grade memory system for an AI agent actually look like? The answer turned out to be simpler than expected — and required more discipline than anticipated.
What We Built
The Three-Layer Memory Model
Layer 1: Daily Notes — append-only log of what happened today. Actions taken, decisions made, things discovered. Written to at the end of every session. Never edited retroactively. Think of it as a ship’s log: factual, timestamped, exhaustive.
Layer 2: Long-Term Memory — a curated file of what’s always true. Infrastructure config, client context, standing rules, established preferences, tool locations. Edited deliberately when something changes. Read at the start of every session. Kept under a strict size limit — if it grows too large, it gets curated, not padded.
Layer 3: Domain-Specific Files — isolated files for each workstream. Client A’s context lives in Client A’s file. Client B’s context lives in Client B’s file. They never touch. The agent loads only the relevant namespace for the task at hand.
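As a concrete sketch, the three layers can map onto a plain directory of markdown files. Everything here is hypothetical (the `memory/` layout, the file names, and the `load_context` / `append_daily_note` helpers), shown in Python:

```python
from datetime import date, datetime
from pathlib import Path

# Hypothetical layout for the three layers:
#   memory/daily/2024-06-01.md     - Layer 1: append-only session log
#   memory/long_term.md            - Layer 2: curated, always-true facts
#   memory/domains/client-a.md     - Layer 3: one file per workstream
MEMORY_ROOT = Path("memory")

def load_context(domain: str) -> str:
    """Assemble the startup context: long-term memory plus ONE domain file."""
    parts = []
    long_term = MEMORY_ROOT / "long_term.md"
    if long_term.exists():
        parts.append(long_term.read_text())
    domain_file = MEMORY_ROOT / "domains" / f"{domain}.md"
    if domain_file.exists():  # only the relevant namespace is loaded
        parts.append(domain_file.read_text())
    return "\n\n".join(parts)

def append_daily_note(entry: str) -> None:
    """Layer 1 is append-only: entries are timestamped and never rewritten."""
    daily_dir = MEMORY_ROOT / "daily"
    daily_dir.mkdir(parents=True, exist_ok=True)
    log = daily_dir / f"{date.today().isoformat()}.md"
    stamp = datetime.now().strftime("%H:%M")
    with log.open("a") as f:  # "a" mode: never edit retroactively, only append
        f.write(f"- {stamp} {entry}\n")
```

The point of the sketch is the access pattern, not the file format: Layer 2 is read on every startup, Layer 1 is written on every shutdown, and Layer 3 is loaded one namespace at a time.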
The Reference-Not-Secret Rule
Memory files store references to credentials, not credentials themselves. The file says: “SSH password in Keychain as `service-name-ssh-password`.” It doesn’t say what the password is. The agent retrieves it at runtime from the secure store.
This rule sounds obvious. It requires active enforcement. The failure mode is subtle: when you’re moving fast and want to capture context quickly, it’s tempting to paste the actual value into a memory file “just for now.” That’s how credentials end up in places they shouldn’t be.
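One way to enforce the rule, sketched in Python for a macOS setup: memory files hold only a reference line, and the agent resolves it at runtime through the `security` CLI. The reference format and the `parse_reference` / `resolve_secret` helper names are illustrative assumptions, not a prescribed API:

```python
import re
import subprocess

# Matches reference lines like: SSH password in Keychain as `service-name-ssh-password`.
REF_PATTERN = re.compile(r"Keychain as `(?P<service>[^`]+)`")

def parse_reference(line: str) -> str:
    """Extract the Keychain service name from a memory-file reference line."""
    match = REF_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a credential reference: {line!r}")
    return match.group("service")

def resolve_secret(service: str) -> str:
    """Fetch the secret at runtime from the macOS Keychain, never from the file."""
    # `security find-generic-password -s <service> -w` prints only the password.
    out = subprocess.run(
        ["security", "find-generic-password", "-s", service, "-w"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()
```

The design choice worth noting: the memory file and the secret store have disjoint contents, so a leaked memory file leaks only lookup keys.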
Milestone and Checkpoint Files
For long-running projects, a milestone file tracks in-flight work, completed work, and next actions. For critical sessions, a checkpoint file captures the full state at a specific moment — what’s been done, what’s open, what decisions were made.
Checkpoints serve a different purpose from daily notes: they’re designed for recovery. If a session ends unexpectedly, or if context needs to be re-established quickly, a checkpoint file contains everything needed to resume without reconstructing from scratch.
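A checkpoint can be as simple as a structured file written at a known path. A minimal Python sketch, with hypothetical field names for the three categories described above:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def write_checkpoint(path: Path, done: list, open_items: list, decisions: list) -> None:
    """Capture full session state at a specific moment, for recovery rather than history."""
    state = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "done": done,            # what's been done
        "open": open_items,      # what's still open
        "decisions": decisions,  # what was decided, so it isn't relitigated
    }
    path.write_text(json.dumps(state, indent=2))

def resume_from(path: Path) -> dict:
    """Everything needed to pick the session back up without reconstructing it."""
    return json.loads(path.read_text())
```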
What Broke (or Almost Broke)
The Agent That Kept Re-Asking
Before the memory system was formalized, the same questions came up in multiple sessions: What’s the SSH port? Which PHP version does the CLI default to? What’s the WordPress install path?
These weren’t things the agent couldn’t figure out — they were things it had already figured out. The problem was that the answers existed in a conversation transcript, not in a structured file the agent could read on startup.
The result: wasted time, inconsistent behavior, and an agent that felt unreliable — not because it was, but because it had no mechanism to remember.
The fix: every non-obvious discovered fact goes into long-term memory before the session ends. Not “I’ll remember this.” Into the file.
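That discipline is easy to mechanize. A sketch in Python, with a hypothetical `commit_fact` helper and file path: a session-end hook calls it once per discovered fact, and duplicates are skipped so the file stays lean:

```python
from pathlib import Path

LONG_TERM = Path("long_term.md")  # hypothetical location for Layer 2

def commit_fact(fact: str) -> bool:
    """Write a discovered fact into long-term memory unless it's already there.

    Returns True if the file was updated. "I'll remember this" is not a write.
    """
    existing = LONG_TERM.read_text() if LONG_TERM.exists() else ""
    if fact in existing:
        return False  # already captured; don't pad the file
    with LONG_TERM.open("a") as f:
        f.write(f"- {fact}\n")
    return True
```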
Memory Files That Grew Without Curation
The opposite failure: writing everything into long-term memory without ever pruning it. Within a month, the memory file had grown to contain stale client details, superseded configuration values, and context from resolved situations that no longer applied.
A memory file that’s too long is as problematic as one that’s too short — the agent loads it all into context, diluting signal with noise, and increasing cost without increasing value.
The fix: a strict size limit on long-term memory, with a weekly curation pass. If it’s not still true and still relevant, it gets archived to a historical file, not kept in the active memory.
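Both halves of that fix can be scripted. A Python sketch with an arbitrary ceiling value and hypothetical helper names: a size check that flags the file for a curation pass, and an archive step that moves a stale entry to a historical file instead of deleting it:

```python
from pathlib import Path

SIZE_CEILING = 8_000  # bytes; an arbitrary illustrative ceiling, not a recommendation

def over_ceiling(memory_file: Path) -> bool:
    """Flag the file for a curation pass once it exceeds the size ceiling."""
    return memory_file.stat().st_size > SIZE_CEILING

def archive_entry(memory_file: Path, archive_file: Path, entry: str) -> None:
    """Curation: stale lines move to a historical file; they are never just deleted."""
    lines = memory_file.read_text().splitlines()
    kept = [l for l in lines if l.strip() != entry.strip()]
    moved = [l for l in lines if l.strip() == entry.strip()]
    memory_file.write_text("\n".join(kept) + "\n")
    with archive_file.open("a") as f:  # "a" creates the archive on first use
        for l in moved:
            f.write(l + "\n")
```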
Sensitive Data That Crept In
When you’re moving fast, it’s easy to capture too much detail. A summary becomes a transcript. A reference becomes a paste. Over time, memory files accumulated:
- Near-complete email bodies that should have been one-line summaries
- Client meeting notes with more detail than necessary for agent context
- Internal system details that shouldn’t be in model-readable files
The fix: a content policy for memory files. Summaries and references only. No raw content. No sensitive data. If something needs to be captured in detail, it goes in a secure document store — and the memory file contains a reference to where it lives.
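A policy like this can be backed by a simple lint. The patterns and line-length threshold below are illustrative guesses, not a complete secret scanner; the useful property is that a proper reference line ("password in Keychain as ...") passes, while a raw credential or a paste-sized line does not:

```python
import re

MAX_LINE = 200  # raw pastes tend to be long; summaries are short
SECRET_PATTERNS = [
    re.compile(r"(?i)(password|secret|token|api[_-]?key)\s*[:=]\s*\S+"),
    re.compile(r"-----BEGIN (RSA|OPENSSH|EC) PRIVATE KEY-----"),
]

def lint_memory_file(text: str) -> list:
    """Return policy violations: suspected raw secrets and paste-sized lines."""
    problems = []
    for n, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            problems.append(f"line {n}: looks like a raw credential; store a reference instead")
        elif len(line) > MAX_LINE:
            problems.append(f"line {n}: {len(line)} chars; summarize and reference the source")
    return problems
```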
The Cross-Domain Leak
In the early weeks, working quickly across multiple clients in the same session, context from one client occasionally surfaced in another’s work. Not maliciously — just through proximity in the memory files and insufficient isolation.
The fix: strict namespace separation. Each client, each workstream, each domain has its own file. The agent loads only the relevant file for the task at hand. Cross-domain contamination becomes structurally impossible when the domains don’t share files.
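Isolation can be enforced in code as well as in file layout. A Python sketch with a hypothetical `DomainContext` guard: it binds a task to exactly one namespace and raises if a second one is requested, so the "proximity" failure above cannot recur silently:

```python
from pathlib import Path

DOMAIN_DIR = Path("domains")  # one file per client/workstream, never shared

class DomainContext:
    """Loads exactly one domain namespace; asking for a second is a structural error."""

    def __init__(self) -> None:
        self._domain = None

    def load(self, domain: str) -> str:
        if self._domain is not None and self._domain != domain:
            raise RuntimeError(
                f"task already bound to {self._domain!r}; "
                f"start a new context for {domain!r}"
            )
        self._domain = domain
        path = DOMAIN_DIR / f"{domain}.md"
        return path.read_text() if path.exists() else ""
```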
What We Learned
Memory is infrastructure. It needs design, maintenance, and governance — just like any other infrastructure component. An afterthought memory system produces an agent that behaves unpredictably across sessions.
The daily notes / long-term memory split is the right architecture. Daily notes capture everything. Long-term memory captures what’s permanently relevant. They serve different purposes and shouldn’t be combined.
Curation is as important as capture. A memory file that grows without pruning becomes less useful over time, not more. The discipline of removing stale content is as important as the discipline of adding new content.
The reference-not-secret rule needs to be a hard rule. “Just for now” becomes “permanent” faster than you expect. Build the rule before the first exception.
Domain isolation prevents a category of failure. When client contexts are physically separated, cross-contamination becomes structurally impossible. Design the separation first.
What We Changed
- Established a strict session-end write discipline: every session ends with a memory file update
- Set a size ceiling on long-term memory with a weekly curation review
- Added a content policy: summaries and references only, no raw sensitive data
- Created domain-isolated files for each client and workstream
- Added checkpoint files for critical sessions and long-running projects
- Built a standing rule: if it’s not in a file, it doesn’t exist
Takeaways
- Design for amnesia. Every session starts fresh unless you built a memory system.
- Daily notes + long-term memory + domain files = the right three-layer architecture.
- Curation is as important as capture. Prune what’s no longer true.
- The reference-not-secret rule is a hard rule. No exceptions for “just for now.”
- Domain isolation makes cross-contamination structurally impossible. Build it first.
- Memory files need a content policy. Summaries and references. Not raw data.
- If it’s not in a file, it doesn’t exist. Write it down before the session ends.
*AI Field Notes is a series documenting real-world AI deployments, governance experiments, and operational lessons from the field. All environments and details are generalized to protect organizational and individual privacy.*