April 17, 2026
Ceren Kaya Akgün
AI Agent Memory: Types, Patterns & Implementation
AI agent memory: 3 types explained, architecture patterns, and no-code implementation in Heym's visual canvas.
TL;DR: AI agents forget everything when a session ends because LLMs are stateless. Persistent memory requires three layers: semantic (factual knowledge in a vector store), episodic (past events in a structured log), and procedural (learned behaviors encoded as rules). This guide explains each type and shows you how to implement all three in Heym's visual canvas — no orchestration code required.
Table of Contents
- Why AI Agents Forget
- The 3 Types of AI Agent Memory
- Memory Architecture Patterns
- How to Add Memory to Your Agent in Heym
- Choosing the Right Memory Type
- Common Pitfalls in AI Agent Memory
- Key Takeaways
This guide is for developers and technical teams who have already built or deployed an AI agent and hit the same wall: the agent can't remember anything from the last session. We'll cover why that happens, the three architecturally distinct ways to fix it, and exactly how to implement each one in Heym's visual canvas.
Why AI Agents Forget
The root cause is architectural, not a bug: large language models are stateless. Every call to the OpenAI or Anthropic API is fully independent. The model has no awareness of the previous call — regardless of how closely related the two requests are. The only memory available is whatever you pass in the current context window.
Context windows have grown substantially. Claude 3.7 Sonnet supports 200,000 tokens; GPT-4o supports 128,000. But even 200,000 tokens (~150,000 words of text) runs out faster than expected in production. A typical agentic workflow completing a complex research task in 10–15 reasoning iterations, each injecting 2,000–5,000 tokens of tool call results, can saturate a 128,000-token context window in under 30 minutes of active use.
More importantly, context is ephemeral. When a workflow run ends, every byte of in-context state is discarded. The next run — even 30 seconds later — starts completely fresh. For customer support agents, research agents, or any agent that handles repeat users, this means the user has to re-introduce themselves every single time.
The fix is an external memory layer: a database, vector store, or key-value cache that persists state between runs. The agent reads from it at the start of each session and writes to it at the end. That is the architecture this guide covers.
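The read-at-start, write-at-end loop can be sketched in a few lines. This is a minimal illustration with a plain dict standing in for the external store — the store shape, function names, and the faked LLM reply are all invented for the sketch:

```python
# Minimal sketch of the read-then-write session loop. A dict stands in
# for the external memory store; names are illustrative, not Heym's API.
memory_store = {}  # user_id -> list of remembered facts

def run_session(user_id: str, user_message: str) -> str:
    # 1. Read: load persisted memories at session start.
    memories = memory_store.get(user_id, [])

    # 2. Act: a real agent would inject `memories` into the LLM's system
    #    prompt; here we fake a reply to keep the sketch self-contained.
    reply = f"[{len(memories)} memories in context] echo: {user_message}"

    # 3. Write: persist anything worth keeping before the run ends.
    memory_store.setdefault(user_id, []).append(f"user said: {user_message}")
    return reply

first = run_session("u1", "I prefer bullet points")
second = run_session("u1", "summarize my ticket")
```

The second call sees what the first call wrote — exactly the continuity the stateless API calls lack on their own.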
The 3 Types of AI Agent Memory
AI agent memory is not monolithic. Research and production deployments have converged on three distinct types, each solving a different category of problem. Understanding the boundaries between them saves you from building the wrong storage system for your use case.
| Memory Type | What It Stores | Storage Backend | Retrieval Method |
|---|---|---|---|
| Semantic | General facts, domain knowledge | Vector database (Qdrant) | Similarity search |
| Episodic | Past interactions, user-specific events | Relational database (PostgreSQL) | Filter by user_id + recency |
| Procedural | Learned behaviors, execution rules | Key-value store (Redis) | Key lookup |
Semantic Memory
Semantic memory is a type of AI agent memory that stores structured factual knowledge — general facts, domain rules, and reference information that are independent of any specific past interaction. Unlike conversation history, semantic memory represents what the agent knows about the world, not what it has experienced. It is stored in a vector database and retrieved via similarity search, enabling the agent to surface relevant facts even when the query phrasing differs from the stored content.
Definition: Semantic memory in AI agents stores factual, domain-specific knowledge in a vector database. The agent queries it using embedding-based similarity search — not exact string matching — and retrieves the top-K most relevant records in under 50ms at scales up to 10 million documents. It is the equivalent of long-term factual knowledge in human cognitive memory (IBM, 2026).
Think of it as the agent's reference library: product documentation, domain rules, user preference profiles, regulatory guidelines, or any other information the agent should be able to look up on demand.
Semantic memory lives in a vector database or knowledge graph. The agent queries it using similarity search — not exact string matching — which means it can retrieve relevant facts even when the user's phrasing differs significantly from how the fact was originally stored. Production vector stores such as Qdrant return the top-K most relevant records in under 50ms at scales of 1–10 million documents.
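The "differs from how the fact was stored" property comes from comparing embedding vectors rather than strings. A toy version with hand-made 3-dimensional vectors (real systems use learned embeddings and a vector database like Qdrant; the facts and vectors below are invented for illustration):

```python
import math

# Toy similarity search: hand-made 3-dim vectors stand in for real
# learned embeddings. Facts and vectors are illustrative only.
facts = {
    "refund window is 30 days":      [0.9, 0.1, 0.0],
    "premium plan includes SSO":     [0.1, 0.9, 0.1],
    "support hours are 9am-6pm CET": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, k=2):
    # Rank all stored facts by similarity to the query vector.
    ranked = sorted(facts.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# A query phrased nothing like "refund window" still lands nearest to it
# in vector space, because its embedding points in a similar direction.
results = top_k([0.8, 0.2, 0.1], k=1)
```

Production vector stores apply the same idea with approximate-nearest-neighbor indexes so the ranking stays fast at millions of documents.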
When to use it: Any agent that needs to answer questions from a fixed knowledge base — internal documentation bots, customer support agents with product catalogs, compliance agents referencing regulatory rules, or HR agents answering policy questions.
In Heym: Add a Qdrant Search node and configure it with your collection name and the current user query as the search vector. Heym's built-in Qdrant integration handles embedding and retrieval automatically. The node returns the top-5 relevant documents, which you inject directly into the LLM node's system prompt field. For use cases that require reasoning over relationships between facts — not just similarity — Heym also supports built-in graph-backed memory: enable the Persistent Memory toggle on any LLM node, and Heym automatically extracts entities and their relationships from each conversation in the background, building a knowledge graph that gets injected into the system prompt on subsequent runs. No separate graph database setup required.
Episodic Memory
Episodic memory is a type of AI agent memory that stores a timestamped log of past events and interactions scoped to a specific user or session. Unlike semantic memory, which holds general knowledge, episodic memory captures what happened — conversation summaries, user decisions, resolved tasks, and expressed preferences — indexed by user identity and time.
Definition: Episodic memory in AI agents stores a chronological record of past interactions for a specific user or session. It is retrieved by filtering on user_id and ordering by recency or importance score. Agents with episodic memory scored 37% higher on user satisfaction in a 2026 benchmark study (Mem0, 2026) because users no longer need to repeat context on every visit.
The canonical schema is straightforward: a table with columns for user_id, session_id, timestamp, event_type, content, and importance_score. At session start, the agent retrieves the last N interactions for the current user — filtered by user_id, ordered by recency or importance_score descending — and injects them into the system prompt as condensed conversation history.
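The schema and the retrieval query above can be exercised end to end with sqlite3 (PostgreSQL in production; the column names match the article, the sample rows are invented):

```python
import sqlite3

# The canonical episodic schema, sketched with sqlite3 for illustration.
# Production deployments would use PostgreSQL; columns match the article.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE agent_memory (
        user_id          TEXT,
        session_id       TEXT,
        timestamp        TEXT,
        event_type       TEXT,
        content          TEXT,
        importance_score INTEGER
    )
""")
rows = [
    ("u1", "s1", "2026-01-05", "preference", "wants bullet points", 5),
    ("u1", "s2", "2026-03-01", "chat",       "asked about invoices", 1),
    ("u2", "s3", "2026-03-02", "chat",       "another user's event", 3),
]
db.executemany("INSERT INTO agent_memory VALUES (?, ?, ?, ?, ?, ?)", rows)

# Retrieve the last N interactions for one user: importance first, then
# recency, exactly as described in the text.
last_n = db.execute(
    "SELECT content FROM agent_memory WHERE user_id = ? "
    "ORDER BY importance_score DESC, timestamp DESC LIMIT 2",
    ("u1",),
).fetchall()
```

Note that the high-importance preference from January outranks the more recent low-importance chat event — the behavior the dual ordering is designed to produce.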
When to use it: Personal assistant agents, customer support agents with repeat users, research agents that build progressively on prior sessions, and any use case where personalization drives the value proposition.
In Heym: Add a Database Query node at the start of your workflow to fetch the last 10 interaction records for the current user_id. After the LLM node completes, add a Database Write node that inserts the conversation summary and any newly surfaced facts into the episodic log. This read-inject-write pattern adds under 200ms total runtime overhead.
Procedural Memory
Procedural memory is the most overlooked of the three types of AI agent memory. Rather than storing facts or events, it encodes learned behaviors — prompt adjustments, tool-use patterns, and operational rules the agent has derived from repeated execution of the same task class.
Definition: Procedural memory in AI agents stores learned execution patterns as versioned rules — which tool to call first, how to handle API failures, which response format to apply for specific task types. It lives in a fast key-value store like Redis (sub-10ms read latency) and updates slowly, after explicit evaluation cycles rather than after every run. Redis reports that production procedural memory stores typically hold 200–2,000 rules per agent and remain stable for weeks between updates (Redis, 2026).
In practice, procedural memory takes three forms: versioned system prompt snippets applied based on task type; tool selection rules ("for financial calculations, always invoke the Python executor before the LLM node"); and failure-handling patterns ("if the HTTP Request node returns a 429 status, insert a 2-second wait before retry"). These rules change slowly — they are updated after explicit evaluation cycles, not after every individual run.
Procedural memory is typically stored in Redis (sub-10ms read latency) or as a versioned JSON document in a simple key-value store. The agent reads the current ruleset at workflow initialization and applies the matching rules to its execution path.
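The lookup side of this is deliberately simple — a key read, not a search. A sketch with a plain dict standing in for Redis (the rule names, task types, and JSON shape are illustrative):

```python
# Procedural ruleset sketch: a versioned document keyed by task type.
# A plain dict stands in for Redis; rule names are illustrative.
ruleset = {
    "version": 12,
    "rules": {
        "financial_calculation": {"first_tool": "python_executor"},
        "http_429":              {"action": "wait", "seconds": 2},
        "default":               {"first_tool": "llm"},
    },
}

def rule_for(task_type: str) -> dict:
    # Plain key lookup with a default fallback -- no similarity search,
    # which is why sub-10ms key-value stores are the right backend here.
    return ruleset["rules"].get(task_type, ruleset["rules"]["default"])

chosen = rule_for("financial_calculation")
fallback = rule_for("unknown_task")
```

Because the ruleset is a single versioned document, updating it after an evaluation cycle is one atomic write, and rolling back means restoring the previous version.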
When to use it: High-frequency agents that execute the same task class dozens or hundreds of times per day, and where accumulated operational patterns should improve execution quality over time without full model retraining.
In Heym: Store the procedural ruleset as a JSON document retrieved by a Database Query node at startup. Use a scheduled LLM Evaluator workflow to analyze recent execution traces weekly and propose rule updates. Apply approved updates via a simple trigger workflow that overwrites the ruleset document in your store.
Memory Architecture Patterns
Four patterns cover the large majority of production AI agent memory implementations. The right choice depends on your session model, user volume, and latency tolerance.
Pattern 1: In-Context Storage (Stateless, Single-Session)
All memory lives in the current context window. No external database. Memory resets completely at the end of every run.
Best for: Single-session tasks, one-shot document processing, batch pipelines where every run is fully independent and there is no concept of a returning user.
Additional latency: 0ms — no external reads or writes.
Limitation: Memory is lost the moment the workflow ends. Not suitable for repeat-user scenarios or any application where continuity matters.
Pattern 2: External Storage (Full Persistence)
All persistent state lives in an external store. The agent reads at session start and writes at session end. This pattern supports all three memory types and is the baseline for any agent that handles repeat users.
Best for: Conversational agents, customer support bots, personal assistants, research agents that build on prior sessions.
Additional latency: 50–200ms per run depending on store type — Redis under 10ms, Qdrant 20–50ms, hosted vector services 50–150ms.
Limitation: Requires upfront schema design and a forgetting policy to prevent storage bloat as the user base grows.
Pattern 3: Hybrid Hot-Plus-Cold Memory
Combines in-context short-term memory with external long-term storage. The most recent and most relevant facts stay in-context for the current run (zero additional latency); older or lower-priority memories are retrieved from the external store only when the similarity search identifies them as relevant.
This is the architecture behind Mem0, which reports 91% lower latency versus naive full-context injection — because it avoids loading the entire memory store on every run, injecting only the subset that the current query is likely to need.
Best for: High-frequency agents with large per-user memory stores, where injecting every stored fact would saturate the context window or push token costs to unacceptable levels. The hybrid pattern is also the right choice for multi-agent systems where multiple sub-agents share a common memory store and each only needs a slice of it.
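The hot/cold split reduces to a simple routing decision at context-build time. A sketch where a keyword match stands in for the similarity search, and all facts and names are invented:

```python
# Hybrid hot-plus-cold sketch. Recent facts stay in-context ("hot");
# the cold store is consulted only when the query appears to need it.
# The keyword match below is a stand-in for real similarity search.
hot_cache = ["user prefers German", "open ticket #4412"]
cold_store = [
    "user's company is in the automotive sector",
    "user asked about GDPR exports in January",
    "user upgraded to the premium plan",
]

def build_context(query: str) -> list[str]:
    # Hot facts always ride along at zero extra retrieval latency.
    context = list(hot_cache)
    # Cold facts are pulled only when relevant to the current query,
    # instead of injecting the whole store on every run.
    for fact in cold_store:
        if any(word in fact for word in query.lower().split()):
            context.append(fact)
    return context

ctx = build_context("status of my premium upgrade")
```

Only one of the three cold facts makes it into the context for this query — the selectivity that drives the latency and token savings the pattern is known for.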
Pattern 4: Graph-Backed Memory (Relational Reasoning)
All three patterns above treat memory as a collection of independent documents or facts retrieved by similarity or key. Graph-backed memory adds a fourth dimension: relationships between facts. Instead of asking "what facts are most similar to this query?", the agent can ask "what facts are connected to this entity through this type of relationship?"
This unlocks multi-hop reasoning that flat vector stores cannot support. Example: "find all compliance rules that apply to this user's industry, given their company type, region, and the product categories they've purchased" — a query that requires traversing three relationship types simultaneously.
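Stripped to its core, that compliance query is a two-hop traversal over typed edges. A toy version with adjacency lists — the entities, relation names, and rules are invented for illustration:

```python
# Toy multi-hop traversal over a fact graph stored as adjacency lists.
# Entity and relation names are invented; a real deployment would use
# a graph store, but the traversal logic is the same.
edges = {
    ("acme_corp", "in_industry"):  ["automotive"],
    ("acme_corp", "in_region"):    ["eu"],
    ("automotive", "governed_by"): ["rule_emissions"],
    ("eu", "governed_by"):         ["rule_gdpr"],
}

def neighbors(entity: str, relation: str) -> list[str]:
    return edges.get((entity, relation), [])

def applicable_rules(company: str) -> set[str]:
    # Hop 1: company -> its industry and region.
    # Hop 2: each of those contexts -> the rules governing it.
    rules = set()
    for rel in ("in_industry", "in_region"):
        for context in neighbors(company, rel):
            rules.update(neighbors(context, "governed_by"))
    return rules

rules = applicable_rules("acme_corp")
```

A flat vector store could retrieve facts *similar* to "acme_corp", but it cannot follow the `governed_by` edge from a context the company is merely connected to — that is the capability the graph adds.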
In Heym: Enable Persistent Memory in the LLM node settings panel. Heym runs a background LLM extraction pass after every agent run, parsing the conversation to identify entities (people, products, topics, organizations, preferences) and the relationships between them, then storing them as nodes and edges in a per-node knowledge graph. On the next run, Heym automatically injects the relevant subgraph as structured context into the system prompt. You can inspect and edit the live graph at any time via the visual graph editor built into the node — no external graph database configuration required.
Best for: Recommendation agents, compliance agents reasoning over multi-dimensional rule sets, knowledge graph Q&A, research agents building entity maps across sessions, and any domain where how facts relate to each other matters as much as the facts themselves.
How to Add Memory to Your Agent in Heym
Heym's visual canvas provides built-in primitives for all three memory types. The following four steps walk through a complete implementation for a conversational agent with episodic and semantic memory — no orchestration code required.
Step 1: Design Your Memory Schema and Choose a Backend
Before opening the canvas, map each data type to its storage home.
| Data Type | Storage | Heym Node | Retrieval Latency |
|---|---|---|---|
| Current session state | In-memory variable | Variable node | 0ms |
| Long-term user facts | Qdrant | Qdrant Search node | 20–50ms |
| Recent interactions | PostgreSQL table | Database Query node | 5–20ms |
| Procedural rules | Redis or JSON store | MCP Tool node | <10ms |
Choosing the wrong backend for a data type is the most common schema mistake. Episodic logs do not belong in a vector store optimized for similarity search — they belong in a relational table queried by user_id and timestamp. Semantic facts do not belong in a flat key-value store — they need embedding-based retrieval to surface relevant entries when the user's query wording varies.
Step 2: Inject Memory at Session Start
Add a Database Query node as the very first node in your workflow, before the LLM node. Configure it to retrieve the top-10 most relevant records for the current session:
SELECT content, event_type, created_at
FROM agent_memory
WHERE user_id = '{{user_id}}'
ORDER BY importance_score DESC, created_at DESC
LIMIT 10;
The query uses two ordering criteria — importance first, then recency — so that high-signal memories from months ago surface above low-signal memories from yesterday. Pass the retrieved records into the LLM node's system prompt using a Heym template expression in the system prompt field. Every session now begins with full context, and users never have to re-introduce themselves.
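Condensing the retrieved rows into the system prompt is a small formatting step. A sketch — the record shape mirrors the query's columns, and the prompt template is illustrative, not Heym's built-in format:

```python
# Sketch of condensing retrieved episodic records into a system prompt.
# The record shape mirrors the query's columns; the template is an
# illustrative choice, not a required format.
records = [
    {"content": "prefers bullet points", "event_type": "preference",
     "created_at": "2026-01-05"},
    {"content": "asked about invoices", "event_type": "chat",
     "created_at": "2026-03-01"},
]

lines = [f"- [{r['event_type']} {r['created_at']}] {r['content']}"
         for r in records]
system_prompt = "Known about this user:\n" + "\n".join(lines)
```

Keeping each memory to one tagged line keeps the injected context compact and easy for the model to scan.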
Step 3: Capture New Facts Mid-Run
After your primary LLM node completes, connect a second lightweight LLM node configured with a fast, cheap model. Give it this extraction prompt:
From the conversation below, extract new facts, preferences, or decisions
worth storing for future sessions.
Return a JSON array. Each item should have:
- "fact": the extracted information as a concise sentence
- "type": "semantic" (general knowledge) or "episodic" (user-specific event)
- "importance": integer from 1 (low) to 5 (high)
Conversation:
{{conversation_transcript}}
Connect the extractor node's JSON output to a Database Write node that upserts each extracted fact into your memory store. The entire extractor-writer pair adds under 800ms of latency to the workflow run — well within the acceptable range for conversational use cases where users already expect a 1–3 second response time.
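Before the write step, the extractor's output should be validated rather than trusted blindly — LLMs occasionally emit malformed items. A sketch of that routing step, where the JSON payload is a hand-written example of the format the prompt requests:

```python
import json

# Validate and route the extractor node's JSON output before writing.
# The payload below is a hand-written example of the requested format.
raw = '''[
  {"fact": "user prefers responses in German", "type": "episodic", "importance": 4},
  {"fact": "refund window is 30 days", "type": "semantic", "importance": 3},
  {"fact": "malformed item with no type", "importance": 9}
]'''

store = {"semantic": [], "episodic": []}
for item in json.loads(raw):
    # Keep only well-formed facts: a known type and an importance in 1-5.
    if item.get("type") in store and 1 <= item.get("importance", 0) <= 5:
        store[item["type"]].append(item["fact"])
```

The malformed third item is silently dropped, which is usually the right behavior for a background memory writer — losing one noisy fact is cheaper than corrupting the store.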
Step 4: Set a Forgetting Policy
An unbounded memory store is an operational risk. For a production agent handling 500 active users, an episodic log with no cleanup grows by approximately 15,000 records per day. Within 6–8 weeks, retrieval latency begins to degrade noticeably, and stale or contradictory facts start polluting new sessions.
In Heym, add a scheduled workflow on a daily or weekly trigger. The cleanup logic runs two queries:
-- Remove memories older than 30 days that were never retrieved
DELETE FROM agent_memory
WHERE created_at < NOW() - INTERVAL '30 days'
AND retrieval_count = 0;
-- Decay importance scores on aging memories
UPDATE agent_memory
SET importance_score = GREATEST(1, importance_score - 1)
WHERE created_at < NOW() - INTERVAL '7 days';
The decay query gradually reduces the retrieval rank of aging memories without deleting them immediately — which means a long-term preference (a user always wants responses in bullet points) survives the 30-day cutoff as long as it keeps getting retrieved. Only facts that stop being relevant fall off naturally.
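The two-step policy can be exercised against sqlite3 to see the delete-versus-decay behavior concretely (the article's SQL targets PostgreSQL; the dates, rows, and schema subset here are illustrative):

```python
import sqlite3

# The delete-plus-decay forgetting policy, run against sqlite3 for
# illustration (the article's SQL targets PostgreSQL). Dates are fixed
# strings and the rows are invented.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE agent_memory (
    content TEXT, created_at TEXT, retrieval_count INTEGER,
    importance_score INTEGER)""")
db.executemany("INSERT INTO agent_memory VALUES (?, ?, ?, ?)", [
    ("stale, never used",  "2026-01-01", 0, 3),  # old + unused -> deleted
    ("old but still used", "2026-01-01", 9, 3),  # old + used -> kept, decayed
    ("fresh fact",         "2026-04-16", 0, 3),  # recent -> untouched
])
today = "2026-04-17"

# Step 1: delete memories older than 30 days that were never retrieved.
db.execute("DELETE FROM agent_memory WHERE created_at < date(?, '-30 days') "
           "AND retrieval_count = 0", (today,))
# Step 2: decay importance on memories older than 7 days, floored at 1.
db.execute("UPDATE agent_memory SET importance_score = "
           "MAX(1, importance_score - 1) "
           "WHERE created_at < date(?, '-7 days')", (today,))

survivors = db.execute(
    "SELECT content, importance_score FROM agent_memory "
    "ORDER BY content").fetchall()
```

The frequently retrieved old memory survives with reduced rank, the fresh fact is untouched, and only the old, never-retrieved record is gone — the exact outcome the policy aims for.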
Choosing the Right Memory Type
For most teams building their first memory-enabled agent, start with episodic memory only. It has the lowest schema complexity, the clearest user-facing benefit, and the fastest path to production. Add semantic memory once you have a knowledge base that warrants vector search — typically when the agent needs to answer questions from documentation, catalogs, or policy libraries. Add procedural memory only when the agent executes the same task class at high frequency and you have enough execution trace data to identify consistent behavioral patterns worth encoding.
| Use Case | Recommended Types | Storage Backend |
|---|---|---|
| FAQ bot or documentation assistant | Semantic only | Qdrant |
| Customer support with returning users | Episodic + Semantic | Qdrant |
| Personal AI assistant | All 3 types | Redis (procedural) + PostgreSQL |
| Single-session document processor | None (in-context) | Variable node only |
| Research agent building across sessions | Episodic + Semantic | Qdrant |
| DevOps automation agent | Procedural + Episodic | Redis + PostgreSQL |
For real-world deployment patterns that benefit directly from these memory architectures, see our guide on AI agent use cases — the customer support, research, and personal assistant examples there map directly to the episodic and semantic patterns above.
Common Pitfalls in AI Agent Memory
Pitfall 1: Injecting everything into the context window
Loading all stored memories on every run is the most common first implementation mistake. It saturates the context window faster than expected and increases token costs proportionally with the size of the memory store. Always use relevance scoring and top-K retrieval — never inject more than 10–15 memory records per run without a strong reason.
Pitfall 2: No importance scoring at schema design time
Without an importance_score column in your schema, your retrieval is purely recency-biased — the most recent facts always surface, even when they are far less relevant than older entries. Add importance scoring at schema design time; retrofitting it after production data has accumulated is much harder.
Pitfall 3: Mixing memory types in a single table
Storing semantic facts and episodic events in the same table without type differentiation makes it impossible to apply different retrieval strategies and TTLs to each. Semantic facts should be queried by similarity; episodic events by user ID and recency. Keep them in separate tables or use a well-indexed memory_type column from the start.
Pitfall 4: No forgetting policy from day one
Without expiration rules, memory stores grow without bound. Retrieval latency degrades, stale facts contradict current ones, and storage costs scale linearly with your user base. Implement TTL-based deletion and importance decay before going live — not after you observe degradation in production.
Pitfall 5: Treating memory as an afterthought in multi-agent systems
In a multi-agent system, each sub-agent typically runs its own independent reasoning loop. Without a shared memory store that all agents can read from and write to, agent A's discoveries are invisible to agent B, and the system produces redundant or contradictory results. Design shared memory access patterns — including write-conflict handling — before building multi-agent topologies, not after.
Key Takeaways
- AI agents forget by design: LLMs are stateless — persistent memory requires an external store that the agent reads at session start and writes to at session end.
- Three types, three stores: Semantic memory → vector database (similarity search, <50ms); Episodic memory → relational table (user_id filter, <20ms); Procedural memory → key-value store (rule lookup, <10ms).
- Start with episodic memory: It delivers the highest user-visible improvement — 37% better satisfaction scores — with the simplest schema (a flat events table).
- Hybrid outperforms naive injection: Loading only relevant memories via similarity search reduces latency by up to 91% compared to injecting the entire memory store on every run.
- Forgetting is not optional: For a 500-user production agent, an unbounded episodic log grows ~15,000 records per day. Without a TTL policy, retrieval quality degrades measurably within 6–8 weeks.
- Multi-agent systems need shared memory: Without a common memory store, sub-agents produce redundant or contradictory results — design shared access patterns before building multi-agent topologies.
Conclusion
AI agent memory is a three-layer architecture, not a single technology choice. Semantic memory powers factual knowledge retrieval from a vector store. Episodic memory enables consistent personalization across sessions with a structured interaction log. Procedural memory lets the agent encode learned execution patterns as rules that improve performance without retraining.
None of these layers are complex to implement in isolation. Heym's visual canvas provides Database Query nodes for reads, Database Write nodes for persistence, Variable nodes for in-session state, and MCP Tool nodes for managed memory services — all wired together without writing orchestration code. A working episodic memory layer for a conversational agent takes under two hours to build and adds 50–200ms of latency overhead that users will never notice.
The architectural decisions that matter — what to store, how to retrieve it, and when to forget it — are made before you open the canvas. Get those three right, and the implementation follows directly from the patterns in this guide.
Start with episodic memory. It delivers the clearest user-facing improvement in the shortest time, with the simplest schema. Add semantic and procedural layers once your agent is in production and you understand exactly what it needs to remember.
Try Heym for free and connect your first memory layer in under two hours.

Founding Engineer
Ceren is a founding engineer at Heym, working on AI workflow orchestration and the visual canvas editor. She writes about AI automation, multi-agent systems, and the practitioner experience of building production LLM pipelines.