April 17, 2026
Ceren Kaya Akgün
AI Agent Memory: Types, Patterns & Implementation
AI agent memory: 3 types explained, architecture patterns, and no-code implementation in Heym's visual canvas.
TL;DR: AI agents forget everything when a session ends because LLMs are stateless. Persistent memory requires three layers: semantic (factual knowledge in a vector store), episodic (past events in a structured log), and procedural (learned behaviors encoded as rules). This guide explains each type and shows you how to implement all three in Heym's visual canvas — no orchestration code required.
Table of Contents
- Why AI Agents Forget
- The 3 Types of AI Agent Memory
- Memory Architecture Patterns
- How to Add Memory to Your Agent in Heym
- Choosing the Right Memory Type
- Common Pitfalls in AI Agent Memory
- Key Takeaways
This guide is for developers and technical teams who have already built or deployed an AI agent and hit the same wall: the agent can't remember anything from the last session. We'll cover why that happens, the three architecturally distinct ways to fix it, and exactly how to implement each one in Heym's visual canvas.
Why AI Agents Forget
The root cause is architectural, not a bug: large language models are stateless. Every call to the OpenAI or Anthropic API is fully independent. The model has no awareness of the previous call — regardless of how closely related the two requests are. The only memory available is whatever you pass in the current context window.
Context windows have grown substantially. Claude 3.7 Sonnet supports 200,000 tokens; GPT-4o supports 128,000. But even 200,000 tokens (~150,000 words of text) runs out faster than expected in production. A typical agentic workflow completing a complex research task in 10–15 reasoning iterations, each injecting 2,000–5,000 tokens of tool call results, can saturate a 128,000-token context window in under 30 minutes of active use.
More importantly, context is ephemeral. When a workflow run ends, every byte of in-context state is discarded. The next run — even 30 seconds later — starts completely fresh. For customer support agents, research agents, or any agent that handles repeat users, this means the user has to re-introduce themselves every single time.
The fix is an external memory layer: a database, vector store, or key-value cache that persists state between runs. The agent reads from it at the start of each session and writes to it at the end. That is the architecture this guide covers.
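The read-at-start, write-at-end loop can be sketched in a few lines. This is a minimal illustration with a plain dict standing in for the external store — the store shape, function names, and the faked LLM reply are all invented for the sketch:

```python
# Minimal sketch of the read-then-write session loop. A dict stands in
# for the external memory store; names are illustrative, not Heym's API.
memory_store = {}  # user_id -> list of remembered facts

def run_session(user_id: str, user_message: str) -> str:
    # 1. Read: load persisted memories at session start.
    memories = memory_store.get(user_id, [])

    # 2. Act: a real agent would inject `memories` into the LLM's system
    #    prompt; here we fake a reply to keep the sketch self-contained.
    reply = f"[{len(memories)} memories in context] echo: {user_message}"

    # 3. Write: persist anything worth keeping before the run ends.
    memory_store.setdefault(user_id, []).append(f"user said: {user_message}")
    return reply

first = run_session("u1", "I prefer bullet points")
second = run_session("u1", "summarize my ticket")
```

The second call sees what the first call wrote — exactly the continuity the stateless API calls lack on their own.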
The 3 Types of AI Agent Memory
AI agent memory is not monolithic. Research and production deployments have converged on three distinct types, each solving a different category of problem. Understanding the boundaries between them saves you from building the wrong storage system for your use case.
| Memory Type | What It Stores | Storage Backend | Retrieval Method |
|---|---|---|---|
| Semantic | General facts, domain knowledge | Vector database (Qdrant) | Similarity search |
| Episodic | Past interactions, user-specific events | Relational database (PostgreSQL) | Filter by user_id + recency |
| Procedural | Learned behaviors, execution rules | Key-value store (Redis) | Key lookup |
Semantic Memory
Semantic memory is a type of AI agent memory that stores structured factual knowledge — general facts, domain rules, and reference information that are independent of any specific past interaction. Unlike conversation history, semantic memory represents what the agent knows about the world, not what it has experienced. It is stored in a vector database and retrieved via similarity search, enabling the agent to surface relevant facts even when the query phrasing differs from the stored content.
Definition: Semantic memory in AI agents stores factual, domain-specific knowledge in a vector database. The agent queries it using embedding-based similarity search — not exact string matching — and retrieves the top-K most relevant records in under 50ms at scales up to 10 million documents. It is the equivalent of long-term factual knowledge in human cognitive memory (IBM, 2026).
Think of it as the agent's reference library: product documentation, domain rules, user preference profiles, regulatory guidelines, or any other information the agent should be able to look up on demand.
Semantic memory lives in a vector database or knowledge graph. The agent queries it using similarity search — not exact string matching — which means it can retrieve relevant facts even when the user's phrasing differs significantly from how the fact was originally stored. Production vector stores such as Qdrant return the top-K most relevant records in under 50ms at scales of 1–10 million documents.
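The "differs from how the fact was stored" property comes from comparing embedding vectors rather than strings. A toy version with hand-made 3-dimensional vectors (real systems use learned embeddings and a vector database like Qdrant; the facts and vectors below are invented for illustration):

```python
import math

# Toy similarity search: hand-made 3-dim vectors stand in for real
# learned embeddings. Facts and vectors are illustrative only.
facts = {
    "refund window is 30 days":      [0.9, 0.1, 0.0],
    "premium plan includes SSO":     [0.1, 0.9, 0.1],
    "support hours are 9am-6pm CET": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, k=2):
    # Rank all stored facts by similarity to the query vector.
    ranked = sorted(facts.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# A query phrased nothing like "refund window" still lands nearest to it
# in vector space, because its embedding points in a similar direction.
results = top_k([0.8, 0.2, 0.1], k=1)
```

Production vector stores apply the same idea with approximate-nearest-neighbor indexes so the ranking stays fast at millions of documents.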
When to use it: Any agent that needs to answer questions from a fixed knowledge base — internal documentation bots, customer support agents with product catalogs, compliance agents referencing regulatory rules, or HR agents answering policy questions.
In Heym: Add a Qdrant Search node and configure it with your collection name and the current user query as the search vector. Heym's built-in Qdrant integration handles embedding and retrieval automatically. The node returns the top-5 relevant documents, which you inject directly into the LLM node's system prompt field. For use cases that require reasoning over relationships between facts — not just similarity — Heym also supports built-in graph-backed memory: enable the Persistent Memory toggle on any LLM node, and Heym automatically extracts entities and their relationships from each conversation in the background, building a knowledge graph that gets injected into the system prompt on subsequent runs. No separate graph database setup required.
Episodic Memory
Episodic memory is a type of AI agent memory that stores a timestamped log of past events and interactions scoped to a specific user or session. Unlike semantic memory, which holds general knowledge, episodic memory captures what happened — conversation summaries, user decisions, resolved tasks, and expressed preferences — indexed by user identity and time.
Definition: Episodic memory in AI agents stores a chronological record of past interactions for a specific user or session. It is retrieved by filtering on user_id and ordering by recency or importance score. Agents with episodic memory scored 37% higher on user satisfaction in a 2026 benchmark study (Mem0, 2026) because users no longer need to repeat context on every visit.
The canonical schema is straightforward: a table with columns for user_id, session_id, timestamp, event_type, content, and importance_score. At session start, the agent retrieves the last N interactions for the current user — filtered by user_id, ordered by recency or importance_score descending — and injects them into the system prompt as condensed conversation history.
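The schema and the retrieval query above can be exercised end to end with sqlite3 (PostgreSQL in production; the column names match the article, the sample rows are invented):

```python
import sqlite3

# The canonical episodic schema, sketched with sqlite3 for illustration.
# Production deployments would use PostgreSQL; columns match the article.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE agent_memory (
        user_id          TEXT,
        session_id       TEXT,
        timestamp        TEXT,
        event_type       TEXT,
        content          TEXT,
        importance_score INTEGER
    )
""")
rows = [
    ("u1", "s1", "2026-01-05", "preference", "wants bullet points", 5),
    ("u1", "s2", "2026-03-01", "chat",       "asked about invoices", 1),
    ("u2", "s3", "2026-03-02", "chat",       "another user's event", 3),
]
db.executemany("INSERT INTO agent_memory VALUES (?, ?, ?, ?, ?, ?)", rows)

# Retrieve the last N interactions for one user: importance first, then
# recency, exactly as described in the text.
last_n = db.execute(
    "SELECT content FROM agent_memory WHERE user_id = ? "
    "ORDER BY importance_score DESC, timestamp DESC LIMIT 2",
    ("u1",),
).fetchall()
```

Note that the high-importance preference from January outranks the more recent low-importance chat event — the behavior the dual ordering is designed to produce.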
When to use it: Personal assistant agents, customer support agents with repeat users, research agents that build progressively on prior sessions, and any use case where personalization drives the value proposition.
In Heym: Add a Database Query node at the start of your workflow to fetch the last 10 interaction records for the current user_id. After the LLM node completes, add a Database Write node that inserts the conversation summary and any newly surfaced facts into the episodic log. This read-inject-write pattern adds under 200ms total runtime overhead.
Procedural Memory
Procedural memory is the most overlooked of the three types of AI agent memory. Rather than storing facts or events, it encodes learned behaviors — prompt adjustments, tool-use patterns, and operational rules the agent has derived from repeated execution of the same task class.
Definition: Procedural memory in AI agents stores learned execution patterns as versioned rules — which tool to call first, how to handle API failures, which response format to apply for specific task types. It lives in a fast key-value store like Redis (sub-10ms read latency) and updates slowly, after explicit evaluation cycles rather than after every run. Redis reports that production procedural memory stores typically hold 200–2,000 rules per agent and remain stable for weeks between updates (Redis, 2026).
In practice, procedural memory takes three forms: versioned system prompt snippets applied based on task type; tool selection rules ("for financial calculations, always invoke the Python executor before the LLM node"); and failure-handling patterns ("if the HTTP Request node returns a 429 status, insert a 2-second wait before retry"). These rules change slowly — they are updated after explicit evaluation cycles, not after every individual run.
Procedural memory is typically stored in Redis (sub-10ms read latency) or as a versioned JSON document in a simple key-value store. The agent reads the current ruleset at workflow initialization and applies the matching rules to its execution path.
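The lookup side of this is deliberately simple — a key read, not a search. A sketch with a plain dict standing in for Redis (the rule names, task types, and JSON shape are illustrative):

```python
# Procedural ruleset sketch: a versioned document keyed by task type.
# A plain dict stands in for Redis; rule names are illustrative.
ruleset = {
    "version": 12,
    "rules": {
        "financial_calculation": {"first_tool": "python_executor"},
        "http_429":              {"action": "wait", "seconds": 2},
        "default":               {"first_tool": "llm"},
    },
}

def rule_for(task_type: str) -> dict:
    # Plain key lookup with a default fallback -- no similarity search,
    # which is why sub-10ms key-value stores are the right backend here.
    return ruleset["rules"].get(task_type, ruleset["rules"]["default"])

chosen = rule_for("financial_calculation")
fallback = rule_for("unknown_task")
```

Because the ruleset is a single versioned document, updating it after an evaluation cycle is one atomic write, and rolling back means restoring the previous version.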
When to use it: High-frequency agents that execute the same task class dozens or hundreds of times per day, and where accumulated operational patterns should improve execution quality over time without full model retraining.
In Heym: Store the procedural ruleset as a JSON document retrieved by a Database Query node at startup. Use a scheduled LLM Evaluator workflow to analyze recent execution traces weekly and propose rule updates. Apply approved updates via a simple trigger workflow that overwrites the ruleset document in your store.
Memory Architecture Patterns
Four patterns cover the large majority of production AI agent memory implementations. The right choice depends on your session model, user volume, and latency tolerance.
Pattern 1: In-Context Storage (Stateless, Single-Session)
All memory lives in the current context window. No external database. Memory resets completely at the end of every run.
Best for: Single-session tasks, one-shot document processing, batch pipelines where every run is fully independent and there is no concept of a returning user.
Additional latency: 0ms — no external reads or writes.
Limitation: Memory is lost the moment the workflow ends. Not suitable for repeat-user scenarios or any application where continuity matters.
Pattern 2: External Storage (Full Persistence)
All persistent state lives in an external store. The agent reads at session start and writes at session end. This pattern supports all three memory types and is the baseline for any agent that handles repeat users.
Best for: Conversational agents, customer support bots, personal assistants, research agents that build on prior sessions.
Additional latency: 50–200ms per run depending on store type — Redis under 10ms, Qdrant 20–50ms, hosted vector services 50–150ms.
Limitation: Requires upfront schema design and a forgetting policy to prevent storage bloat as the user base grows.
Pattern 3: Hybrid Hot-Plus-Cold Memory
Combines in-context short-term memory with external long-term storage. The most recent and most relevant facts stay in-context for the current run (zero additional latency); older or lower-priority memories are retrieved from the external store only when the similarity search identifies them as relevant.
This is the architecture behind Mem0, which reports 91% lower latency versus naive full-context injection — because it avoids loading the entire memory store on every run, injecting only the subset that the current query is likely to need.
Best for: High-frequency agents with large per-user memory stores, where injecting every stored fact would saturate the context window or push token costs to unacceptable levels. The hybrid pattern is also the right choice for multi-agent systems where multiple sub-agents share a common memory store and each only needs a slice of it.
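The hot/cold split reduces to a simple routing decision at context-build time. A sketch where a keyword match stands in for the similarity search, and all facts and names are invented:

```python
# Hybrid hot-plus-cold sketch. Recent facts stay in-context ("hot");
# the cold store is consulted only when the query appears to need it.
# The keyword match below is a stand-in for real similarity search.
hot_cache = ["user prefers German", "open ticket #4412"]
cold_store = [
    "user's company is in the automotive sector",
    "user asked about GDPR exports in January",
    "user upgraded to the premium plan",
]

def build_context(query: str) -> list[str]:
    # Hot facts always ride along at zero extra retrieval latency.
    context = list(hot_cache)
    # Cold facts are pulled only when relevant to the current query,
    # instead of injecting the whole store on every run.
    for fact in cold_store:
        if any(word in fact for word in query.lower().split()):
            context.append(fact)
    return context

ctx = build_context("status of my premium upgrade")
```

Only one of the three cold facts makes it into the context for this query — the selectivity that drives the latency and token savings the pattern is known for.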
Pattern 4: Graph-Backed Memory (Relational Reasoning)
All three patterns above treat memory as a collection of independent documents or facts retrieved by similarity or key. Graph-backed memory adds a fourth dimension: relationships between facts. Instead of asking "what facts are most similar to this query?", the agent can ask "what facts are connected to this entity through this type of relationship?"
This unlocks multi-hop reasoning that flat vector stores cannot support. Example: "find all compliance rules that apply to this user's industry, given their company type, region, and the product categories they've purchased" — a query that requires traversing three relationship types simultaneously.
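Stripped to its core, that compliance query is a two-hop traversal over typed edges. A toy version with adjacency lists — the entities, relation names, and rules are invented for illustration:

```python
# Toy multi-hop traversal over a fact graph stored as adjacency lists.
# Entity and relation names are invented; a real deployment would use
# a graph store, but the traversal logic is the same.
edges = {
    ("acme_corp", "in_industry"):  ["automotive"],
    ("acme_corp", "in_region"):    ["eu"],
    ("automotive", "governed_by"): ["rule_emissions"],
    ("eu", "governed_by"):         ["rule_gdpr"],
}

def neighbors(entity: str, relation: str) -> list[str]:
    return edges.get((entity, relation), [])

def applicable_rules(company: str) -> set[str]:
    # Hop 1: company -> its industry and region.
    # Hop 2: each of those contexts -> the rules governing it.
    rules = set()
    for rel in ("in_industry", "in_region"):
        for context in neighbors(company, rel):
            rules.update(neighbors(context, "governed_by"))
    return rules

rules = applicable_rules("acme_corp")
```

A flat vector store could retrieve facts *similar* to "acme_corp", but it cannot follow the `governed_by` edge from a context the company is merely connected to — that is the capability the graph adds.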
In Heym: Enable Persistent Memory in the LLM node settings panel. Heym runs a background LLM extraction pass after every agent run, parsing the conversation to identify entities (people, products, topics, organizations, preferences) and the relationships between them, then storing them as nodes and edges in a per-node knowledge graph. On the next run, Heym automatically injects the relevant subgraph as structured context into the system prompt. You can inspect and edit the live graph at any time via the visual graph editor built into the node — no external graph database configuration required.
Best for: Recommendation agents, compliance agents reasoning over multi-dimensional rule sets, knowledge graph Q&A, research agents building entity maps across sessions, and any domain where how facts relate to each other matters as much as the facts themselves.
How to Add Memory to Your Agent in Heym
Heym's visual canvas provides built-in primitives for all three memory types. The following four steps walk through a complete implementation for a conversational agent with episodic and semantic memory — no orchestration code required.
Step 1: Design Your Memory Schema and Choose a Backend
Before opening the canvas, map each data type to its storage home.
| Data Type | Storage | Heym Node | Retrieval Latency |
|---|---|---|---|
| Current session state | In-memory variable | Variable node | 0ms |
| Long-term user facts | Qdrant | Qdrant Search node | 20–50ms |
| Recent interactions | PostgreSQL table | Database Query node | 5–20ms |
| Procedural rules | Redis or JSON store | MCP Tool node | <10ms |
Choosing the wrong backend for a data type is the most common schema mistake. Episodic logs do not belong in a vector store optimized for similarity search — they belong in a relational table queried by user_id and timestamp. Semantic facts do not belong in a flat key-value store — they need embedding-based retrieval to surface relevant entries when the user's query wording varies.
Step 2: Inject Memory at Session Start
Add a Database Query node as the very first node in your workflow, before the LLM node. Configure it to retrieve the top-10 most relevant records for the current session:
SELECT content, event_type, created_at
FROM agent_memory
WHERE user_id = '{{user_id}}'
ORDER BY importance_score DESC, created_at DESC
LIMIT 10;
The query uses two ordering criteria — importance first, then recency — so that high-signal memories from months ago surface above low-signal memories from yesterday. Pass the retrieved records into the LLM node's system prompt using a Heym template expression in the system prompt field. Every session now begins with full context, and users never have to re-introduce themselves.
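Condensing the retrieved rows into the system prompt is a small formatting step. A sketch — the record shape mirrors the query's columns, and the prompt template is illustrative, not Heym's built-in format:

```python
# Sketch of condensing retrieved episodic records into a system prompt.
# The record shape mirrors the query's columns; the template is an
# illustrative choice, not a required format.
records = [
    {"content": "prefers bullet points", "event_type": "preference",
     "created_at": "2026-01-05"},
    {"content": "asked about invoices", "event_type": "chat",
     "created_at": "2026-03-01"},
]

lines = [f"- [{r['event_type']} {r['created_at']}] {r['content']}"
         for r in records]
system_prompt = "Known about this user:\n" + "\n".join(lines)
```

Keeping each memory to one tagged line keeps the injected context compact and easy for the model to scan.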
Step 3: Capture New Facts Mid-Run
After your primary LLM node completes, connect a second lightweight LLM node configured with a fast, cheap model. Give it this extraction prompt:
From the conversation below, extract new facts, preferences, or decisions
worth storing for future sessions.
Return a JSON array. Each item should have:
- "fact": the extracted information as a concise sentence
- "type": "semantic" (general knowledge) or "episodic" (user-specific event)
- "importance": integer from 1 (low) to 5 (high)
Conversation:
{{conversation_transcript}}
Connect the extractor node's JSON output to a Database Write node that upserts each extracted fact into your memory store. The entire extractor-writer pair adds under 800ms of latency to the workflow run — well within the acceptable range for conversational use cases where users already expect a 1–3 second response time.
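Before the write step, the extractor's output should be validated rather than trusted blindly — LLMs occasionally emit malformed items. A sketch of that routing step, where the JSON payload is a hand-written example of the format the prompt requests:

```python
import json

# Validate and route the extractor node's JSON output before writing.
# The payload below is a hand-written example of the requested format.
raw = '''[
  {"fact": "user prefers responses in German", "type": "episodic", "importance": 4},
  {"fact": "refund window is 30 days", "type": "semantic", "importance": 3},
  {"fact": "malformed item with no type", "importance": 9}
]'''

store = {"semantic": [], "episodic": []}
for item in json.loads(raw):
    # Keep only well-formed facts: a known type and an importance in 1-5.
    if item.get("type") in store and 1 <= item.get("importance", 0) <= 5:
        store[item["type"]].append(item["fact"])
```

The malformed third item is silently dropped, which is usually the right behavior for a background memory writer — losing one noisy fact is cheaper than corrupting the store.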
Step 4: Set a Forgetting Policy
An unbounded memory store is an operational risk. For a production agent handling 500 active users, an episodic log with no cleanup grows by approximately 15,000 records per day. Within 6–8 weeks, retrieval latency begins to degrade noticeably, and stale or contradictory facts start polluting new sessions.
In Heym, add a scheduled workflow on a daily or weekly trigger. The cleanup logic runs two queries:
-- Remove memories older than 30 days that were never retrieved
DELETE FROM agent_memory
WHERE created_at < NOW() - INTERVAL '30 days'
AND retrieval_count = 0;
-- Decay importance scores on aging memories
UPDATE agent_memory
SET importance_score = GREATEST(1, importance_score - 1)
WHERE created_at < NOW() - INTERVAL '7 days';
The decay query gradually reduces the retrieval rank of aging memories without deleting them immediately — which means a long-term preference (a user always wants responses in bullet points) survives the 30-day cutoff as long as it keeps getting retrieved. Only facts that stop being relevant fall off naturally.
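The two-step policy can be exercised against sqlite3 to see the delete-versus-decay behavior concretely (the article's SQL targets PostgreSQL; the dates, rows, and schema subset here are illustrative):

```python
import sqlite3

# The delete-plus-decay forgetting policy, run against sqlite3 for
# illustration (the article's SQL targets PostgreSQL). Dates are fixed
# strings and the rows are invented.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE agent_memory (
    content TEXT, created_at TEXT, retrieval_count INTEGER,
    importance_score INTEGER)""")
db.executemany("INSERT INTO agent_memory VALUES (?, ?, ?, ?)", [
    ("stale, never used",  "2026-01-01", 0, 3),  # old + unused -> deleted
    ("old but still used", "2026-01-01", 9, 3),  # old + used -> kept, decayed
    ("fresh fact",         "2026-04-16", 0, 3),  # recent -> untouched
])
today = "2026-04-17"

# Step 1: delete memories older than 30 days that were never retrieved.
db.execute("DELETE FROM agent_memory WHERE created_at < date(?, '-30 days') "
           "AND retrieval_count = 0", (today,))
# Step 2: decay importance on memories older than 7 days, floored at 1.
db.execute("UPDATE agent_memory SET importance_score = "
           "MAX(1, importance_score - 1) "
           "WHERE created_at < date(?, '-7 days')", (today,))

survivors = db.execute(
    "SELECT content, importance_score FROM agent_memory "
    "ORDER BY content").fetchall()
```

The frequently retrieved old memory survives with reduced rank, the fresh fact is untouched, and only the old, never-retrieved record is gone — the exact outcome the policy aims for.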
Choosing the Right Memory Type
For most teams building their first memory-enabled agent, start with episodic memory only. It has the lowest schema complexity, the clearest user-facing benefit, and the fastest path to production. Add semantic memory once you have a knowledge base that warrants vector search — typically when the agent needs to answer questions from documentation, catalogs, or policy libraries. Add procedural memory only when the agent executes the same task class at high frequency and you have enough execution trace data to identify consistent behavioral patterns worth encoding.
| Use Case | Recommended Types | Storage Backend |
|---|---|---|
| FAQ bot or documentation assistant | Semantic only | Qdrant |
| Customer support with returning users | Episodic + Semantic | Qdrant |
| Personal AI assistant | All 3 types | Redis (procedural) + PostgreSQL |
| Single-session document processor | None (in-context) | Variable node only |
| Research agent building across sessions | Episodic + Semantic | Qdrant |
| DevOps automation agent | Procedural + Episodic | Redis + PostgreSQL |
For real-world deployment patterns that benefit directly from these memory architectures, see our guide on AI agent use cases — the customer support, research, and personal assistant examples there map directly to the episodic and semantic patterns above.
Common Pitfalls in AI Agent Memory
Pitfall 1: Injecting everything into the context window
Loading all stored memories on every run is the most common first implementation mistake. It saturates the context window faster than expected and increases token costs proportionally with the size of the memory store. Always use relevance scoring and top-K retrieval — never inject more than 10–15 memory records per run without a strong reason.
Pitfall 2: No importance scoring at schema design time
Without an importance_score column in your schema, your retrieval is purely recency-biased — the most recent facts always surface, even when they are far less relevant than older entries. Add importance scoring at schema design time; retrofitting it after production data has accumulated is much harder.
Pitfall 3: Mixing memory types in a single table
Storing semantic facts and episodic events in the same table without type differentiation makes it impossible to apply different retrieval strategies and TTLs to each. Semantic facts should be queried by similarity; episodic events by user ID and recency. Keep them in separate tables or use a well-indexed memory_type column from the start.
Pitfall 4: No forgetting policy from day one
Without expiration rules, memory stores grow without bound. Retrieval latency degrades, stale facts contradict current ones, and storage costs scale linearly with your user base. Implement TTL-based deletion and importance decay before going live — not after you observe degradation in production.
Pitfall 5: Treating memory as an afterthought in multi-agent systems
In a multi-agent system, each sub-agent typically runs its own independent reasoning loop. Without a shared memory store that all agents can read from and write to, agent A's discoveries are invisible to agent B, and the system produces redundant or contradictory results. Design shared memory access patterns — including write-conflict handling — before building multi-agent topologies, not after.
Key Takeaways
- AI agents forget by design: LLMs are stateless — persistent memory requires an external store that the agent reads at session start and writes to at session end.
- Three types, three stores: Semantic memory → vector database (similarity search, <50ms); Episodic memory → relational table (user_id filter, <20ms); Procedural memory → key-value store (rule lookup, <10ms).
- Start with episodic memory: It delivers the highest user-visible improvement — 37% better satisfaction scores — with the simplest schema (a flat events table).
- Hybrid outperforms naive injection: Loading only relevant memories via similarity search reduces latency by up to 91% compared to injecting the entire memory store on every run.
- Forgetting is not optional: For a 500-user production agent, an unbounded episodic log grows ~15,000 records per day. Without a TTL policy, retrieval quality degrades measurably within 6–8 weeks.
- Multi-agent systems need shared memory: Without a common memory store, sub-agents produce redundant or contradictory results — design shared access patterns before building multi-agent topologies.
Conclusion
AI agent memory is a three-layer architecture, not a single technology choice. Semantic memory powers factual knowledge retrieval from a vector store. Episodic memory enables consistent personalization across sessions with a structured interaction log. Procedural memory lets the agent encode learned execution patterns as rules that improve performance without retraining.
None of these layers are complex to implement in isolation. Heym's visual canvas provides Database Query nodes for reads, Database Write nodes for persistence, Variable nodes for in-session state, and MCP Tool nodes for managed memory services — all wired together without writing orchestration code. A working episodic memory layer for a conversational agent takes under two hours to build and adds 50–200ms of latency overhead that users will never notice.
The architectural decisions that matter — what to store, how to retrieve it, and when to forget it — are made before you open the canvas. Get those three right, and the implementation follows directly from the patterns in this guide.
Start with episodic memory. It delivers the clearest user-facing improvement in the shortest time, with the simplest schema. Add semantic and procedural layers once your agent is in production and you understand exactly what it needs to remember.
Try Heym for free and connect your first memory layer in under two hours.

Founding Engineer
Ceren is a founding engineer at Heym, working on AI workflow orchestration and the visual canvas editor. She writes about AI automation, multi-agent systems, and the practitioner experience of building production LLM pipelines.