What Is Context Engineering? Guide for AI Agents

TL;DR: Context engineering is the practice of designing what goes into a model's context window at every step, so an AI agent sees exactly what it needs and nothing that distracts it. It is broader than prompt engineering, which shapes a single instruction. The field has settled on four strategies: write, select, compress, and isolate context. In Heym, those map to four real features: persistent graph memory, RAG over Qdrant, automatic compression at eighty percent of the window, and sub-agent isolation.

Context engineering at a glance. Definition: managing the tokens a model sees during inference, not just the prompt. Driver: agents run many steps and accumulate context, and accuracy drops as the window fills, an effect called context rot. Core strategies: write, select, compress, isolate. Relationship to prompt engineering: it includes prompt engineering as one part. Best fit: any agent that retrieves data, runs tools, or remembers across turns.

What Is Context Engineering?
Where the Term Came From
Context Engineering vs Prompt Engineering
Why Context Engineering Matters for AI Agents
The Four Context Engineering Strategies
How Context Fails: Five Failure Modes
Context Engineering in Practice: A Heym Workflow
Do You Need Context Engineering or a Better Prompt?
How to Apply Context Engineering in Heym
Key Takeaways
FAQ
References

This guide is for builders and engineers who shipped a working agent demo and then watched it get slower, more expensive, and less accurate as real usage piled context into the window. The fix is rarely a cleverer prompt. It is context engineering.

I work on the Heym platform, where we build a visual canvas for AI workflows, and the same lesson keeps coming up. The prompt is the easy part. Managing everything else the model sees is the real job.

If you are still mapping where AI fits into your stack, our AI workflow automation overview is the pillar this article sits under. This guide goes one level deeper, into the discipline that decides whether an agent stays reliable once it leaves the demo.

What Is Context Engineering?

Definition: Context engineering is the practice of designing and managing the full set of tokens a language model sees during inference, including the system prompt, retrieved documents, tool definitions, memory, and prior messages, so the model has the right information for the current step and as little noise as possible.

The key word is everything. A model does not only read your prompt. It reads the system instruction, any documents you retrieved, the definitions of the tools it can call, whatever it remembers from earlier, and the running transcript of the conversation. All of that lands in one finite context window. Context engineering is the work of deciding what earns a place there.

A useful framing comes from the AI community: the context window is like memory in a computer, and the model is the processor. Context engineering is the operating system that decides what to load into that memory at each step. Load too little and the model guesses. Load too much and it loses the thread.

That is why the discipline grew up alongside agents. A single chatbot turn is small and self-contained. An agent runs for many steps, calls tools, reads results, and keeps going. Each step adds tokens, and the window fills fast. Managing that flow is a different skill from writing one good instruction.

Where the Term Came From

The phrase gained traction in mid-2025, when practitioners including Andrej Karpathy argued that "context engineering" described the real work better than "prompt engineering." The point was simple. As soon as you give a model tools, retrieval, and memory, the wording of any one prompt matters less than the system that assembles the model's full view of the task.

The idea was formalized quickly. Anthropic published an engineering guide on effective context engineering for agents, framing it as curating and maintaining the optimal set of tokens during inference. LangChain published a widely cited taxonomy that split the work into four strategies. Vector database and infrastructure vendors followed with their own guides through late 2025 and early 2026.

Key principle: Prompt engineering asks how to phrase one instruction. Context engineering asks what the model should see at all, across every step of a task.

The shift is not a fashion. It tracks a real change in how people build with models, from one-shot prompts to multi-step agents that pull in data and act on it. The vocabulary caught up with the engineering.

Context Engineering vs Prompt Engineering

This is the comparison most people arrive with, so it deserves a clean answer. Prompt engineering and context engineering are not competitors. Prompt engineering is one component inside context engineering. The table below maps the differences.

Dimension	Prompt Engineering	Context Engineering
Scope	One instruction or message	The entire context window across a run
Unit of work	Wording and structure of a prompt	Retrieval, memory, tools, compression, ordering
When it happens	Before a single call	Continuously, at every step of an agent
Main lever	How you phrase the request	What information the model can see
Problem it solves	Vague or off-target output	Overload, missing facts, drift over long tasks
Scales with	Quality of one message	Number of steps, tools, and documents
Typical artifact	A polished prompt template	A pipeline that assembles context per step
Best for	Self-contained single-turn tasks	Agents that retrieve, remember, or run tools

The practical takeaway is about where the difficulty lives. For a one-off task that fits in a single prompt, prompt engineering carries the load. The moment your agent needs private data, runs across many steps, or remembers across sessions, the hard problem becomes which tokens to put in front of the model, and that is context engineering.

So, is prompt engineering dead? No. A clear, well-structured prompt still matters and is the foundation of every context-engineered system. What changed is that wording one prompt is no longer the whole job. It is the first move in a larger game about managing the model's full field of view.

If you want to go deeper on the prompt-sequencing side of this, our guide to prompt chaining covers how to break work into ordered steps, which is a context-shaping technique in its own right.

Why Context Engineering Matters for AI Agents

The strongest argument for context engineering is that bigger context windows did not solve the problem. Models now advertise windows of hundreds of thousands of tokens, yet stuffing them full makes answers worse, not better. There are three reasons this matters in production.

First, accuracy. As tokens pile up, a model's ability to find and use the right detail declines. A relevant fact buried under thousands of low-value tokens often gets missed, even though it is technically in the window.

Second, cost. Every token in the window is a token you pay for on each call. An agent that drags its full history through twenty steps pays for that history twenty times. Trimming context is a direct line item, not a nicety.

Third, latency. Larger inputs take longer to process. An agent that keeps a tight window responds faster, which matters when a human or another system is waiting on the result.

Notable fact: A larger context window does not guarantee better answers. Model recall and reasoning degrade as the window fills with low-value tokens, so what you leave out matters as much as what you put in.

This is why agent reliability and context engineering are the same conversation. Memory, retrieval, and multi-agent design — most of the agentic design patterns, in fact — exist to keep the window focused. For the memory side specifically, our guide to AI agent memory covers the patterns that let an agent recall facts without carrying its whole past in every prompt.

The Four Context Engineering Strategies

The most useful taxonomy, popularized by LangChain and echoed across the field, splits context engineering into four moves. Most real systems use all four together.

Write context: Persist information outside the context window so it survives across steps and sessions, then bring it back only when relevant. Examples are scratchpads and long-term memory.

Write is about not losing what matters. Instead of keeping every fact in the live transcript, the agent records it to durable storage. A note-taking scratchpad handles within-task state. A long-term memory store handles facts that should outlive the session.

Select context: Pull only the relevant information into the window at the moment it is needed, usually through retrieval over a vector store.

Select is the retrieval half of the loop. Rather than paste a whole knowledge base into the prompt, the system embeds the query, searches for the closest matches, and injects just those. This is the core idea behind retrieval augmented generation. Our RAG pipeline guide walks through building this end to end.

Compress context: Reduce the running history to its essential tokens, usually by summarizing older turns, so the task still fits the budget.

Compress is what keeps long-running agents alive. When the conversation grows past a threshold, the system summarizes the middle and keeps the anchors, so the window stays focused without throwing away the thread.

Isolate context: Split work across separate context windows, such as sub-agents, so different tasks do not pollute one another's view.

Isolate is divide and conquer for context. A research sub-task and a writing sub-task each get a clean window, which prevents the clutter of one from confusing the other. Our multi-agent AI systems guide covers the orchestration patterns this enables.

How Context Fails: Five Failure Modes

Context engineering exists because context degrades in predictable ways. Naming the failure modes makes them easier to design against. The table lists the five that show up most often, with the strategy that addresses each.

Failure mode	What goes wrong	Primary fix
Context poisoning	A hallucination or wrong fact enters the context and gets reused as if true	Select (verified retrieval), validation
Context distraction	So much context accumulates that the model leans on it instead of reasoning	Compress, isolate
Context confusion	Irrelevant or superfluous detail in the window skews the response	Select, tighter system prompt
Context clash	Contradictory information sits in the window at once	Compress, isolate, source ranking
Context rot	Accuracy declines as the token count rises, regardless of relevance	Compress, write, isolate

Context rot is the umbrella problem and the one worth internalizing. It says that more tokens are not free even when they are relevant, because the model's recall weakens as the window grows. The other four modes are specific ways the window fills with tokens that actively mislead. Each strategy from the previous section is, in effect, a defense against one or more of these.

Context Engineering in Practice: A Heym Workflow

Frameworks are easier to trust when a real platform implements them. In Heym, the four strategies are not abstractions, they are features you toggle on a node. Here is how each one maps to something you can actually run.

Write maps to persistent graph memory. Turn on persistentMemoryEnabled on the LLM node and the agent keeps a per-node memory graph. After each successful run, a background extraction call pulls out entities and relationships and stores them as nodes and edges. On the next run, that graph is injected as markdown into the system prompt. Facts persist outside the live window and return only when relevant, which is the write strategy exactly.

Select maps to RAG over a vector store. Heym's vector store runs on Qdrant or on Postgres with the pgvector extension. A RAG node embeds the query, searches the collection, and injects the top matches into the agent's context for that step. The knowledge base never sits in the window wholesale. Only the retrieved passages do, then they leave. That is the select strategy in a single node.

Compress maps to automatic context compression. Agent nodes compress automatically once estimated usage crosses eighty percent of the model's context window. Heym preserves the system prompt, the first user message, and the most recent messages, then summarizes the stretch in between. You do not count tokens by hand. Compression events appear as a labeled entry in the Debug Panel, execution history, and the Traces tab.

Isolate maps to sub-agent orchestration. Set isOrchestrator to true on an Agent node and list subAgentLabels, and the orchestrator gets a call_sub_agent tool to delegate work. Each sub-agent runs with its own clean context window and its own memory graph, and sub-agents run in parallel when called in one turn. Separate concerns get separate context, which is the isolate strategy.

Key principle: A platform that implements all four strategies lets you practice context engineering by configuration rather than by writing token-management code for every agent.

The point is not that Heym invented these strategies. The field did. The point is that you can apply the full taxonomy on one visual canvas, see the effect in traces, and adjust, instead of wiring memory, retrieval, compression, and isolation by hand each time.

Do You Need Context Engineering or a Better Prompt?

Not every problem calls for the full toolkit. Sometimes the honest answer is that your prompt needs work, not your pipeline. Use this four-question check to tell the difference.

Does the task fit in one prompt with no outside data? If yes, prompt engineering is probably enough. Tighten the instruction, add an example, and ship. Reach for context engineering only when one prompt stops being the unit of work.
Does the model need facts it was not trained on, or your private data? If yes, you need the select strategy. Add retrieval so the agent pulls in the right passages at query time rather than guessing or relying on a stale prompt. Whether to retrieve those facts or train them into the model is a decision in its own right.
Does the agent run many steps or need to remember across sessions? If yes, you need write and compress. Persist durable facts to memory and let summarization keep the running window inside budget.
Are there distinct sub-tasks that clutter each other's context? If yes, you need isolate. Split the work across sub-agents so each gets a clean window and the failure modes stay contained.

If you answered no to all four, save yourself the complexity and improve the prompt. If you answered yes to any, that answer points at the exact strategy to start with. Most production agents end up answering yes to two or three, which is why the strategies are designed to compose.

How to Apply Context Engineering in Heym

This recipe matches the HowTo block in the article schema. It turns the four strategies into an order of operations you can follow on the canvas.

Start with the smallest viable system prompt. State the role, the goal, and the hard rules on the Agent node, and stop there. Every extra token competes for attention, so treat the prompt as calibration, not storage.
Select context with RAG over Qdrant. Add a RAG node so the agent retrieves only the relevant passages at query time instead of pasting whole documents into the window.
Write durable facts to persistent memory. Enable persistentMemoryEnabled on the LLM node so the agent records entities and relationships to a graph and recalls them across sessions without carrying the full history.
Let automatic compression handle long runs. Rely on the eighty percent compression threshold on Agent nodes, which keeps the system prompt and recent turns and summarizes the rest.
Isolate sub-tasks across sub-agents. Use isOrchestrator and subAgentLabels so distinct phases run in their own context windows and do not clash.
Observe token usage and iterate in Traces. Read the Traces tab and Debug Panel for token counts, retrieved chunks, and compression events, then tune retrieval limits and where you split sub-agents.

For deeper orchestration once the basic loop is familiar, our LLM orchestration guide covers the coordination patterns the Agent node supports.

Key Takeaways

Context engineering is the practice of managing every token a model sees during inference, including the system prompt, retrieval, memory, tools, and history, not just the prompt itself.
Context engineering vs prompt engineering is scope. Prompt engineering shapes one instruction; context engineering manages the whole window across an agent's run, and includes prompt engineering as one part.
Prompt engineering is not dead. A clear prompt is the foundation, but for multi-step agents the harder work moved to deciding what the model should see at all.
The field's four strategies are write, select, compress, and isolate context, and most real agents use all four together.
Context degrades in named ways, including context rot, where accuracy falls as the window fills, plus poisoning, distraction, confusion, and clash.
In Heym, the four strategies map to four real features: persistent graph memory (write), RAG over Qdrant (select), automatic compression at eighty percent of the window (compress), and sub-agent isolation (isolate).
Use the four-question check to decide whether a task needs context engineering or just a better prompt, then start with the strategy the answer points to.

FAQ

What is context engineering? Context engineering is the practice of designing what information enters a language model's context window at each step, so the model has exactly what it needs to finish the task and nothing that distracts it. It covers the system prompt, retrieved documents, tool definitions, memory, and prior messages. Where prompt engineering shapes one instruction, context engineering manages the whole token budget across a run.

What is the difference between context engineering and prompt engineering? Prompt engineering is about how you phrase a single instruction. Context engineering is about which information the model can see when it answers, across many steps of an agent run. Prompt engineering optimizes one message; context engineering optimizes the entire window, including retrieval, memory, tool definitions, and compression. They are complementary, not rivals.

Is prompt engineering dead? No. Prompt engineering is now one part of context engineering rather than the whole job. A clear instruction still matters, and for a single self-contained task a good prompt is often enough. The shift is that agents run many steps over large windows, so the harder problem moved from wording one prompt to managing all the information the model sees across a run.

Why does context engineering matter for AI agents? Agents run for many steps and accumulate tokens from tool calls, retrieved documents, and prior turns. Past a point, extra context lowers accuracy, raises cost, and increases latency, an effect known as context rot. Context engineering keeps the window focused so the agent stays reliable over long tasks. It is the difference between a demo that works once and an agent that holds up in production.

What are the four context engineering strategies? The widely used taxonomy is write, select, compress, and isolate. Write persists information outside the window, such as long-term memory. Select pulls only the relevant context back in, usually through retrieval. Compress shrinks the running history into summaries to fit the budget. Isolate splits work across separate windows, such as sub-agents, so tasks do not pollute each other.

What is context rot? Context rot is the decline in a model's ability to use information accurately as the number of tokens in the window grows. A larger window does not guarantee better answers, because recall and reasoning degrade when the window fills with low-value tokens. Context engineering fights context rot by keeping only the tokens that earn their place, through retrieval, compression, and isolation.

References

Effective context engineering for AI agents (Anthropic). Engineering guide defining context engineering as curating optimal tokens during inference.
Context Engineering for Agents (LangChain). Source of the write, select, compress, isolate taxonomy.
What is context engineering? (Weaviate). Vendor pillar covering memory, retrieval, and failure modes.
Context engineering (IBM Think). Neutral explainer with enterprise framing.
Prompt engineering (Wikipedia). Definitional background on the prompt-side discipline.
Vector database for AI agents (Qdrant). Vector store used in Heym RAG nodes for the select strategy.

Want to apply context engineering without writing token-management code? Open the Heym dashboard and build an agent on the canvas. Turn on persistent memory, add a RAG node over Qdrant, let compression run at eighty percent of the window, and split work across sub-agents, then watch the effect in Traces.