May 4, 2026
Ceren Kaya Akgün
Prompt Chaining: A Developer's Practical Guide
Learn what prompt chaining is, the 4 core patterns with code examples, and how to build a visual prompt chain in Heym without boilerplate API code.
TL;DR: Prompt chaining splits a complex task into a sequence of focused LLM calls where each step's output feeds the next. Four patterns cover the vast majority of production use cases: sequential, branching, parallel, and iterative. This guide explains each pattern with code, shows how they differ from chain-of-thought prompting, and walks through building a visual prompt chain in Heym without writing API boilerplate.
Table of Contents
- What Is Prompt Chaining?
- Why Use Prompt Chaining?
- The 4 Core Prompt Chaining Patterns
- Prompt Chaining vs Chain of Thought
- How to Build a Prompt Chain in Heym
- Real-World Prompt Chaining Examples
- Common Mistakes to Avoid
- Key Takeaways
- FAQ
What Is Prompt Chaining?
This guide is for developers building LLM-powered applications who want to move beyond single-prompt designs. Whether you are building a data pipeline, a content generation workflow, or an agentic AI system, prompt chaining is the foundational technique that makes complex reasoning reliable at scale. I work on the core platform at Heym, where we have built prompt chaining support into the visual canvas and observed these patterns across dozens of production workflows.
Definition: Prompt chaining is a technique in which a complex task is decomposed into a sequence of smaller, focused LLM prompts. The output of each step is passed as the input to the next. Rather than asking one model call to handle research, reasoning, synthesis, and formatting simultaneously, each step handles a single well-defined subtask.
The principle is straightforward. Instead of one giant prompt that says "research this topic, analyze the findings, and write a polished report," you build a pipeline. Step 1 extracts raw facts. Step 2 scores those facts by relevance to the target audience. Step 3 drafts the report from the top-ranked facts. Step 4 generates a title and meta description from the draft. Each step is a separate LLM call with its own focused system prompt.
Anthropic describes prompt chaining as one of the foundational patterns for building reliable AI applications, particularly useful when tasks have clear intermediate checkpoints where each step's output can be validated before the chain continues (Anthropic Developer Documentation, 2025).
Prompt chaining is also the building block of LLM orchestration. Most production orchestration systems add tool calls, memory, and conditional routing on top of a chaining foundation. Understanding chaining is the prerequisite for understanding the broader orchestration patterns that make agentic AI systems work.
Why Use Prompt Chaining?
Single-prompt approaches collapse under complexity. Four concrete reasons explain why developers adopt prompt chaining once they hit that wall.
1. Better accuracy on complex tasks. When you ask a single LLM call to handle five distinct phases of work, the model allocates attention across all five simultaneously. Output quality drops on the later phases. Decomposing the same task into five focused steps lets each call give full attention to one subtask. Anthropic's engineering team found that structuring LLM work as chains of discrete, verifiable steps produces substantially more reliable outputs on multi-part reasoning tasks than equivalent single-prompt designs (Anthropic, Building Effective Agents, 2025).
2. Easier debugging. When a monolithic prompt fails, pinpointing which part of the reasoning went wrong is difficult. In a chain, you can inspect the output at every step. If Step 2 produces an incorrect ranking, you fix Step 2's system prompt without touching Steps 1, 3, or 4. This isolation dramatically reduces the time spent diagnosing LLM failures in production.
3. Modular reuse. A well-scoped "extract key claims" step or "classify intent" step can be reused across multiple chains. In Heym, you can save LLM node configurations and reference them in different workflows. The principle is the same as writing reusable functions instead of duplicating logic across a codebase.
4. Context window efficiency. Modern models support large context windows, up to 200,000 tokens for Claude and up to 128,000 tokens for GPT-4o, but filling those windows with mixed task instructions degrades performance on the later parts of a request. Keeping each call's context focused and small improves output quality and reduces per-step inference cost, especially at high request volumes. Anthropic's long-context documentation explicitly recommends minimizing extraneous content in each call to maintain output quality as context size increases (Anthropic, Long Context Best Practices, 2025).
"Decomposing a task into a sequence of single-responsibility prompts is the most reliable architectural decision you can make when building LLM pipelines for production."
The argument for prompt chaining is not about model capability limitations. It is about maintainability and reliability. Even if a model could handle everything in one call, a chained design would still be easier to test, debug, and iterate on.
The 4 Core Prompt Chaining Patterns
Prompt chains are not all linear sequences. Four foundational patterns cover the majority of production use cases.
Sequential Chain
The simplest pattern. Each step runs after the previous one finishes and passes its full output downstream. Use this when every step depends on the complete result of the step before it.
Example: Content creation pipeline.
Step 1: Extract key facts from source text
Step 2: Rank facts by relevance to the target audience
Step 3: Write a 300-word summary using the top 5 ranked facts
Step 4: Generate a title and meta description from the summary
A two-step sequential chain in Python using the Anthropic SDK is straightforward to write. The key is that claims (the string output of Step 1) is passed directly into Step 2's message, making Step 2 dependent on Step 1 while keeping the two calls fully decoupled in terms of model selection and prompt logic.
import anthropic

client = anthropic.Anthropic()

# source_text holds the raw input document (assumed to be defined earlier in your script)

# Step 1: extract the key claims
step1 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    system=(
        "You are a fact extractor. Return the three most important factual claims "
        "from the input as a numbered list. Return only the list, no preamble."
    ),
    messages=[{
        "role": "user",
        "content": f"Extract the key claims:\n\n{source_text}"
    }]
)
claims = step1.content[0].text

# Step 2: draft a concise summary from the extracted claims
step2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=(
        "You are a technical writer. Write a concise 200-word summary based only "
        "on the provided claims. Do not add information not present in the claims."
    ),
    messages=[{
        "role": "user",
        "content": f"Write a summary from these claims:\n\n{claims}"
    }]
)
summary = step2.content[0].text

Each call is independently configurable. You can assign a different model, temperature, or token limit to Step 2 without touching Step 1. This is what makes sequential chains maintainable as requirements evolve.
Branching Chain
A branching chain evaluates a condition on the output of one step and routes to a different downstream prompt based on the result. Think of it as an if/else at the workflow level.
Example: Customer support triage.
Step 1: Classify the support ticket (billing / technical / general)
if "billing" → apply billing-specific resolution prompt
if "technical" → apply technical troubleshooting prompt
if "general" → apply FAQ lookup promptBranching chains are the right choice for LLM routers, content classifiers, and dynamic pipelines where the correct next step depends on what the previous step found. The branching condition is typically a simple string match or JSON field check on the classifier step's output. In Heym, branching is configured visually by drawing edges from one LLM node to multiple downstream nodes and setting a condition expression on each edge. No routing code is needed.
Parallel Chain
In a parallel chain, multiple LLM calls run concurrently on independent subtasks, and a final aggregation step combines the results. Use this when subtasks share no dependencies and you want to minimize total execution time.
Example: Competitive analysis.
Parallel Step A: Summarize competitor 1's product page
Parallel Step B: Summarize competitor 2's product page
Parallel Step C: Summarize competitor 3's product page
Aggregation: Compare all three summaries in a structured table
If each summary call takes 3 seconds, running the three calls in parallel keeps total chain latency at approximately 3 seconds instead of 9 seconds. In Python this pattern uses asyncio.gather with the async Anthropic client. Notice that the fast, inexpensive claude-haiku-4-5-20251001 model handles the high-volume parallel steps, while the more capable claude-sonnet-4-6 handles the synthesis step where reasoning quality matters most.
import asyncio
import anthropic

async_client = anthropic.AsyncAnthropic()

async def summarize(competitor_text: str, name: str) -> str:
    result = await async_client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=256,
        system="Summarize the following product description in 80 words.",
        messages=[{"role": "user", "content": competitor_text}]
    )
    return f"**{name}:** {result.content[0].text}"

async def run_parallel_chain(competitors: dict[str, str]) -> str:
    # all three summary calls run at the same time
    summaries = await asyncio.gather(*[
        summarize(text, name) for name, text in competitors.items()
    ])
    combined = "\n\n".join(summaries)

    # aggregation step runs after all parallel steps complete
    comparison = await async_client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=(
            "Compare the following competitor summaries. Produce a markdown table "
            "with columns: Competitor, Core Strength, Key Differentiator, Pricing Model."
        ),
        messages=[{"role": "user", "content": combined}]
    )
    return comparison.content[0].text

Matching model capability to task complexity is a practical way to control inference cost at scale. In Heym, you draw edges from a trigger node to multiple LLM nodes and they execute concurrently without any additional configuration.
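If you run the hand-written version above outside Heym, the entry point is a single asyncio.run call. The competitor descriptions here are placeholders for illustration.

if __name__ == "__main__":
    competitors = {
        "Competitor A": "Placeholder product description for competitor A...",
        "Competitor B": "Placeholder product description for competitor B...",
        "Competitor C": "Placeholder product description for competitor C...",
    }
    # runs the three summaries concurrently, then the aggregation step
    print(asyncio.run(run_parallel_chain(competitors)))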
Iterative Chain
An iterative chain runs a step repeatedly in a loop, checking a quality condition after each iteration to decide whether to continue or exit. This pattern handles refinement tasks where a single generation pass does not reliably reach the desired quality threshold.
Example: Draft refinement loop.
Step 1: Generate an initial draft
Loop (max 4 iterations):
Step 2: Evaluate draft quality on a 1-10 scale with a reasoning paragraph
if score >= 8: exit loop and proceed to Step 3
if score < 8: revise the draft using the evaluation feedback, return to Step 2
Step 3: Final formatting and metadata generation
Always define a hard maximum iteration count before building an iterative chain. Without one, the loop may never exit if the quality threshold is unreachable with the current prompts. Three to five iterations is the practical ceiling for most refinement tasks. Major AI providers including Google document iterative LLM evaluation loops as a standard pattern for quality-controlled generation workflows (Google AI for Developers, 2025). Iterative chains connect naturally to LLM orchestration patterns where a judge model evaluates and a generator model revises, a structure sometimes called a generator-critic loop.
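A hand-rolled version of this loop, sketched in the same Anthropic SDK style as the earlier examples, might look like the following. The "SCORE:" parsing convention and the prompt wording are assumptions chosen for illustration, not a prescribed format.

import re
import anthropic

client = anthropic.Anthropic()

MAX_ITERATIONS = 4  # hard cap so the loop always terminates

def refine_draft(task: str) -> str:
    # Step 1: generate an initial draft
    draft = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="Write a first draft for the given task.",
        messages=[{"role": "user", "content": task}],
    ).content[0].text

    for _ in range(MAX_ITERATIONS):
        # Step 2: judge the draft; ask for "SCORE: <n>" followed by feedback
        evaluation = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=512,
            system=(
                "Evaluate the draft on a 1-10 scale. Reply with 'SCORE: <number>' "
                "on the first line, then one paragraph of concrete feedback."
            ),
            messages=[{"role": "user", "content": draft}],
        ).content[0].text

        match = re.search(r"SCORE:\s*(\d+)", evaluation)
        score = int(match.group(1)) if match else 0
        if score >= 8:
            break  # quality threshold met, exit the loop early

        # revise the draft using the evaluator's feedback, then loop again
        draft = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system="Revise the draft to address the feedback. Return only the revised draft.",
            messages=[{
                "role": "user",
                "content": f"Draft:\n{draft}\n\nFeedback:\n{evaluation}",
            }],
        ).content[0].text

    return draft  # best-effort result even if the iteration cap was reached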
Prompt Chaining vs Chain of Thought
These two terms appear together frequently and are often confused. They are related but describe different things. Both Anthropic and OpenAI document chain of thought as a single-model reasoning technique and prompt chaining as a separate application architecture pattern. This distinction appears explicitly in both teams' 2025 developer guides (OpenAI, Prompt Engineering Guide, 2025).
| Dimension | Prompt Chaining | Chain of Thought (CoT) |
|---|---|---|
| Number of LLM calls | Multiple (one per step) | One |
| Reasoning visibility | Explicit — each step's output is inspectable | Embedded — reasoning lives inside one response |
| Debuggability | High — fix individual steps in isolation | Low — hard to isolate where reasoning fails |
| Context per call | Small and focused | Grows with reasoning length |
| Best use case | Multi-stage tasks with distinct phases | Single-call reasoning: math, logic, code review |
| Billing model | Multiple API calls billed separately | One API call (longer output) |
Chain of thought is a prompting instruction applied within a single LLM call. You tell the model to reason step by step before giving a final answer. This works well for math problems, logic puzzles, and single-question reasoning where you want the model to slow down and not jump directly to a conclusion.
Prompt chaining is an architecture involving multiple separate API calls that form a pipeline. Each call has its own prompt, context, and model configuration. The steps are separate network calls, not reasoning steps within a single response.
You can use both together. A prompt chain where each step uses a chain-of-thought system prompt ("think step by step before answering") gets the structural benefits of chaining and the reasoning quality of CoT within each step. This combination is common in multi-agent AI systems where each specialist agent applies internal CoT while the overall pipeline is a prompt chain coordinated by an orchestration layer.
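As a concrete illustration, the summarizer step from the sequential example earlier could carry a CoT instruction in its system prompt. The wording below is only a sketch, not a recommended template:

You are a technical writer. First, reason step by step about which claims matter most
and how they relate to each other. Then, as your final answer, write a concise 200-word
summary based only on the provided claims.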
How to Build a Prompt Chain in Heym
Heym's visual workflow editor lets you build prompt chains without writing API boilerplate. Each LLM node on the canvas represents one step in your chain. Edges between nodes define execution order and pass outputs downstream.
Here is how to build a three-step sequential chain that takes raw article text, extracts key claims, writes a summary, and generates a meta description.
Step 1: Create a new workflow.
Click New Workflow in the Heym dashboard to open the visual editor canvas. A trigger node appears automatically. For this example, use a Text Input trigger so you can run the chain manually by pasting source text into a test form.
Step 2: Add your first LLM node and write a focused system prompt.
Drag an LLM node from the node panel onto the canvas and connect it to the trigger. In the configuration panel on the right, set the system prompt for this extraction step:
You are a fact extractor. Given a piece of text, return a numbered list of the
three most important factual claims. Return only the list, no preamble or summary.
In the user prompt field, reference the trigger node's input using Heym's expression syntax:
$trigger.body.text
Step 3: Add the second LLM node and connect the two.
Drag a second LLM node onto the canvas. Set its system prompt to define the summarizer's role and output format:
You are a technical writer. Write a concise 150-word summary using only the
provided claims as source material. Do not introduce information not present
in the claims.
In the user prompt field, reference the first node's output:
$extractorNode.output
Then draw an edge from the extractor node to the summarizer node by dragging from the extractor's output handle to the summarizer's input handle. Heym uses this edge to determine execution order and wire the data.
Step 4: Add the third step for meta description generation.
Add a third LLM node connected to the summarizer. Give it a narrow task: from the summary, write a meta description of exactly 155 characters or fewer that includes the primary keyword in the first 60 characters. A tight system prompt with an explicit character constraint works better than leaving the model to guess the desired length.
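An example of such a system prompt, following the same pattern as the earlier steps (the exact wording is only a suggestion):

You are an SEO copywriter. From the provided summary, write a single meta description
of at most 155 characters. Place the primary keyword within the first 60 characters.
Return only the meta description, with no quotation marks or preamble.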
Step 5: Run the chain and inspect each step.
Click Run in the top toolbar. Heym executes the chain in sequence and displays the output panel for each node individually. Click any node to see exactly what it received as input and what it produced as output. If the extractor step returns a poorly formatted list, you fix that node's system prompt without modifying the summarizer or meta description nodes at all.
If you enable persistentMemoryEnabled on any LLM node, Heym automatically extracts entity relationships from that node's output and stores them in a graph memory layer using AgentMemoryNode and AgentMemoryEdge records. On subsequent chain runs, that context is injected back into the node's system prompt, letting the chain build on prior executions across runs. This is especially useful for research chains that run on a daily schedule and need to accumulate knowledge over time. For a deeper look at the memory layer, see the AI agent memory guide.
Real-World Prompt Chaining Examples
1. Technical Documentation Generator
Problem: Engineering teams ship features faster than they document them, and documentation written after the fact is often incomplete.
Chain:
- Extract (LLM node, haiku model): Given a pull request diff, identify all functions changed and their signatures.
- Describe (LLM node, sonnet model): For each function, write a plain-English one-paragraph description covering purpose, parameters, and return value.
- Format (LLM node, haiku model): Assemble all descriptions into a Markdown documentation page with a table of contents and a parameter table for each function.
This chain runs on an HTTP webhook trigger connected to a GitHub pull request event. The formatted documentation page is posted back to the PR as a comment, visible to all reviewers before they read the diff. The full setup requires under 20 minutes in Heym.
2. Research Briefing Pipeline
Problem: Analysts spend 3 to 4 hours reading 10 or more source articles to prepare a single briefing document.
Chain:
- Extract (parallel, 10 concurrent LLM calls with haiku model): Retrieve text from each source URL and extract the core argument in 100 words.
- Synthesize (sequential, sonnet model): Combine all 10 summaries, identify recurring themes across sources, and flag contradictions.
- Draft (sequential, sonnet model): Write a 500-word executive briefing from the synthesis, structured as: key finding, supporting evidence, implications, open questions.
Total execution time for this chain is approximately 12 to 15 seconds for 10 sources because the extraction calls run in parallel. A fully sequential approach to the same task would require over 60 seconds. In our benchmarks at Heym, parallel chains processing 10 to 15 independent subtasks consistently complete 70 to 85 percent faster than equivalent sequential pipelines for the same inputs and models.
3. Multi-Stage Content Moderation
Problem: A single binary classifier produces too many false positives because it cannot apply domain-specific policy rules.
Chain:
- Classify (LLM node, haiku model): Detect the content domain: legal, financial, medical, or general.
- Route (branching): Apply a domain-specific policy evaluation prompt based on the classification from Step 1. Each domain has its own evaluation criteria.
- Score (LLM node, sonnet model): Generate a policy violation score from 0 to 100 with a one-paragraph reasoning statement.
- Decide (branching): If score is 70 or above, route to a human review queue with the reasoning statement attached. If score is below 70, auto-approve.
This chain replaces a generic classifier with a branching pipeline that applies domain-specific evaluation at Step 2 and produces a reviewable reasoning trail for every decision. The branching pattern here mirrors the specialist-agent routing used in multi-agent AI systems, where a supervisor routes tasks to agents with the right expertise.
Common Mistakes to Avoid
Chaining tasks that do not need a chain. Prompt chaining adds latency proportional to the number of sequential steps: each LLM call must complete before the next begins, so a 5-step chain with 2-second steps takes at least 10 seconds end-to-end. Total API cost scales with step count. Error handling also becomes more complex: a rate limit or timeout at any step fails the entire pipeline unless each transition includes retry logic. For tasks that fit comfortably in a single focused prompt and produce reliable outputs, a chain adds unnecessary overhead.
Passing too much context between steps. If Step 2 receives the full 2,000-word output of Step 1 when it only needs a 50-word list, you are wasting tokens and diluting Step 2's context with irrelevant content. Extract exactly what the downstream step needs before passing it along. A small extraction sub-step costs far less than degraded output quality across many chain runs.
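A cheap guard against this is a small transformation between steps. The helper below is a hypothetical sketch: it keeps only the numbered-list lines from a verbose upstream output before handing them to the next call (step1_output here is a placeholder for the previous step's text).

import re

def extract_numbered_list(step_output: str) -> str:
    # keep only lines that look like "1. ..." and drop the surrounding prose
    lines = [line.strip() for line in step_output.splitlines()]
    return "\n".join(line for line in lines if re.match(r"^\d+\.\s", line))

# pass the trimmed list downstream instead of the full 2,000-word output
claims_only = extract_numbered_list(step1_output)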
No maximum iteration limit on iterative chains. An iterative chain without a hard iteration cap can run indefinitely if the quality threshold is never met. Always set a maximum of three to five iterations, and define a fallback action for when the cap is reached: log the final best-effort result rather than failing silently. Consistently hitting the cap is a signal that your system prompt or quality threshold needs adjustment.
Using the same model for every step. A fast, inexpensive model such as claude-haiku-4-5-20251001 is appropriate for classification, extraction, and formatting steps. A more capable model is warranted for synthesis and judgment. Assigning models by task type rather than defaulting to one model throughout reduces inference costs without sacrificing quality on the steps where it matters.
Key Takeaways
- Prompt chaining decomposes complex LLM tasks into a pipeline of focused, single-responsibility calls.
- Use sequential chains for ordered workflows, branching for conditional routing, parallel for independent subtasks, and iterative for quality refinement loops.
- Prompt chaining is not the same as chain of thought: CoT is a single-call reasoning technique, while chaining involves multiple API calls.
- In Heym, you build prompt chains visually by connecting LLM nodes with edges, which eliminates boilerplate, makes each step's output inspectable, and lets you assign the right model to each step without rewriting any coordination logic.
FAQ
What is prompt chaining?
Prompt chaining is a technique where you break a complex task into a sequence of smaller LLM prompts, passing the output of each step as the input to the next. Each call handles one subtask, which improves accuracy, traceability, and control compared to asking a single prompt to do everything at once. The principle maps directly to writing composable functions in software engineering: each function does one thing and returns a value that the caller can inspect and route.
What is the difference between prompt chaining and chain of thought?
Chain of thought is a prompting method applied within a single LLM call. You instruct the model to reason step by step before giving a final answer. Prompt chaining is an architecture involving multiple separate LLM API calls, each with its own prompt, context, and optional model configuration. The two are complementary: you can apply chain-of-thought prompting within each step of a prompt chain to combine both approaches in a single pipeline.
When should I use prompt chaining?
Use prompt chaining when your task has distinct, separable phases such as extraction, classification, generation, and formatting. It is also the right choice when a single prompt produces inconsistent outputs, when you need to validate or route based on intermediate results, or when the full task requires more than 4,000 tokens of reasoning to complete reliably. Tasks with a hard quality threshold such as content moderation or structured document generation benefit from the iterative chaining pattern.
What are the main prompt chaining patterns?
The four core patterns are sequential (steps run in a fixed order, each passing output to the next), branching (output routes to different next steps based on a condition), parallel (multiple LLM calls run concurrently on independent subtasks and results are aggregated), and iterative (a refinement loop that exits when a quality condition is met, with a hard maximum iteration count). Most production chains combine two or more of these patterns.
Can I build a prompt chain without code?
Yes. Heym's visual workflow editor lets you drag LLM nodes onto a canvas, configure each node's system and user prompts through a form panel, connect nodes with edges to define execution order, and run the full chain with one click. Each node's output is visible individually after a run, so you can debug the chain without adding print statements or reading raw API logs. Start building at heym.run.
Build AI workflows without writing code.
Import ready-made AI automations directly into Heym — the source-available workflow platform.

Founding Engineer
Ceren is a founding engineer at Heym, working on AI workflow orchestration and the visual canvas editor. She writes about AI automation, multi-agent systems, and the practitioner experience of building production LLM pipelines.