How to Build an AI Customer Support Agent in Heym

Q: What is an AI customer support agent?

An AI customer support agent is a workflow that reads an incoming customer message, retrieves the relevant passages from your own help documentation, and writes a grounded reply with a large language model. Unlike a generic chatbot, it answers from a knowledge base instead of guessing, routes anything it cannot answer to a human, and remembers the conversation across turns. In Heym it is built by connecting nodes on a visual canvas: a RAG node for retrieval, an LLM or Agent node for the reply, a Condition node for escalation, and an Agent node with Human-in-the-Loop for sensitive responses.

Q: How does RAG stop an AI support agent from hallucinating?

RAG (retrieval-augmented generation) inserts the exact passages from your documentation into the model's prompt before it answers. The LLM is instructed to answer only from the supplied context, so it quotes your refund policy or setup steps rather than inventing them. In Heym's RAG Q&A Agent template, the RagSearch node returns the top 5 matching chunks from a Qdrant collection and the answer node references `$RagSearch.context`. If the context does not contain the answer, you instruct the model to say so and escalate, which is what keeps wrong answers out of customer-facing replies.

Q: Can the AI agent escalate to a human when it is unsure?

Yes. There are two escalation patterns in Heym. The first is a Condition node that checks the model's own confidence or category output and routes low-confidence or high-risk tickets to a Slack channel or a human queue. The second is Human-in-the-Loop on an Agent node: the agent drafts a reply, calls `request_human_review`, and pauses at a public `/review/{token}` URL until a teammate accepts, edits, or refuses the draft. Refund requests, legal-sensitive responses, and policy exceptions are typical triggers for the human review path.

Q: Do I need coding experience to build a customer support agent in Heym?

No. Every step runs on the visual canvas. You ingest documents into a knowledge base with the RAG Document Ingest template, configure retrieval by setting a collection name and a top-K value, write the agent's instructions in a plain-text field, and set escalation rules with a single expression in a Condition node. There is no framework to install, no vector database to host yourself beyond connecting a credential, and no deployment pipeline.

Q: How does the agent remember a multi-turn conversation?

Enable `persistentMemory` on the Agent node. With persistent memory on, the agent stores the running conversation against a session key so a customer can ask a follow-up question ("and what about annual plans?") and the agent keeps the earlier context. This is what separates a real support agent from a single-shot Q&A endpoint that forgets every message. Memory is scoped per conversation, so different customers never see each other's history.

Q: What channels can the support agent receive tickets from?

Heym ships triggers for several intake channels. The IMAP Support Inbox Triage template watches a shared mailbox and polls every 5 minutes. You can also start the workflow from a Webhook (for a website chat widget or a help-desk integration), or from the Telegram, Discord, and Slack nodes for community support. The same downstream logic (retrieve, answer, escalate) works regardless of where the message arrives.

TL;DR: Build an AI customer support agent in Heym by composing four templates on the visual canvas. Use RAG Document Ingest to load your help docs into a knowledge base, RAG Q&A Agent to answer questions grounded in those docs, IMAP Support Inbox Triage to pull tickets from a shared mailbox, and HITL Support Reply Agent to pause for human approval on sensitive replies. The result answers from your own content instead of guessing, escalates with a single Condition expression, and never sends a refund or policy reply without a human in the loop. No code.

Definition: An AI customer support agent is an automated workflow that reads a customer message, retrieves the relevant passages from your own documentation, and writes a grounded answer with a large language model, escalating to a human whenever it is unsure or the request is sensitive. It differs from a basic chatbot by answering from a knowledge base, remembering the conversation, and having an explicit human handoff path.

Key Takeaways:

A reliable support agent has four parts: a grounded answer (RAG), conversation memory (persistentMemory), an escalation rule (Condition node), and a human review path (HITL)

The RAG Q&A Agent template retrieves the top 5 chunks from a Qdrant collection and answers from $RagSearch.context, which is what stops the agent from inventing policies

A Condition node with one expression (for example $Classify.json.confidence < 0.6) is the gate that keeps low-confidence and high-risk replies away from customers

The HITL Support Reply Agent drafts a reply, calls request_human_review, and pauses at a public /review/{token} URL until a teammate accepts, edits, or refuses it

Intake is channel-agnostic: a shared mailbox (IMAP), a website chat widget (Webhook), or Telegram, Discord, and Slack all feed the same retrieve, answer, escalate logic

Every step is built by connecting nodes on a canvas, with no framework to install and no server to manage

What Is an AI Customer Support Agent?
Why Most Support Bots Fail
How the AI Customer Support Agent Works in Heym
Step 1: Build the Knowledge Base
Step 2: Ground Every Answer with RAG
Step 3: Remember the Conversation
Step 4: Triage the Intake Channel
Step 5: Route Escalations
Step 6: Put a Human in the Loop
Editing the Canvas Without Breaking the Flow
Measuring and Improving the Agent
Real-World Use Cases
FAQ

What Is an AI Customer Support Agent?

Ceren Kaya Akgün is a software engineer at Heym who builds and maintains AI workflow automation pipelines. Everything in this guide reflects features available in Heym today.

Heym is a visual AI workflow automation platform. Workflows are built by connecting nodes on a canvas with no code required. Support and operations teams use Heym to combine LLMs, retrieval, APIs, and messaging tools into pipelines that run on their own infrastructure.

An AI customer support agent uses a large language model to read each incoming customer message, retrieve the relevant passages from your documentation, and write a grounded reply, while escalating anything it cannot answer confidently to a human.

"An AI customer support agent answers from your own knowledge base rather than guessing, remembers the conversation across turns, and hands off to a human the moment it is unsure or the request is sensitive."

The distinction that matters is grounding. A plain chatbot generates an answer from the model's training data, which produces confident-sounding but wrong replies about your specific refund window or setup steps. A support agent retrieves your actual policy text first, then answers from it. If you are weighing the difference between the two patterns in depth, the AI agent vs chatbot comparison covers where each one fits.

Why Most Support Bots Fail

The two failure modes that erode customer trust are hallucinated answers and the dead end. A hallucinated answer is the bot stating a 14-day refund window when your policy says 30. The dead end is the bot looping a frustrated customer through canned responses with no way to reach a person.

Customers notice both. Zendesk's CX Trends 2026 research found that 95% of customers want to know why an AI made the decision it did, yet only 37% of companies currently offer any reasoning behind their AI's answers (Zendesk CX Trends 2026). That gap is exactly what grounding and human handoff close: a grounded agent can cite the passage it answered from, and an agent with an escalation path never traps anyone.

The business case for getting it right is also clear. In Zendesk's 2025 CX Trends report, companies leading on AI adoption were 128% more likely to report high return on their AI investment than laggards (Zendesk 2025 CX Trends Report). The teams seeing that return are not the ones shipping an ungrounded bot. They are the ones whose agent answers from real content and escalates cleanly.

A well-built support agent is therefore defined by what it refuses to do: it refuses to answer when the knowledge base has no match, and it refuses to send a sensitive reply without a human review. That refusal is itself an LLM guardrail: the human-in-the-loop layer that stops a model from acting on a high-stakes case unchecked. The rest of this guide builds exactly that, one node at a time.

The table below compares a basic chatbot with a grounded AI support agent, dimension by dimension:

Dimension	Basic chatbot	Grounded AI support agent
Source of answer	Model training data, so it guesses	Your own docs, retrieved with RAG
Hallucination risk	High on your specific policies	Low; answers from retrieved passages
Unknown questions	Invents an answer or dead-ends	Says it is escalating, routes to a human
Conversation memory	None or single-turn	Persistent per session (`persistentMemory`)
Sensitive replies (refunds, policy)	Sent automatically	Paused for human approval (HITL)
Staying current	Retrained or hard-coded	Re-run ingest when the docs change

How the AI Customer Support Agent Works in Heym

The full agent is four composable pieces, each available as a ready-to-clone template:

Knowledge base (RAG Document Ingest): your help docs, chunked and embedded into a vector collection.
Grounded answer (RAG Q&A Agent): retrieve the top matching chunks, answer from them.
Intake and triage (IMAP Support Inbox Triage): watch a shared mailbox, summarize, flag urgency.
Human handoff (HITL Support Reply Agent): draft a reply and pause for approval on sensitive cases.

[Figure: AI customer support agent, end-to-end flow. Intake (a ticket arrives by email, chat widget, or Slack), then RAG search (retrieve the top-5 chunks from your docs), then Agent (grounded answer from context plus memory), then Condition (escalation gate: confidence and category check), which routes to either Auto (send the grounded answer to the customer) or HITL (pause at the review URL for human approval).]

The knowledge base is built once with RAG Document Ingest. Every ticket then flows through retrieve, answer, and escalate.

You do not have to build all four on day one. The minimum viable agent is the knowledge base plus the grounded answer. Triage and human review are the steps that make it production-safe. The sections below follow that order.

Each piece starts from a ready-made template, and cloning one onto your canvas takes about a minute. The import flow is the same for all four.

Step 1: Build the Knowledge Base

Open the RAG Document Ingest template. Point it at the content your agent should answer from: help center articles, product documentation, onboarding guides, and policy pages. Run the template once per document.

Ingest does three things in sequence. It splits each document into chunks small enough for retrieval to be precise, it converts each chunk into an embedding vector, and it writes those vectors to your vector store collection. The store can be backed by Qdrant or by Postgres with the pgvector extension; the steps below are identical either way. The single setting to record here is the collection name, because the answer step in the next section must point at the same collection.

Keep two practices in mind. Use one embedding model consistently, since the ingest step and the query step must use the same model for the vectors to be comparable. And re-run ingest whenever your documentation changes, so the agent never answers from a stale policy. If you want the mechanics of chunking, embeddings, and retrieval in depth, the how to build a RAG pipeline guide walks through each decision.
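To make the chunking stage of ingest concrete, here is a minimal Python sketch of a fixed-size, word-based chunker with overlap. It illustrates the general technique only and is not Heym's ingest implementation; the function name `chunk_text` and the default `size` and `overlap` values are assumptions chosen for the example.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks of at most `size` words.

    Consecutive chunks share `overlap` words so that a sentence straddling
    a boundary still appears whole in one chunk. The defaults are
    illustrative, not values taken from any Heym template.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Smaller chunks tend to make retrieval more precise but carry less surrounding context, which is why some overlap between neighbours is a common compromise.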

Step 2: Ground Every Answer with RAG

Open the RAG Q&A Agent template. It wires four nodes in a straight line: UserQuestion to RagSearch to AnswerLLM to Answer.

Set the RagSearch node's collectionName to the collection you built during ingest, and leave topK at 5 so it returns the five most relevant chunks. The AnswerLLM node already references $RagSearch.context in its system instruction, which is the variable that injects the retrieved passages into the prompt.

Make one addition to the answer instruction:

If the retrieved context does not contain the answer, do not guess. Reply that you are escalating to a human teammate, and stop.

That single rule is the difference between a grounded agent and a confident liar. With it, the model answers from your refund policy when the policy is retrieved, and escalates when it is not. This is the core of RAG-based customer support, and it is also why retrieval beats fine-tuning for support content that changes often, a tradeoff covered in RAG vs fine-tuning.
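The retrieve-then-answer pattern can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the template's internals: short hand-written vectors stand in for real embeddings, and the names `cosine`, `top_k`, and `build_prompt` are hypothetical. In practice the vector store performs the similarity search for you.

```python
import math

# The escalation rule from the text, injected into every prompt.
ESCALATION_RULE = (
    "If the retrieved context does not contain the answer, do not guess. "
    "Reply that you are escalating to a human teammate, and stop."
)

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a grounded prompt: instruction, retrieved context, question."""
    context = "\n\n".join(chunks)
    return (
        "Answer only from the context below.\n"
        f"{ESCALATION_RULE}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The important design choice is that the escalation rule travels with the context in every prompt, so the model sees the same refusal instruction whether or not retrieval found a match.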

Step 3: Remember the Conversation

A single question and answer is not a support conversation. Customers ask follow-ups: "and does that apply to the annual plan?" To handle that, replace the plain LLM answer node with an Agent node and toggle persistentMemory.

With persistent memory on, the agent stores the running conversation against a session key. The follow-up question arrives with the earlier context intact, so the agent resolves "that" to the refund policy it just explained. Memory is scoped per conversation, so two customers never see each other's history.

Keep the RAG search step in front of the agent on every turn. Memory holds the dialog; retrieval keeps each answer grounded. The two are complementary, not interchangeable. For the architectural options behind agent memory, including short-term versus persistent stores, see the AI agent memory guide.
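The idea of memory scoped by session key can be illustrated with a small Python sketch. It is an in-memory stand-in written for this guide, not how Heym stores conversations: the class name `SessionMemory` and the `max_turns` cap are assumptions, and a genuinely persistent store would also survive restarts.

```python
from collections import defaultdict

class SessionMemory:
    """Keeps a bounded conversation history per session key (illustrative)."""

    def __init__(self, max_turns: int = 20):
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self._turns: dict[str, list[tuple[str, str]]] = defaultdict(list)
        self._max_turns = max_turns

    def append(self, session_key: str, role: str, text: str) -> None:
        history = self._turns[session_key]
        history.append((role, text))
        del history[:-self._max_turns]  # drop everything but the newest turns

    def history(self, session_key: str) -> list[tuple[str, str]]:
        # Return a copy so callers cannot mutate another session's state.
        return list(self._turns.get(session_key, []))
```

Because every read and write is keyed by the session, one customer's follow-up can only ever see that customer's earlier turns.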

Step 4: Triage the Intake Channel

Answers are only half the system. The other half is intake. Open the IMAP Support Inbox Triage template to turn a shared mailbox into a structured front door.

The template polls an IMAP inbox every 5 minutes with the SupportInbox trigger. Each new message flows to a SummariseEmail LLM node that writes a one-sentence summary and an urgency signal, then to a NeedsEscalation Condition node that posts urgent items to a Slack channel and logs routine tickets as a normal result. Connect an IMAP Email Inbox credential, point the Slack node at your support channel, and the front door is live.

Intake is channel-agnostic. The same downstream logic works whether tickets arrive by email, through a Webhook from a website chat widget, or via the Telegram and Discord nodes for community support. Pick the trigger that matches where your customers already are, and feed it into the grounded answer step from Step 2.
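One way to picture channel-agnostic intake is a small normalization layer that turns each channel's message into the same ticket shape before retrieval runs. The Python sketch below is illustrative only: the ticket fields and the webhook payload keys (`from`, `text`) are hypothetical, and the email parsing uses Python's standard `email` package rather than anything Heym-specific.

```python
import email
import email.policy

def ticket_from_email(raw_message: str) -> dict:
    """Parse a raw RFC 5322 message into a common ticket shape."""
    msg = email.message_from_string(raw_message, policy=email.policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return {
        "channel": "email",
        "sender": str(msg["From"] or ""),
        "subject": str(msg["Subject"] or ""),
        "text": text.strip(),
    }

def ticket_from_webhook(payload: dict) -> dict:
    """Map a chat-widget payload (hypothetical keys) to the same shape."""
    return {
        "channel": "webhook",
        "sender": str(payload.get("from", "")),
        "subject": "",
        "text": str(payload.get("text", "")).strip(),
    }
```

Once every channel produces the same fields, the retrieve, answer, and escalate steps never need to know where a message came from.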

Step 5: Route Escalations

Escalation is a single Condition node, and it is the most important node in the workflow. Place it after the answer or classification step and check a structured field the model returns.

A reliable pattern is to have an LLM node classify the ticket first with jsonOutputEnabled set to true, returning a small JSON object such as { "category": "refund", "confidence": 0.42 }. The Condition node then branches on an expression:

$Classify.json.confidence < 0.6

Tickets below the confidence threshold branch to a human queue. You can add category rules for sensitive topics that should always reach a person regardless of confidence:

$Classify.json.category == 'refund'

Everything that clears both checks continues to an automatic, grounded reply. Tune the 0.6 threshold to your tolerance: raise it to send more tickets to humans, lower it to automate more aggressively. This explicit gate is what lets you automate the easy 70% of tickets while guaranteeing the hard or risky 30% reaches a person.
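The two Condition expressions above combine into a single routing decision. Here is a minimal Python sketch of that logic, assuming the classifier returns the JSON shape shown earlier; the function name `route`, the constant names, and the choice to treat a missing confidence value as low confidence are assumptions for illustration.

```python
CONFIDENCE_THRESHOLD = 0.6          # the threshold used in the text
SENSITIVE_CATEGORIES = {"refund"}   # categories that always reach a person

def route(classification: dict) -> str:
    """Return "human" or "auto" for a result such as
    {"category": "refund", "confidence": 0.42}."""
    confidence = classification.get("confidence")
    category = classification.get("category")
    # Assumption: a missing or non-numeric confidence is treated as low.
    if not isinstance(confidence, (int, float)) or confidence < CONFIDENCE_THRESHOLD:
        return "human"
    if category in SENSITIVE_CATEGORIES:
        return "human"
    return "auto"
```

Failing closed on malformed classifier output is a deliberate choice here: when the gate cannot read the signal, the safer default is a person.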

Step 6: Put a Human in the Loop

For replies that carry real consequence, a confidence threshold is not enough. You want a person to read the exact words before they reach the customer. Open the HITL Support Reply Agent template. For the wider question of which actions deserve a gate at all, and what a paused run costs you, see human in the loop for AI agents.

The DraftReplyAgent node has hitlEnabled set to true. It drafts a customer-ready response, calls request_human_review with the customer issue, the proposed reply, a risk level, and the next action, then posts the pending review URL and the Markdown draft to a Slack channel through the NotifyReviewSlack node. Execution pauses there. A teammate opens the public /review/{token} link and accepts, edits, or refuses the draft. On approval, the agent returns the final reviewed reply; on refusal, it logs an internal note and sends nothing.

This is the pattern for refunds, escalations, legal-sensitive responses, and any policy exception. The agent does the drafting work, and a human keeps accountability for what actually goes out. One configuration note from the template: keep JSON output disabled on the Agent node while HITL is enabled, since the human review flow uses text-mode output.
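The accept, edit, or refuse outcome can be modelled as a small function that decides what text, if any, goes out. This Python sketch is a simplified illustration, not the template's implementation: the `ReviewRequest` field names mirror the four items the draft carries (customer issue, proposed reply, risk level, next action) but are otherwise hypothetical, as are the decision strings.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewRequest:
    customer_issue: str
    proposed_reply: str
    risk_level: str
    next_action: str

def resolve_review(
    request: ReviewRequest,
    decision: str,
    edited_reply: Optional[str] = None,
) -> Optional[str]:
    """Return the reply to send, or None when nothing should be sent."""
    if decision == "accept":
        return request.proposed_reply
    if decision == "edit":
        if not edited_reply:
            raise ValueError("an edit decision needs the edited reply text")
        return edited_reply
    if decision == "refuse":
        return None  # nothing is sent; the caller logs an internal note
    raise ValueError(f"unknown decision: {decision!r}")
```

Note that an unrecognized decision raises an error instead of falling through to a send, so nothing reaches the customer by accident.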

To make any downstream channel node agent-callable, drag a connection from the Agent node's tools handle to a Slack, Telegram, or HTTP node and mark the relevant field with the bot icon. The agent then fills that field at runtime while the channel and credential stay fixed.

The two escalation patterns are not interchangeable. Use this comparison to decide which one a given ticket needs:

	Condition-node escalation (Step 5)	Human-in-the-loop approval (Step 6)
What it gates	Where the ticket is routed	The exact words sent to the customer
Trigger	`confidence < 0.6` or a category rule	Agent calls `request_human_review`
Human effort	Picks the ticket up from a queue or Slack	Accepts, edits, or refuses a drafted reply
Best for	Volume routing and urgency triage	Refunds, policy exceptions, legal-sensitive replies
After handoff	Agent stops; a person takes over	Agent pauses, then resumes with the approved text

A production support agent usually runs both: the Condition node automates the easy majority, and HITL guards the few replies that carry real consequence.

Editing the Canvas Without Breaking the Flow

As your support agent grows from two nodes to a dozen, you will want to restructure without rewiring everything. Heym has two affordances for that. First, edges are insertable: hover over any edge and click the + to drop a new node directly onto that connection, so you can add a classification step between retrieval and answer without deleting the link. Second, right-click any node to open the context menu, where Extract to Sub-Workflow collapses a tested sub-graph (your whole escalation branch, for example) into a single callable node, Duplicate lets you A/B two answer prompts side by side, and Disable skips a step during testing without removing it.

The same context menu exposes Share as Template, so once your support agent is tuned for your product, you can publish it for the rest of your team to clone.

Measuring and Improving the Agent

A support agent is not done when it sends its first reply. It is done when you can see what it is doing and prove it is getting better. Two practices close that loop.

First, observe every run. Each execution in Heym is inspectable in the Debug Panel, so you can read what the retrieval step returned and what the agent did with it. For production monitoring across many runs, the AI agent observability guide covers tracing the retrieve-answer-escalate path end to end.

Second, evaluate answer quality on a fixed set of real questions. Build a small test set of tickets with known-good answers and re-run it whenever you change a prompt, a threshold, or the knowledge base. The AI agent evaluation guide explains how to score grounded answers for accuracy and catch regressions before customers do. Watching the escalation rate over time is the fastest health signal: a rate that climbs usually means a documentation gap to fill with another ingest run.
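Both health signals can be computed with very little code. The Python sketch below is a deliberately crude illustration: the names `escalation_rate` and `run_regression` are hypothetical, `answer_fn` stands in for whatever calls your workflow, and checking that an answer contains an expected phrase is only a rough proxy for a real accuracy score.

```python
from typing import Callable

def escalation_rate(outcomes: list[str]) -> float:
    """Fraction of tickets routed to a human ("human" vs "auto")."""
    if not outcomes:
        return 0.0
    return sum(1 for outcome in outcomes if outcome == "human") / len(outcomes)

def run_regression(
    test_set: list[tuple[str, str]],
    answer_fn: Callable[[str], str],
) -> list[str]:
    """Return the questions whose answer lacks the expected phrase.

    test_set holds (question, must_contain) pairs built from real tickets
    with known-good answers.
    """
    failures = []
    for question, must_contain in test_set:
        answer = answer_fn(question)
        if must_contain.lower() not in answer.lower():
            failures.append(question)
    return failures
```

Running the same fixed set after every prompt, threshold, or knowledge-base change turns "it seems better" into a list of specific questions that regressed.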

Real-World Use Cases

SaaS help-desk deflection. Ingest your help center, ground answers with the RAG Q&A Agent, and put a Condition node in front of any billing or cancellation request. Routine "how do I reset my password" tickets get an instant grounded answer; billing tickets route to a human. Teams deflect the repetitive volume while keeping a person on the cases that affect revenue.

Shared inbox triage for small teams. A three-person support team points the IMAP Support Inbox Triage template at its shared support@ mailbox. The agent summarizes each message and flags urgency to Slack, so the team triages a full inbox in the time it used to take to read it. The grounded answer step then drafts replies for the routine half.

Refund and policy review with HITL. An e-commerce team routes every refund request through the HITL Support Reply Agent. The agent drafts an empathetic, policy-consistent reply, and a manager approves or edits it at the review URL before it sends. The team gets the speed of AI drafting with full human accountability on money-related responses.

Internal IT and HR help desk. Point the knowledge base at your internal wiki and policy documents, and the same architecture answers employee questions about VPN setup or leave policy, escalating anything sensitive to the right team. The pattern is identical; only the documents change.

You can start today by cloning the RAG Document Ingest and RAG Q&A Agent templates for the grounded answer core, then adding IMAP Support Inbox Triage and HITL Support Reply Agent for intake and human review. Each takes only a credential and a few field edits to configure.

For the broader foundations behind this build, the how to build an AI agent guide covers agents with memory and tools, AI agent use cases shows where else this pattern applies, and the multi-agent AI systems guide explains how to split a complex support flow across coordinated agents.

FAQ

What is an AI customer support agent?

An AI customer support agent is a workflow that reads an incoming customer message, retrieves the relevant passages from your own help documentation, and writes a grounded reply with a large language model. Unlike a generic chatbot, it answers from a knowledge base instead of guessing, routes anything it cannot answer to a human, and remembers the conversation across turns. In Heym it is built by connecting nodes on a canvas: a RAG node for retrieval, an Agent node for the reply, a Condition node for escalation, and an Agent node with Human-in-the-Loop for sensitive responses.

How does RAG stop an AI support agent from hallucinating?

RAG inserts the exact passages from your documentation into the model's prompt before it answers. The model is instructed to answer only from the supplied context, so it quotes your refund policy rather than inventing one. In Heym's RAG Q&A Agent template, the RagSearch node returns the top 5 matching chunks from a Qdrant collection and the answer node references $RagSearch.context. If the context does not contain the answer, you instruct the model to say so and escalate, which keeps wrong answers out of customer-facing replies.

Can the AI agent escalate to a human when it is unsure?

Yes, with two patterns. A Condition node checks the model's confidence or category output and routes low-confidence or high-risk tickets to a Slack channel or a human queue. Human-in-the-Loop on an Agent node goes further: the agent drafts a reply, calls request_human_review, and pauses at a public /review/{token} URL until a teammate accepts, edits, or refuses the draft. Refunds, legal-sensitive responses, and policy exceptions are typical triggers for the human review path.

Do I need coding experience to build a customer support agent in Heym?

No. Every step runs on the visual canvas. You ingest documents with the RAG Document Ingest template, configure retrieval by setting a collection name and a top-K value, write the agent's instructions in a plain-text field, and set escalation rules with a single expression in a Condition node. There is no framework to install and no deployment pipeline to manage.

How does the agent remember a multi-turn conversation?

Enable persistentMemory on the Agent node. The agent then stores the running conversation against a session key, so a customer can ask a follow-up question and the agent keeps the earlier context. Memory is scoped per conversation, so different customers never share history. This is what separates a real support agent from a single-shot Q&A endpoint that forgets every message.

What channels can the support agent receive tickets from?

The IMAP Support Inbox Triage template watches a shared mailbox and polls every 5 minutes. You can also start the workflow from a Webhook for a website chat widget, or from the Telegram, Discord, and Slack nodes for community support. The same retrieve, answer, and escalate logic runs regardless of where the message arrives.

Steps at a glance

Build the knowledge base with the RAG Document Ingest template. Open the RAG Document Ingest template in Heym. Point it at your help center articles, product docs, or policy pages, and run it once per document. The template chunks each document, creates embeddings, and writes the vectors to a Qdrant collection. Note the collection name because the Q&A step must use the same one. Re-run ingest whenever your documentation changes so the agent answers from current content.
Ground answers with the RAG Q&A Agent template. Open the RAG Q&A Agent template. It wires four nodes: UserQuestion, RagSearch, AnswerLLM, and Answer. Set the RagSearch node's `collectionName` to the collection you created during ingest and keep `topK` at 5 to start. The AnswerLLM system instruction already references `$RagSearch.context`. Add one rule to that instruction: if the context does not contain the answer, reply that you are escalating to a human instead of guessing.
Turn on persistent memory for multi-turn conversations. If you want the agent to handle follow-up questions, replace the plain LLM answer node with an Agent node and toggle `persistentMemory`. The agent now keeps the running conversation per session, so a customer can ask a clarifying question without repeating context. Keep the same RAG search step in front of the agent so every turn is still grounded in your documentation.
Add intake triage with the IMAP Support Inbox Triage template. Open the IMAP Support Inbox Triage template to connect a shared mailbox. Create an IMAP Email Inbox credential, and the SupportInbox trigger polls every 5 minutes. The SummariseEmail LLM node writes a one-sentence summary and an urgency signal, and the NeedsEscalation Condition node posts urgent items to a Slack channel while logging routine tickets. This becomes the front door that feeds your grounded answer step.
Route escalations with a Condition node. Add a Condition node after the answer step. Check a structured field the model returns, for example a category or a confidence value, with an expression like `$Classify.json.confidence < 0.6` or `$Classify.json.category == 'refund'`. Low-confidence or high-risk tickets branch to a human queue or a Slack alert; everything else continues to an automatic reply. This is the single most important node for keeping wrong or sensitive answers away from customers.
Add human approval with the HITL Support Reply Agent template. For replies that need a manager's sign-off, open the HITL Support Reply Agent template. The DraftReplyAgent node has `hitlEnabled` set to true: it drafts a customer-ready response, calls `request_human_review`, and posts the pending review URL and Markdown draft to a Slack channel. Execution pauses until a teammate opens the `/review/{token}` link and accepts, edits, or refuses the draft. On approval, the agent returns the final reviewed reply. Keep JSON output disabled on the Agent node while HITL is enabled.
