June 12, 2026 · Ceren Kaya Akgün

How to Build an AI Customer Support Agent in Heym

Build an AI customer support agent that answers from your own docs, triages a shared inbox, and pauses for human approval on sensitive replies. No code.

Tags: ai-customer-support-agent, ai-customer-support-automation, rag-customer-support, ai-support-knowledge-base, ai-helpdesk-automation, human-in-the-loop, no-code
TL;DR: Build an AI customer support agent in Heym by composing four templates on the visual canvas. Use RAG Document Ingest to load your help docs into a knowledge base, RAG Q&A Agent to answer questions grounded in those docs, IMAP Support Inbox Triage to pull tickets from a shared mailbox, and HITL Support Reply Agent to pause for human approval on sensitive replies. The result answers from your own content instead of guessing, escalates with a single Condition expression, and never sends a refund or policy reply without a human in the loop. No code.

Definition: An AI customer support agent is an automated workflow that reads a customer message, retrieves the relevant passages from your own documentation, and writes a grounded answer with a large language model, escalating to a human whenever it is unsure or the request is sensitive. It differs from a basic chatbot by answering from a knowledge base, remembering the conversation, and having an explicit human handoff path.

Key Takeaways:

  • A reliable support agent has four parts: a grounded answer (RAG), conversation memory (persistentMemory), an escalation rule (Condition node), and a human review path (HITL)
  • The RAG Q&A Agent template retrieves the top 5 chunks from a Qdrant collection and answers from $RagSearch.context, which is what stops the agent from inventing policies
  • A Condition node with one expression (for example $Classify.json.confidence < 0.6) is the gate that keeps low-confidence and high-risk replies away from customers
  • The HITL Support Reply Agent drafts a reply, calls request_human_review, and pauses at a public /review/{token} URL until a teammate accepts, edits, or refuses it
  • Intake is channel-agnostic: a shared mailbox (IMAP), a website chat widget (Webhook), or Telegram, Discord, and Slack all feed the same retrieve, answer, escalate logic
  • Every step is built by connecting nodes on a canvas, with no framework to install and no server to manage


What Is an AI Customer Support Agent?

Ceren Kaya Akgün is a software engineer at Heym who builds and maintains AI workflow automation pipelines. Everything in this guide reflects features available in Heym today.

Heym is a visual AI workflow automation platform. Workflows are built by connecting nodes on a canvas with no code required. Support and operations teams use Heym to combine LLMs, retrieval, APIs, and messaging tools into pipelines that run on their own infrastructure.

An AI customer support agent uses a large language model to read each incoming customer message, retrieve the relevant passages from your documentation, and write a grounded reply, while escalating anything it cannot answer confidently to a human.

"An AI customer support agent answers from your own knowledge base rather than guessing, remembers the conversation across turns, and hands off to a human the moment it is unsure or the request is sensitive."

The distinction that matters is grounding. A plain chatbot generates an answer from the model's training data, which produces confident-sounding but wrong replies about your specific refund window or setup steps. A support agent retrieves your actual policy text first, then answers from it. If you are weighing the difference between the two patterns in depth, the AI agent vs chatbot comparison covers where each one fits.

Why Most Support Bots Fail

The two failure modes that erode customer trust are hallucinated answers and the dead end. A hallucinated answer is the bot stating a 14-day refund window when your policy says 30. The dead end is the bot looping a frustrated customer through canned responses with no way to reach a person.

Customers notice both. Zendesk's CX Trends 2026 research found that 95% of customers want to know why an AI made the decision it did, yet only 37% of companies currently offer any reasoning behind their AI's answers (Zendesk CX Trends 2026). That gap is exactly what grounding and human handoff close: a grounded agent can cite the passage it answered from, and an agent with an escalation path never traps anyone.

The business case for getting it right is also clear. In Zendesk's 2025 CX Trends report, companies leading on AI adoption were 128% more likely to report high return on their AI investment than laggards (Zendesk 2025 CX Trends Report). The teams seeing that return are not the ones shipping an ungrounded bot. They are the ones whose agent answers from real content and escalates cleanly.

A well-built support agent is therefore defined by what it refuses to do: it refuses to answer when the knowledge base has no match, and it refuses to send a sensitive reply without a human review. The rest of this guide builds exactly that, one node at a time.

The table below compares the two, dimension by dimension:

| Dimension | Basic chatbot | Grounded AI support agent |
| --- | --- | --- |
| Source of answer | Model training data, so it guesses | Your own docs, retrieved with RAG |
| Hallucination risk | High on your specific policies | Low; answers from retrieved passages |
| Unknown questions | Invents an answer or dead-ends | Says it is escalating, routes to a human |
| Conversation memory | None or single-turn | Persistent per session (persistentMemory) |
| Sensitive replies (refunds, policy) | Sent automatically | Paused for human approval (HITL) |
| Staying current | Retrained or hard-coded | Re-run ingest when the docs change |

How the AI Customer Support Agent Works in Heym

The full agent is four composable pieces, each available as a ready-to-clone template:

  1. Knowledge base (RAG Document Ingest): your help docs, chunked and embedded into a vector collection.
  2. Grounded answer (RAG Q&A Agent): retrieve the top matching chunks, answer from them.
  3. Intake and triage (IMAP Support Inbox Triage): watch a shared mailbox, summarize, flag urgency.
  4. Human handoff (HITL Support Reply Agent): draft a reply and pause for approval on sensitive cases.

Figure: AI customer support agent, end-to-end flow. Intake (a ticket arrives by email, chat widget, or Slack) to RAG search (retrieve the top-5 chunks from your docs) to Agent (grounded answer from context + memory) to Condition (escalation gate: confidence + category check). The Condition routes to either Auto (the grounded reply goes to the customer) or HITL (pause at the review URL for human approval). The knowledge base is built once with RAG Document Ingest. Every ticket then flows through retrieve, answer, and escalate.

You do not have to build all four on day one. The minimum viable agent is the knowledge base plus the grounded answer. Triage and human review are the steps that make it production-safe. The sections below follow that order.

Each piece starts from a ready-made template, and cloning one onto your canvas takes about a minute. The import flow is the same for all four.

Step 1: Build the Knowledge Base

Open the RAG Document Ingest template. Point it at the content your agent should answer from: help center articles, product documentation, onboarding guides, and policy pages. Run the template once per document.

Ingest does three things in sequence. It splits each document into chunks small enough for retrieval to be precise, it converts each chunk into an embedding vector, and it writes those vectors to a Qdrant collection. The single setting to record here is the collection name, because the answer step in the next section must point at the same collection.
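
The chunking stage can be pictured with a small Python sketch. It is an illustration only: the article does not state the template's chunk size or splitting strategy, so `chunk_text`, `max_words`, and `overlap` are hypothetical names and values, and the word-based split with overlap is just one common approach.

```python
def chunk_text(text: str, max_words: int = 120, overlap: int = 20) -> list[str]:
    """Split text into word-based chunks that overlap slightly.

    Assumption: sizes are counted in words for simplicity; a real
    ingest step may count tokens or split on headings instead.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval precision on policy text.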

Keep two practices in mind. Use one embedding model consistently, since the ingest step and the query step must use the same model for the vectors to be comparable. And re-run ingest whenever your documentation changes, so the agent never answers from a stale policy. If you want the mechanics of chunking, embeddings, and retrieval in depth, the how to build a RAG pipeline guide walks through each decision.

Step 2: Ground Every Answer with RAG

Open the RAG Q&A Agent template. It wires four nodes in a straight line: UserQuestion to RagSearch to AnswerLLM to Answer.

Set the RagSearch node's collectionName to the collection you built during ingest, and leave topK at 5 so it returns the five most relevant chunks. The AnswerLLM node already references $RagSearch.context in its system instruction, which is the variable that injects the retrieved passages into the prompt.

Make one addition to the answer instruction:

If the retrieved context does not contain the answer, do not guess. Reply that you are escalating to a human teammate, and stop.

That single rule is the difference between a grounded agent and a confident liar. With it, the model answers from your refund policy when the policy is retrieved, and escalates when it is not. This is the core of RAG-based customer support, and it is also why retrieval beats fine-tuning for support content that changes often, a tradeoff covered in RAG vs fine-tuning.
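
For readers who want to see the retrieve-then-answer idea outside the canvas, here is a minimal Python sketch. It is not how the RagSearch or AnswerLLM nodes are implemented: `cosine`, `top_k`, and `build_system_prompt` are hypothetical helpers, the vectors are toy values, and cosine similarity is assumed as the ranking metric.

```python
import math

ESCALATION_RULE = (
    "If the retrieved context does not contain the answer, do not guess. "
    "Reply that you are escalating to a human teammate, and stop."
)

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_system_prompt(context_chunks: list[str]) -> str:
    """Place the retrieved passages and the escalation rule in one instruction."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer only from the context below.\n"
        f"{ESCALATION_RULE}\n\nContext:\n{context}"
    )
```

The point of the sketch is the ordering: retrieval runs first, and the model only ever sees the question alongside the retrieved passages and the rule that tells it when to stop.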

Step 3: Remember the Conversation

A single question and answer is not a support conversation. Customers ask follow-ups: "and does that apply to the annual plan?" To handle that, replace the plain LLM answer node with an Agent node and toggle persistentMemory.

With persistent memory on, the agent stores the running conversation against a session key. The follow-up question arrives with the earlier context intact, so the agent resolves "that" to the refund policy it just explained. Memory is scoped per conversation, so two customers never see each other's history.

Keep the RAG search step in front of the agent on every turn. Memory holds the dialog; retrieval keeps each answer grounded. The two are complementary, not interchangeable. For the architectural options behind agent memory, including short-term versus persistent stores, see the AI agent memory guide.
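
Per-conversation scoping is easy to picture as a store keyed by session. The sketch below is an in-memory stand-in written for illustration; `SessionMemory` and its methods are hypothetical, and the article does not describe how Heym persists memory internally.

```python
from collections import defaultdict

class SessionMemory:
    """Keeps each conversation's turns under its own session key."""

    def __init__(self) -> None:
        self._turns: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def append(self, session_key: str, role: str, text: str) -> None:
        """Record one turn (role is e.g. 'user' or 'agent')."""
        self._turns[session_key].append((role, text))

    def history(self, session_key: str) -> list[tuple[str, str]]:
        """Return a copy of this session's turns; other sessions are never visible."""
        return list(self._turns.get(session_key, []))
```

Because every read goes through a session key, a follow-up such as "and does that apply to the annual plan?" sees only the turns from the same customer's conversation.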

Step 4: Triage the Intake Channel

Answers are only half the system. The other half is intake. Open the IMAP Support Inbox Triage template to turn a shared mailbox into a structured front door.

The template polls an IMAP inbox every 5 minutes with the SupportInbox trigger. Each new message flows to a SummariseEmail LLM node that writes a one-sentence summary and an urgency signal, then to a NeedsEscalation Condition node that posts urgent items to a Slack channel and logs routine tickets as a normal result. Connect an IMAP Email Inbox credential, point the Slack node at your support channel, and the front door is live.

Intake is channel-agnostic. The same downstream logic works whether tickets arrive by email, through a Webhook from a website chat widget, or via the Telegram and Discord nodes for community support. Pick the trigger that matches where your customers already are, and feed it into the grounded answer step from Step 2.
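
One way to think about channel-agnostic intake is that every trigger is reduced to the same small record before retrieval runs. The Python sketch below illustrates that idea; the `Ticket` shape and the input field names (`from`, `subject`, `body`, `session_id`, `message`) are assumptions made for the example, not the payloads Heym's trigger nodes emit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ticket:
    channel: str      # where the message came from
    session_key: str  # identifies the conversation for memory
    text: str         # what the customer wrote

def from_email(msg: dict) -> Ticket:
    """Assumed email shape: {'from': ..., 'subject': ..., 'body': ...}."""
    text = f"{msg['subject']}\n\n{msg['body']}".strip()
    return Ticket(channel="email", session_key=msg["from"], text=text)

def from_webhook(payload: dict) -> Ticket:
    """Assumed chat-widget shape: {'session_id': ..., 'message': ...}."""
    return Ticket(
        channel="webhook",
        session_key=payload["session_id"],
        text=payload["message"].strip(),
    )
```

Once both channels produce the same record, the retrieve, answer, and escalate steps do not need to know which one a ticket came from.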

Step 5: Route Escalations

Escalation is a single Condition node, and it is the most important node in the workflow. Place it after the answer or classification step and check a structured field the model returns.

A reliable pattern is to have an LLM node classify the ticket first with jsonOutputEnabled set to true, returning a small JSON object such as { "category": "refund", "confidence": 0.42 }. The Condition node then branches on an expression:

$Classify.json.confidence < 0.6

Tickets below the confidence threshold branch to a human queue. You can add category rules for sensitive topics that should always reach a person regardless of confidence:

$Classify.json.category == 'refund'

Everything that clears both checks continues to an automatic, grounded reply. Tune the 0.6 threshold to your tolerance: raise it to send more tickets to humans, lower it to automate more aggressively. This explicit gate is what lets you automate the easy majority of tickets (70%, say) while guaranteeing that the hard or risky remainder reaches a person.
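
The combined gate can be written out as a short Python function to make the branching explicit. This is a sketch of the logic, not Heym's expression engine: `needs_human` is a hypothetical name, and sending malformed classifier output to a human is an added assumption, chosen so that a parsing failure never results in an automatic reply.

```python
import json

CONFIDENCE_THRESHOLD = 0.6         # same value as the Condition expression above
SENSITIVE_CATEGORIES = {"refund"}  # categories that always reach a person

def needs_human(classifier_output: str) -> bool:
    """Return True when the ticket should be routed to a human."""
    try:
        result = json.loads(classifier_output)
        confidence = float(result["confidence"])
        category = str(result["category"])
    except (ValueError, KeyError, TypeError):
        # Assumption: output that cannot be parsed is treated as risky.
        return True
    return confidence < CONFIDENCE_THRESHOLD or category in SENSITIVE_CATEGORIES
```

With the example object from this section, `needs_human('{"category": "refund", "confidence": 0.42}')` returns `True` on both counts: the confidence is below the threshold and the category is sensitive.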

Step 6: Put a Human in the Loop

For replies that carry real consequence, a confidence threshold is not enough. You want a person to read the exact words before they reach the customer. Open the HITL Support Reply Agent template.

The DraftReplyAgent node has hitlEnabled set to true. It drafts a customer-ready response, calls request_human_review with the customer issue, the proposed reply, a risk level, and the next action, then posts the pending review URL and the Markdown draft to a Slack channel through the NotifyReviewSlack node. Execution pauses there. A teammate opens the public /review/{token} link and accepts, edits, or refuses the draft. On approval, the agent returns the final reviewed reply; on refusal, it logs an internal note and sends nothing.
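
The pause-and-resume behaviour amounts to a small state machine: a draft waits under a token until someone accepts, edits, or refuses it. The Python sketch below models that under stated assumptions; `Review`, `open_review`, and `resolve` are hypothetical names, and nothing here reflects how Heym generates tokens or stores pending reviews.

```python
import secrets
from dataclasses import dataclass
from typing import Optional

@dataclass
class Review:
    draft: str
    token: str                         # hypothetical stand-in for the review URL token
    status: str = "pending"            # pending, then accept, edit, or refuse
    final_reply: Optional[str] = None  # what is sent to the customer, if anything

def open_review(draft: str) -> Review:
    """Create a pending review with an unguessable token."""
    return Review(draft=draft, token=secrets.token_urlsafe(16))

def resolve(review: Review, decision: str, edited_text: Optional[str] = None) -> Optional[str]:
    """Apply the reviewer's decision and return the reply to send (None if refused)."""
    if review.status != "pending":
        raise ValueError("review already resolved")
    if decision == "accept":
        review.final_reply = review.draft
    elif decision == "edit":
        if not edited_text:
            raise ValueError("an edit needs replacement text")
        review.final_reply = edited_text
    elif decision == "refuse":
        review.final_reply = None
    else:
        raise ValueError(f"unknown decision: {decision}")
    review.status = decision
    return review.final_reply
```

The useful property is that nothing reaches the customer while the status is still pending, and a refused draft yields no reply at all.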

This is the pattern for refunds, escalations, legal-sensitive responses, and any policy exception. The agent does the drafting work, and a human keeps accountability for what actually goes out. One configuration note from the template: keep JSON output disabled on the Agent node while HITL is enabled, since the human review flow uses text-mode output.

To make any downstream channel node agent-callable, drag a connection from the Agent node's tools handle to a Slack, Telegram, or HTTP node and mark the relevant field with the bot icon. The agent then fills that field at runtime while the channel and credential stay fixed.

The two escalation patterns are not interchangeable. Use this comparison to decide which one a given ticket needs:

| Aspect | Condition-node escalation (Step 5) | Human-in-the-loop approval (Step 6) |
| --- | --- | --- |
| What it gates | Where the ticket is routed | The exact words sent to the customer |
| Trigger | confidence < 0.6 or a category rule | Agent calls request_human_review |
| Human effort | Picks the ticket up from a queue or Slack | Accepts, edits, or refuses a drafted reply |
| Best for | Volume routing and urgency triage | Refunds, policy exceptions, legal-sensitive replies |
| After handoff | Agent stops; a person takes over | Agent pauses, then resumes with the approved text |

A production support agent usually runs both: the Condition node automates the easy majority, and HITL guards the few replies that carry real consequence.

Editing the Canvas Without Breaking the Flow

As your support agent grows from two nodes to a dozen, you will want to restructure without rewiring everything. Heym has two affordances for that. First, edges are insertable: hover over any edge and click the + to drop a new node directly onto the existing connection, so you can add a classification step between retrieval and answer without deleting the link. Second, right-click any node for the context menu, where Extract to Sub-Workflow collapses a tested sub-graph (your whole escalation branch, for example) into a single callable node, Duplicate lets you A/B two answer prompts side by side, and Disable skips a step during testing without removing it.

The same context menu exposes Share as Template, so once your support agent is tuned for your product, you can publish it for the rest of your team to clone.

Measuring and Improving the Agent

A support agent is not done when it sends its first reply. It is done when you can see what it is doing and prove it is getting better. Two practices close that loop.

First, observe every run. Each execution in Heym is inspectable in the Debug Panel, so you can read what the retrieval step returned and what the agent did with it. For production monitoring across many runs, the AI agent observability guide covers tracing the retrieve-answer-escalate path end to end.

Second, evaluate answer quality on a fixed set of real questions. Build a small test set of tickets with known-good answers and re-run it whenever you change a prompt, a threshold, or the knowledge base. The AI agent evaluation guide explains how to score grounded answers for accuracy and catch regressions before customers do. Watching the escalation rate over time is the fastest health signal: a rate that climbs usually means a documentation gap to fill with another ingest run.
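
Both checks in this section reduce to a few lines of arithmetic. The Python sketch below shows one simple way to compute them; `escalation_rate` and `regressions` are hypothetical helpers, and matching on an expected phrase is a deliberately crude stand-in for the scoring methods a full evaluation would use.

```python
from typing import Callable

def escalation_rate(outcomes: list[str]) -> float:
    """Share of tickets whose outcome was 'escalated' (0.0 for an empty list)."""
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o == "escalated") / len(outcomes)

def regressions(
    test_set: list[tuple[str, str]],
    answer_fn: Callable[[str], str],
) -> list[str]:
    """Return the questions whose answer no longer contains the expected phrase.

    test_set holds (question, expected_phrase) pairs; answer_fn is whatever
    produces the agent's reply for a question.
    """
    return [
        question
        for question, expected in test_set
        if expected.lower() not in answer_fn(question).lower()
    ]
```

Re-running `regressions` after each prompt, threshold, or knowledge-base change gives a quick list of questions to inspect before customers see the new behaviour.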

Real-World Use Cases

SaaS help-desk deflection. Ingest your help center, ground answers with the RAG Q&A Agent, and put a Condition node in front of any billing or cancellation request. Routine "how do I reset my password" tickets get an instant grounded answer; billing tickets route to a human. Teams deflect the repetitive volume while keeping a person on the cases that affect revenue.

Shared inbox triage for small teams. A three-person support team points the IMAP Support Inbox Triage template at the support@ address on their domain. The agent summarizes each message and flags urgency to Slack, so the team triages a full inbox in the time it used to take to read it. The grounded answer step then drafts replies for the routine half.

Refund and policy review with HITL. An e-commerce team routes every refund request through the HITL Support Reply Agent. The agent drafts an empathetic, policy-consistent reply, and a manager approves or edits it at the review URL before it sends. The team gets the speed of AI drafting with full human accountability on money-related responses.

Internal IT and HR help desk. Point the knowledge base at your internal wiki and policy documents, and the same architecture answers employee questions about VPN setup or leave policy, escalating anything sensitive to the right team. The pattern is identical; only the documents change.

You can start today by cloning the RAG Document Ingest and RAG Q&A Agent templates for the grounded answer core, then adding IMAP Support Inbox Triage and HITL Support Reply Agent for intake and human review. Each takes only a credential and a few field edits to configure.

For the broader foundations behind this build, the how to build an AI agent guide covers agents with memory and tools, AI agent use cases shows where else this pattern applies, and the multi-agent AI systems guide explains how to split a complex support flow across coordinated agents.


FAQ

What is an AI customer support agent?

An AI customer support agent is a workflow that reads an incoming customer message, retrieves the relevant passages from your own help documentation, and writes a grounded reply with a large language model. Unlike a generic chatbot, it answers from a knowledge base instead of guessing, routes anything it cannot answer to a human, and remembers the conversation across turns. In Heym it is built by connecting nodes on a canvas: a RAG node for retrieval, an Agent node for the reply, a Condition node for escalation, and an Agent node with Human-in-the-Loop for sensitive responses.

How does RAG stop an AI support agent from hallucinating?

RAG inserts the exact passages from your documentation into the model's prompt before it answers. The model is instructed to answer only from the supplied context, so it quotes your refund policy rather than inventing one. In Heym's RAG Q&A Agent template, the RagSearch node returns the top 5 matching chunks from a Qdrant collection and the answer node references $RagSearch.context. If the context does not contain the answer, you instruct the model to say so and escalate, which keeps wrong answers out of customer-facing replies.

Can the AI agent escalate to a human when it is unsure?

Yes, with two patterns. A Condition node checks the model's confidence or category output and routes low-confidence or high-risk tickets to a Slack channel or a human queue. Human-in-the-Loop on an Agent node goes further: the agent drafts a reply, calls request_human_review, and pauses at a public /review/{token} URL until a teammate accepts, edits, or refuses the draft. Refunds, legal-sensitive responses, and policy exceptions are typical triggers for the human review path.

Do I need coding experience to build a customer support agent in Heym?

No. Every step runs on the visual canvas. You ingest documents with the RAG Document Ingest template, configure retrieval by setting a collection name and a top-K value, write the agent's instructions in a plain-text field, and set escalation rules with a single expression in a Condition node. There is no framework to install and no deployment pipeline to manage.

How does the agent remember a multi-turn conversation?

Enable persistentMemory on the Agent node. The agent then stores the running conversation against a session key, so a customer can ask a follow-up question and the agent keeps the earlier context. Memory is scoped per conversation, so different customers never share history. This is what separates a real support agent from a single-shot Q&A endpoint that forgets every message.

What channels can the support agent receive tickets from?

The IMAP Support Inbox Triage template watches a shared mailbox and polls every 5 minutes. You can also start the workflow from a Webhook for a website chat widget, or from the Telegram, Discord, and Slack nodes for community support. The same retrieve, answer, and escalate logic runs regardless of where the message arrives.

Ceren Kaya Akgün

Founding Engineer

Ceren is a founding engineer at Heym, working on AI workflow orchestration and the visual canvas editor. She writes about AI automation, multi-agent systems, and the practitioner experience of building production LLM pipelines.
