Featured · Integration · Multi-Agent · Evaluation · Ejentum · Cross-Lab · Self-Eval · MCP

Blind Eval Trio

Three cross-lab agents evaluate any decision blind: steelman defends, stress_test attacks, gap_finder finds what's missing. No synthesizer — you integrate.


5 nodes · Free & source-available


Pre-commitment self-evaluation for agent runtimes. Three cross-lab agents (OpenAI, Anthropic, Zhipu) give independent evaluations of any plan or method — without seeing each other's output.

Why this works

Models cannot reliably self-evaluate. Asking the same model to critique its own plan reproduces the original blind spots. The structural fix is cross-lab blind evaluation: three different model labs (different RLHF priors, different training distributions) playing structured adversarial roles, returning three independent perspectives that the calling agent integrates.

Architecture

chatInput
   │
   ├── steelmanAgent   (OpenAI gpt-5-nano   + harness_reasoning)
   ├── stresstestAgent (Anthropic claude-4  + harness_anti_deception)
   └── gapfinderAgent  (Zhipu GLM-4.7      + harness_memory)
   │
   ▼
setFields → { steelman, stress_test, gap_finder, usage_note }

Three agents run in parallel. Each is locked to one role and one Ejentum cognitive harness. No synthesizer agent — the three evaluations are returned raw. The integration tension between voices IS the value.
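The fan-out/merge shape above can be sketched in plain Python. The three evaluators are stubbed here; in the real canvas each is a model call to a different lab through its provider's API, and `setFields` does the final merge:

```python
from concurrent.futures import ThreadPoolExecutor

# Stubs -- in the real workflow each is a cross-lab model call
# (OpenAI, Anthropic, Zhipu). None ever sees the others' output.
def steelman(plan):    return f"Strongest case FOR: {plan}"
def stress_test(plan): return f"Where it BREAKS: {plan}"
def gap_finder(plan):  return f"What's MISSING from: {plan}"

def blind_eval_trio(plan: str) -> dict:
    # All three receive only the original input text -- blind evaluation.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "steelman": pool.submit(steelman, plan),
            "stress_test": pool.submit(stress_test, plan),
            "gap_finder": pool.submit(gap_finder, plan),
        }
        out = {name: fut.result() for name, fut in futures.items()}
    # setFields step: return the raw evaluations, no synthesizer pass.
    out["usage_note"] = ("Three independent evaluations, no synthesis. "
                         "Integrate into your decision; do not score-and-aggregate.")
    return out
```

The key structural property is that the merge happens only after all three calls complete, so no evaluator can condition on another's verdict.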

Roles

  • steelmanAgent — builds the strongest case FOR the submitted method. Pure advocacy, zero smuggled critique.
  • stresstestAgent — finds where the method BREAKS. Failure modes with severity tags, concrete breaking scenarios. Loaded with the Chaos Engineering skill.
  • gapfinderAgent — finds what's MISSING: steps, alternatives, and names three deeper implicit assumptions.
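The role locks above amount to one fixed system prompt per agent. A minimal sketch follows; the prompt wording here is illustrative only and does not reproduce the actual Ejentum harness prompts:

```python
# Illustrative role prompts -- the real harness prompts differ.
ROLE_PROMPTS = {
    "steelmanAgent": (
        "Build the strongest possible case FOR the submitted method. "
        "Pure advocacy: no criticism, no hedging, no smuggled caveats."
    ),
    "stresstestAgent": (
        "Find where the method BREAKS. List failure modes with severity "
        "tags and a concrete breaking scenario for each."
    ),
    "gapfinderAgent": (
        "Find what is MISSING: omitted steps, unexplored alternatives, "
        "and name three deeper implicit assumptions."
    ),
}

def system_prompt(agent: str) -> str:
    # Each agent is locked to exactly one role; no shared context.
    return ROLE_PROMPTS[agent]
```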

Setup

  1. Get an Ejentum API key at ejentum.com and set it in each agent's MCP env field
  2. Add your OpenAI and Anthropic credentials to the agent nodes
  3. Submit any plan, method, or decision as the input text
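Since the workflow needs three separate credentials, it can be worth failing fast before any agent node runs. A small sketch, assuming conventional environment variable names (check your node config panels for the exact keys):

```python
import os

# Hypothetical variable names -- your node config may use different keys.
REQUIRED = ["EJENTUM_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY"]

def missing_credentials(env=os.environ) -> list:
    """Return the credential names that are unset or empty,
    so a run can be aborted before any agent call is made."""
    return [name for name in REQUIRED if not env.get(name)]
```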

Output

{
  "steelman": "...",
  "stress_test": "...",
  "gap_finder": "...",
  "usage_note": "Three independent evaluations, no synthesis. Integrate into your decision; do not score-and-aggregate."
}
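A caller might consume this payload as follows. The field values below are made up for illustration; real runs return full evaluations:

```python
import json

payload = json.loads("""{
  "steelman": "Clear rollback path; cheap to pilot.",
  "stress_test": "HIGH: breaks if the API rate-limits mid-run.",
  "gap_finder": "No monitoring step; assumes stable traffic.",
  "usage_note": "Three independent evaluations, no synthesis."
}""")

# Read each voice on its own terms -- no scoring, no averaging.
for role in ("steelman", "stress_test", "gap_finder"):
    print(f"--- {role} ---\n{payload[role]}\n")
```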

Built by Ejentum · agent-teams repo

How to import this template

  1. Click Import → Copy JSON on this page.
  2. Open your Heym and navigate to a workflow canvas.
  3. Press Cmd+V / Ctrl+V — nodes appear instantly.
  4. Add your API keys in the node config panels and click Run.