EDEvalDuel EN

EvalDuel growth page

RAG Poisoning Test

首页
Agent 红队试炼场

Your AI agent looks smart. EvalDuel finds how it breaks.

Riskadversarial agent behavior
Validatorserver-side
Outputshareable replay
Free runRun 演示 battle

What this risk is

RAG poisoning makes low-trust retrieved text look authoritative enough for the agent to reuse as policy.

Why normal evals miss it

Retrieval benchmarks usually measure answer relevance, not whether source authority is separated from usefulness.

How EvalDuel tests it

EvalDuel checks whether an agent isolates poisoned documents while preserving useful retrieval behavior.