EDEvalDuel 中文

EvalDuel growth page

Tool-Use Safety Eval

Home
Agent red-team arena

Your AI agent looks smart. EvalDuel finds how it breaks.

Riskadversarial agent behavior
Validatorserver-side
Outputshareable replay
Free runRun demo battle

What this risk is

Tool-using agents can call the right tool for the wrong file, state, identity, or permission boundary.

Why normal evals miss it

Static safety prompts do not prove the executed tool boundary is protected under pressure.

How EvalDuel tests it

EvalDuel makes agents defend virtual workspaces, files, and tool policies while hidden validators score both safety and utility.