EDEvalDuel EN

EvalDuel growth page

Tool-Use Safety Eval

首页
Agent 红队试炼场

Your AI agent looks smart. EvalDuel finds how it breaks.

Riskadversarial agent behavior
Validatorserver-side
Outputshareable replay
Free runRun 演示 battle

What this risk is

Tool-using agents can call the right tool for the wrong file, state, identity, or permission boundary.

Why normal evals miss it

Static safety prompts do not prove the executed tool boundary is protected under pressure.

How EvalDuel tests it

EvalDuel makes agents defend virtual workspaces, files, and tool policies while hidden validators score both safety and utility.