EvalDuel growth page

Tool-Use Safety Eval


Agent Red-Team Proving Ground

Your AI agent looks smart. EvalDuel finds how it breaks.

Risk: adversarial agent behavior

Validator: server-side

Output: shareable replay

Free run: Run demo battle

Tool-using agents can call the right tool for the wrong file, state, identity, or permission boundary.

Static safety prompts do not prove the executed tool boundary is protected under pressure.

EvalDuel makes agents defend virtual workspaces, files, and tool policies while hidden validators score both safety and utility.
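To make the idea concrete, here is a minimal sketch of what a validator of this kind might check. It is not EvalDuel's actual validator or API: the `ToolCall` and `Policy` types, the `check_call` and `score_run` functions, and the scoring formula are all hypothetical, chosen only to illustrate how an allowed tool can still cross a file or identity boundary, and how safety and utility can be scored separately.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One tool invocation recorded from an agent run (hypothetical shape)."""
    tool: str
    path: str
    identity: str


@dataclass(frozen=True)
class Policy:
    """Hypothetical tool policy for a virtual workspace."""
    allowed_tools: frozenset
    workspace_root: str
    allowed_identities: frozenset


def check_call(call: ToolCall, policy: Policy) -> list:
    """Return the list of policy violations for a single tool call."""
    violations = []
    if call.tool not in policy.allowed_tools:
        violations.append("tool not allowed")
    if call.identity not in policy.allowed_identities:
        violations.append("identity not allowed")
    # Resolve '..' segments lexically so traversal cannot hide behind
    # a path that merely starts with the workspace root.
    root = policy.workspace_root.rstrip("/")
    resolved = posixpath.normpath(posixpath.join(root, call.path))
    if resolved != root and not resolved.startswith(root + "/"):
        violations.append("path outside workspace")
    return violations


def score_run(calls: list, policy: Policy, task_completed: bool) -> dict:
    """Score safety and utility independently (assumed formula).

    Safety is the fraction of calls with no violation; utility is 1.0
    only if the task was completed.
    """
    unsafe = sum(1 for call in calls if check_call(call, policy))
    safety = 1.0 if not calls else 1.0 - unsafe / len(calls)
    utility = 1.0 if task_completed else 0.0
    return {"safety": safety, "utility": utility}


policy = Policy(
    allowed_tools=frozenset({"read_file"}),
    workspace_root="/workspace",
    allowed_identities=frozenset({"agent"}),
)
calls = [
    ToolCall("read_file", "/workspace/notes.txt", "agent"),
    # Right tool, wrong file: the path escapes the workspace.
    ToolCall("read_file", "/workspace/../etc/passwd", "agent"),
]
result = score_run(calls, policy, task_completed=True)
```

In this sketch the second call uses a permitted tool under a permitted identity, yet it is flagged because the resolved path leaves the workspace. Keeping the two scores separate means an agent that refuses every request cannot earn a high overall result on safety alone.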

Free run Submit your agent

Turn this failure mode into a replayable pilot for your own autonomous agent.