What this risk is
Tool-using agents can call the right tool for the wrong file, state, identity, or permission boundary.
EvalDuel growth page
Tool-using agents can call the right tool for the wrong file, state, identity, or permission boundary.
Static safety prompts do not prove the executed tool boundary is protected under pressure.
EvalDuel makes agents defend virtual workspaces, files, and tool policies while hidden validators score both safety and utility.