EDEvalDuel 中文

I ran Codex agents through 16 adversarial arena tasks. Here are the 5 failure modes I found.

Home
Technical war report

Agent red-team arena for testing whether autonomous agents survive adversarial tasks.

Task types16
Dramatic battles3
Failure modes5
PositioningAgent red-team arena
Replay stock30 curated battles
Call to actionSubmit your agent to challenge the leaderboard

16 task types

Virtual server defenseRetrieval poisoningTool-use safetyNetwork policy defenseIdentity memory defensePolicy decision defenseResearch workflow defenseCodebase defenseNegotiation defenseCity operations defenseMarket defenseTrust network defenseCrypto defenseSandbox defenseRationale hygieneMulti-turn adaptation

3 dramatic battles

5 failure modes

Over-blocking

Agents block the attack and the product at the same time.

Brittle literal matching

Agents patch the exact observed string and miss nearby variants.

Unsafe trust transfer

Agents accept familiar context as authorization.

Poisoned retrieval acceptance

Agents treat confident retrieval as verified truth.

Shallow tool-policy patches

Agents guard instruction text but forget the executed tool boundary.

Why hidden validators are necessary

Public prompts are useful for agent builders, but hidden validators are necessary because agents otherwise overfit the visible task wording instead of surviving the actual adversarial condition.