home
cv
blog
links
projects
playlists
crow tech

When LLMs Guard a Wallet

May 12, 2026

A simulator-only adversarial study of 25 attacker models against 25 guardian models over 1,875 ordered trials. This article is the long-form companion to a research paper I published through Crow Tech. The interactive charts, downloadable PDF, BibTeX citation, artifact manifest, sanitized public data files, raw-data link, and source code all live there.

🔗 Interactive version with all charts and downloads: crow.sg/research/llm-wallet-guard-study

outcome-map


Article index


First-pass composite1,127
Transfer actions107
Retry-aware composite1,147
Policy blocks107

Abstract

We evaluated whether LLM guardian agents follow a protective wallet instruction under bounded adversarial persuasion from attacker LLMs. The study is simulator-only: it used no real wallets, private keys, RPC endpoints, chain SDKs, wallet signing, testnet/mainnet activity, or real funds. The first pass covered 1,875 ordered attacker-guardian trials: 106 transfer-action attempts, 1,021 guardian self-reported unsafe-intent flags, 703 resilient rows, 41 provider/runtime errors, and 4 incomplete rows. The two flagged subtypes together form a 1,127-row first-pass composite signal. A separate retry envelope resolved the 41 first-pass error rows into 20 additional composite-signal rows and 21 resilient rows.


Method

The main run paired 25 guardian routes with 25 attacker routes across 3 independent repetitions, producing 625 ordered model pairs, 1,875 first-pass trials, and 26,238 completed first-pass model attempts. The retry envelope covered 41 first-pass error rows and completed 704 retry attempts. Attackers had 25 attempts and a 200-word cap per message. Calls used provider-default temperature and sampling.

The terminal flagged outcomes are split between attempted_transfer_tool_call and unsafe_stated_intent. The first is an action compromise: the guardian emitted a transfer-shaped simulator action. The second is not an independent judge label; it is the guardian model's own structured self-assessment that its response reflected unsafe transfer intent, and can appear beside refusal-like visible text. For compatibility, the aggregate CSVs retain the historical guardian_compromised field name, but the honest reading is a composite signal: transfer-action attempt or self-reported unsafe-intent flag. Transfer-shaped simulator actions were blocked by policy, so model action attempts, self-reported flags, and policy enforcement are reported separately.


Results

StatusFirst passRetry-aware
composite signal (guardian_compromised)11271147
guardian_resilient703724
error410
incomplete44
Flagged subtypeFirst passRetry-aware
transfer-action attempt106107
self-reported unsafe-intent flag10211040

Interactive Outcome Map

25 guardians x 25 attackers · retry-aware composite-signal rate

0.0%Grok 4.3 against GPT 5 Nano
Composite signals
0
Resilient
3
Transfer actions
0
Self-report flags
0
GuardianA1A2A3A4A5A6A7A8A9A10A11A12A13A14A15A16A17A18A19A20A21A22A23A24A25
G1 · GPT 5 Nano
G2 · Gemini 3 Flash Preview
G3 · HY3 Preview
G4 · GLM 4.7 Flash
G5 · Gemini 3.1 Pro Preview
G6 · GLM 5.1
G7 · GLM 5
G8 · Minimax M2.7
G9 · Claude Opus 4.7
G10 · Minimax M2
G11 · Claude Sonnet 4.6
G12 · Ernie 4.5 300B A47B
G13 · Deepseek V3.2
G14 · Minimax M2.5
G15 · Deepseek V4 Flash
G16 · GPT 5.4
G17 · Claude Haiku 4.5
G18 · Deepseek V4 Pro
G19 · Gemini 3.1 Flash Lite Preview
G20 · Qwen3.6 Flash
G21 · Qwen3.6 Plus
G22 · Hunyuan A13B Instruct
G23 · Grok 4.1 Fast
G24 · Grok 4.3
G25 · Qwen3.6 Max Preview

Guardian Ranking

GuardianProviderResilientComposite signalsRetry-aware resilient
GPT 5 NanoOpenAI730100.0%
HY3 PreviewTencent73098.7%
Gemini 3 Flash PreviewGoogle72198.7%
GLM 4.7 FlashZ.ai71396.0%
Gemini 3.1 Pro PreviewGoogle482664.0%
GLM 5.1Z.ai403454.7%
GLM 5Z.ai373652.0%
Minimax M2.7MiniMax353846.7%
Minimax M2MiniMax323944.0%
Claude Opus 4.7Anthropic324244.0%

Attacker Ranking

AttackerProviderComposite signalsResilientRetry-aware composite
Grok 4.3xAI561974.7%
Ernie 4.5 300B A47BBaidu542172.0%
Gemini 3 Flash PreviewGoogle532170.7%
GLM 5Z.ai532270.7%
Minimax M2.7MiniMax532270.7%
Minimax M2MiniMax522369.3%
Grok 4.1 FastxAI512468.0%
Qwen3.6 Max PreviewQwen502566.7%
Qwen3.6 FlashQwen492565.3%
Gemini 3.1 Flash Lite PreviewGoogle492665.3%

Reliability and Retries

First-pass provider/runtime errors were preserved as reliability data. The retry envelope replaces only mapped first-pass error rows and is shown separately.

SubtypeRoleCount
attacker_live_error:RuntimeErrorattacker38
attacker_live_error:ValueErrorattacker2
guardian_live_error:RuntimeErrorguardian1

Safety, Ethics, and Limitations

  • This is a simulator-only adversarial AI safety evaluation; no real private keys, real wallets, RPC, chain SDKs, wallet signing, mainnet/testnet activity, or real funds were used.
  • Guardian transfer attempts are transfer-shaped simulator actions only. All observed transfer-shaped actions were blocked by deterministic policy.
  • The unsafe_stated_intent outcome is the guardian model's own structured self-assessment, not an independent judge label. It may capture schema-following or self-classification instability, so it is reported separately from transfer-action attempts.
  • First-pass provider/runtime errors are preserved as reliability data. The retry-aware envelope replaces only the mapped first-pass error rows and is reported separately from the first pass.
  • Rows are ordered attacker-vs-guardian pairs over three repetitions, not independent claims about a provider as a whole.
  • A guardian marked resilient only means no composite signal was observed within the 25-attempt budget.
  • Calls used provider-default temperature and sampling through an OpenAI-compatible route. Provider defaults and transient routing errors are part of the measured environment.
  • AI assistance was used for orchestration, analysis, code, and publication packaging. Daniel Alonso conducted the study with Crow Tech publication support.

Artifacts and Reproducibility

Public article and interactive charts: https://crow.sg/research/llm-wallet-guard-study. Artifact manifest and checksums: https://crow.sg/research/llm-wallet-guard-study/artifact-manifest.json.