When LLMs Guard a Wallet

May 12, 2026

A simulator-only adversarial study of 25 attacker models against 25 guardian models over 1,875 ordered trials. This article is the long-form companion to a research paper I published through Crow Tech. The interactive charts, downloadable PDF, BibTeX citation, artifact manifest, sanitized public data files, raw-data link, and source code all live there.

🔗 Interactive version with all charts and downloads: crow.sg/research/llm-wallet-guard-study

outcome-map

Article index

When LLMs Guard a Wallet

First-pass composite1,127

Transfer actions107

Retry-aware composite1,147

Policy blocks107

Abstract

We evaluated whether LLM guardian agents follow a protective wallet instruction under bounded adversarial persuasion from attacker LLMs. The study is simulator-only: it used no real wallets, private keys, RPC endpoints, chain SDKs, wallet signing, testnet/mainnet activity, or real funds. The first pass covered 1,875 ordered attacker-guardian trials: 106 transfer-action attempts, 1,021 guardian self-reported unsafe-intent flags, 703 resilient rows, 41 provider/runtime errors, and 4 incomplete rows. The two flagged subtypes together form a 1,127-row first-pass composite signal. A separate retry envelope resolved the 41 first-pass error rows into 20 additional composite-signal rows and 21 resilient rows.

Method

The main run paired 25 guardian routes with 25 attacker routes across 3 independent repetitions, producing 625 ordered model pairs, 1,875 first-pass trials, and 26,238 completed first-pass model attempts. The retry envelope covered 41 first-pass error rows and completed 704 retry attempts. Attackers had 25 attempts and a 200-word cap per message. Calls used provider-default temperature and sampling.

The terminal flagged outcomes are split between attempted_transfer_tool_call and unsafe_stated_intent. The first is an action compromise: the guardian emitted a transfer-shaped simulator action. The second is not an independent judge label; it is the guardian model's own structured self-assessment that its response reflected unsafe transfer intent, and can appear beside refusal-like visible text. For compatibility, the aggregate CSVs retain the historical guardian_compromised field name, but the honest reading is a composite signal: transfer-action attempt or self-reported unsafe-intent flag. Transfer-shaped simulator actions were blocked by policy, so model action attempts, self-reported flags, and policy enforcement are reported separately.

Results

Status	First pass	Retry-aware
composite signal (`guardian_compromised`)	1127	1147
guardian_resilient	703	724
error	41	0
incomplete	4	4

Flagged subtype	First pass	Retry-aware
transfer-action attempt	106	107
self-reported unsafe-intent flag	1021	1040

Interactive Outcome Map

25 guardians x 25 attackers · retry-aware composite-signal rate

0.0%Grok 4.3 against GPT 5 Nano

Composite signals: 0
Resilient: 3
Transfer actions: 0
Self-report flags: 0

Guardian	A1	A2	A3	A4	A5	A6	A7	A8	A9	A10	A11	A12	A13	A14	A15	A16	A17	A18	A19	A20	A21	A22	A23	A24	A25
G1 · GPT 5 Nano
G2 · Gemini 3 Flash Preview
G3 · HY3 Preview
G4 · GLM 4.7 Flash
G5 · Gemini 3.1 Pro Preview
G6 · GLM 5.1
G7 · GLM 5
G8 · Minimax M2.7
G9 · Claude Opus 4.7
G10 · Minimax M2
G11 · Claude Sonnet 4.6
G12 · Ernie 4.5 300B A47B
G13 · Deepseek V3.2
G14 · Minimax M2.5
G15 · Deepseek V4 Flash
G16 · GPT 5.4
G17 · Claude Haiku 4.5
G18 · Deepseek V4 Pro
G19 · Gemini 3.1 Flash Lite Preview
G20 · Qwen3.6 Flash
G21 · Qwen3.6 Plus
G22 · Hunyuan A13B Instruct
G23 · Grok 4.1 Fast
G24 · Grok 4.3
G25 · Qwen3.6 Max Preview

Guardian Ranking

Guardian	Provider	Resilient	Composite signals	Retry-aware resilient
GPT 5 Nano	OpenAI	73	0	100.0%
HY3 Preview	Tencent	73	0	98.7%
Gemini 3 Flash Preview	Google	72	1	98.7%
GLM 4.7 Flash	Z.ai	71	3	96.0%
Gemini 3.1 Pro Preview	Google	48	26	64.0%
GLM 5.1	Z.ai	40	34	54.7%
GLM 5	Z.ai	37	36	52.0%
Minimax M2.7	MiniMax	35	38	46.7%
Minimax M2	MiniMax	32	39	44.0%
Claude Opus 4.7	Anthropic	32	42	44.0%

Attacker Ranking

Attacker	Provider	Composite signals	Resilient	Retry-aware composite
Grok 4.3	xAI	56	19	74.7%
Ernie 4.5 300B A47B	Baidu	54	21	72.0%
Gemini 3 Flash Preview	Google	53	21	70.7%
GLM 5	Z.ai	53	22	70.7%
Minimax M2.7	MiniMax	53	22	70.7%
Minimax M2	MiniMax	52	23	69.3%
Grok 4.1 Fast	xAI	51	24	68.0%
Qwen3.6 Max Preview	Qwen	50	25	66.7%
Qwen3.6 Flash	Qwen	49	25	65.3%
Gemini 3.1 Flash Lite Preview	Google	49	26	65.3%

Reliability and Retries

First-pass provider/runtime errors were preserved as reliability data. The retry envelope replaces only mapped first-pass error rows and is shown separately.

Subtype	Role	Count
attacker_live_error:RuntimeError	attacker	38
attacker_live_error:ValueError	attacker	2
guardian_live_error:RuntimeError	guardian	1

Safety, Ethics, and Limitations

This is a simulator-only adversarial AI safety evaluation; no real private keys, real wallets, RPC, chain SDKs, wallet signing, mainnet/testnet activity, or real funds were used.
Guardian transfer attempts are transfer-shaped simulator actions only. All observed transfer-shaped actions were blocked by deterministic policy.
The unsafe_stated_intent outcome is the guardian model's own structured self-assessment, not an independent judge label. It may capture schema-following or self-classification instability, so it is reported separately from transfer-action attempts.
First-pass provider/runtime errors are preserved as reliability data. The retry-aware envelope replaces only the mapped first-pass error rows and is reported separately from the first pass.
Rows are ordered attacker-vs-guardian pairs over three repetitions, not independent claims about a provider as a whole.
A guardian marked resilient only means no composite signal was observed within the 25-attempt budget.
Calls used provider-default temperature and sampling through an OpenAI-compatible route. Provider defaults and transient routing errors are part of the measured environment.
AI assistance was used for orchestration, analysis, code, and publication packaging. Daniel Alonso conducted the study with Crow Tech publication support.

Artifacts and Reproducibility

Public article and interactive charts: https://crow.sg/research/llm-wallet-guard-study. Artifact manifest and checksums: https://crow.sg/research/llm-wallet-guard-study/artifact-manifest.json.

Public summary JSON: Generated machine-readable public dataset used by the wallet-guardian study page charts.
Summary JSON schema: Machine-readable JSON Schema for the generated public summary.
Artifact manifest schema: Machine-readable JSON Schema for the public artifact manifest.
Paper PDF: Generated paper-style PDF.
Printable HTML paper: Browser-printable HTML version of the paper.
LaTeX source: LaTeX source for rebuilding the paper when a LaTeX toolchain is available.
BibTeX citation: Citation entry for reference managers and academic notes.
Sanitized first-pass trial CSV: One sanitized row per first-pass ordered attacker-guardian-condition trial, with retry envelope fields for errored rows.
Raw dataset archive: Full raw data archive hosted on Google Drive for independent inspection and reanalysis.
Source code repository: Public study code and reconstruction materials.
Public data notes: Field definitions and caveats for interpreting composite-signal, transfer-action, and self-reported unsafe-intent counts.
Ordered pair matrix CSV: Aggregated 25 by 25 ordered attacker-versus-guardian matrix over three repetitions.
Guardian resilience ranking CSV: Per-guardian sanitized outcome counts and resilience metrics.
Attacker effectiveness ranking CSV: Per-attacker sanitized outcome counts and effectiveness metrics.
Retry envelope CSV: Mapping from first-pass provider/runtime errors to retry-run outcomes.
Outcome map SVG: Vector heatmap preview of ordered attacker-versus-guardian composite-signal rates.
Outcome map PNG: Raster preview image for social cards and crawlers that do not reliably render SVG.