Hacker Manipulates Claude AI To Steal Government Data

A hacker reportedly manipulated Anthropic’s Claude AI to assist in a coordinated cyberattack against Mexican government agencies, exposing how AI tools can be misused in real-world operations.

How the AI Was Jailbroken

Between December 2025 and January 2026, the attacker used repeated Spanish-language prompts to bypass Claude’s safety controls. By presenting the requests as part of a “bug bounty simulation” and asking the AI to role-play as an elite hacker, the threat actor gradually overcame built-in restrictions.

Once guardrails were bypassed, the AI generated detailed technical outputs that supported the attack lifecycle.

The attacker leveraged AI to:

Identify vulnerabilities in legacy government systems
Generate exploit code for SQL injection and network scanning
Assist with credential stuffing techniques
Provide structured, step-by-step attack guidance

When Claude reached usage limits, the operator allegedly pivoted to another AI model to continue planning lateral movement and evasion strategies.

The campaign focused on outdated infrastructure and unpatched web applications. Approximately 20 vulnerabilities were exploited, leading to the theft of nearly 150GB of sensitive data, including taxpayer records, voter information, and government employee credentials.

Security researchers noted that the AI significantly lowered the technical barrier required to execute complex attacks, enabling a single operator to conduct a large-scale campaign without advanced infrastructure.

Anthropic has since banned the related accounts and enhanced monitoring mechanisms to detect misuse. While investigations continue, the incident highlights the growing risk of AI-assisted cybercrime and the urgent need for stronger patch management and AI interaction monitoring across government environments.