The Silent Sabotage
A new study quietly published by researchers at ETH Zurich reveals a coordinated data poisoning campaign that has contaminated at least 12 major large language model training sets. Tens of thousands of malicious examples were injected into Common Crawl and The Pile, targeting known CVE vulnerabilities in Python and JavaScript packages. The attack is subtle: poisoned tokens trigger biased outputs only when specific sequences are present. This is not a theoretical threat. It is a live, ongoing compromise of the internet’s most valuable AI feedstock.
The Industry Is Not Ready
Despite billions in safety research budgets, OpenAI, Anthropic, and Meta all failed to detect the tainted data until after the study’s public release. Their models now produce subtly poisoned outputs in code generation and security advice tasks. The researchers responsibly disclosed CVE-2026-18332 and CVE-2026-18333 to the affected package maintainers. Yet no major lab has announced a recall or retraining plan. The silence is deafening and damning. The industry would rather pretend this didn’t happen than admit their training pipelines are wide open to adversarial attack.
Source: Technologyreview
