The Vulnerability Discovered
British AI security startup Mindgard has revealed that the latest public version of ChatGPT can be manipulated to produce graphic sexualized and violent images with a simple prompt. The researchers found that by slightly altering a widely shared instruction originally designed for humorous results, they could bypass the chatbot’s safeguards. Peter Garraghan, Mindgard’s founder and a Lancaster University professor, described the generated content as disturbing, noting that the AI produced such imagery on its own volition without specific subject instructions. The BBC reviewed examples including a titled image called “Grim crime scene aftermath” and another depicting a bound and frightened young woman.
OpenAI’s Response and Ongoing Challenges
After being contacted by the BBC, OpenAI stated it had introduced additional safeguards against the specific prompt. However, Mindgard researchers found that minor tweaks to the prompt still yielded concerning content. OpenAI emphasized its multiple layers of image safety protections, including automated systems and human review, designed to block policy-violating material. Dr. Rumman Chowdhury, CEO of Humane Intelligence, described the challenge as a “game of cat and mouse” where protections improve but circumvention methods grow more sophisticated. She highlighted that AI models lack human understanding of intent, context, or moral judgment, making comprehensive safeguards difficult to maintain. The UK’s AI Security Institute previously found jailbreaks that overrode safeguards across all tested AI systems, underscoring the persistent nature of this issue.