As first reported by CNBC, researchers are raising alarms over the growing number of harmful and problematic responses generated by AI models, ranging from hate speech to copyright violations and explicit content. The rapid adoption of AI across industries is revealing gaps in testing and oversight, with experts warning that current evaluation methods are not sufficient to safeguard users. “After almost 15 years of research, we still don’t know how to make models behave reliably,” said adversarial machine learning researcher Javier Rando.
Red teaming — a practice borrowed from cybersecurity that involves deliberately probing AI systems for vulnerabilities — has emerged as a vital method for stress-testing models. However, researchers like Shayne Longpre note that the current red-teaming ecosystem is under-resourced. In a recent paper, Longpre and collaborators argue for expanding testing beyond internal teams to include third-party experts such as scientists, doctors, lawyers, and journalists. They also propose standardized AI flaw reporting and reward structures to better document and address model weaknesses.
One initiative, Project Moonshot, offers a promising path forward. Developed in Singapore with support from IBM and DataRobot, the open-source toolkit combines benchmarking, red teaming, and customizable evaluation mechanisms. IBM’s Anup Kumar emphasized that evaluation must be a continuous effort, and while some startups have adopted Moonshot, broader industry engagement remains limited. Future improvements aim to make the tool more adaptable across languages, cultures, and industries.
Experts are also calling for regulation in AI to follow the precedents set by sectors like pharmaceuticals and aviation, where rigorous testing is mandatory before release. Pierre Alquier of ESSEC Business School argued that tech companies are releasing general-purpose models too quickly without understanding the full scope of their potential misuse. Narrower, task-specific models could help mitigate these risks, but for now, developers must avoid overstating the strength of their model safeguards.
The AI industry is at a critical juncture: as models grow in power and ubiquity, their potential for harm escalates just as rapidly. Without proper standards, open testing frameworks, and clear regulatory oversight, both users and developers are left vulnerable. Researchers say that establishing stronger checks — through red teaming, transparency, and policy — is not just a safeguard but a necessary foundation for trustworthy AI.
