China’s Ministry of Public Security warns of cyber data pollution by questionable AI training
Published: Aug 05, 2025 10:12 AM
AI Photo: VCG

China's Ministry of Public Security (MPS) on Tuesday issued a safety advisory, warning that artificial intelligence (AI) training data varies widely in quality and often contains false information, fabricated content, and biased views.

The three core elements of AI are algorithms, computing power, and data. Among them, data is the fundamental element for training AI models and a key resource for AI applications. It provides the raw material for AI models, influences AI performance, and drives AI usage, said the MPS in an article published on its official WeChat account.

High-quality data can significantly enhance the accuracy and reliability of AI models. However, polluted data may lead to faulty decisions or even system failures, posing serious safety risks, it noted.

Studies show that even a tiny share of false text in training data can sharply increase harmful output, the article said. For instance, when just 0.001 percent of the training text is false, harmful output rises by 7.2 percent; at 0.01 percent, the increase reaches 11.2 percent.
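To give a sense of scale, the percentages above can be converted into absolute counts. The short Python sketch below does only that arithmetic. The corpus size of one billion text items is a hypothetical figure chosen for illustration and does not come from the ministry's article, and the helper name `polluted_items` is likewise an assumption.

```python
from fractions import Fraction


def polluted_items(corpus_size: int, percent: str) -> int:
    """Return how many items a given percentage of the corpus represents."""
    # Fraction keeps tiny percentages such as 0.001 exact (no float rounding).
    return int(corpus_size * Fraction(percent) / 100)


# Hypothetical corpus size for illustration only; not a figure from the article.
CORPUS_SIZE = 1_000_000_000

for pct in ("0.001", "0.01"):
    count = polluted_items(CORPUS_SIZE, pct)
    print(f"{pct}% of {CORPUS_SIZE:,} items = {count:,} false items")
```

Under that assumed corpus size, the two thresholds cited in the article correspond to 10,000 and 100,000 false items respectively, a small absolute number relative to the whole.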

False content that models generate from polluted data can be reused in later training, creating a lasting "pollution legacy effect." AI-generated content now far exceeds human-created content in volume, and the prevalence of low-quality, biased data will cause errors to compound from one round of training to the next, ultimately distorting a model's understanding over time, it said.

Data pollution can lead to real-world risks, the ministry warned. In finance, it may trigger abnormal market fluctuations. In public safety, it can mislead public opinion and spark panic. In healthcare, it may result in incorrect diagnoses, endanger lives, and promote pseudoscience.

To enhance oversight and prevent data pollution at the source, China has implemented a classification and grading system for AI data, based on legislation such as the Cybersecurity Law, the Data Security Law, and the Personal Information Protection Law.

The goal is to curb the generation of polluted data at the source and mitigate AI-related data security risks. Authorities are enhancing risk assessments, improving safeguards for data flow, and implementing end-point correction mechanisms within a structured framework, noted the article.

Global Times