With countries like the US trying to crank up Internet supervision, understanding how China does it may help people figure out what is to be expected elsewhere, said Jedidiah R. Crandall, an assistant professor at the Department of Computer Science, University of New Mexico.
Certain phrases can never be posted on Weibo, according to regulations. Others may last only a couple of hours or even minutes, and seemingly harmless words may take on new meanings and therefore attract extra scrutiny.
Exactly how the monitoring mechanism works remains a mystery. Users do not know for certain how far they can go when writing about a sensitive topic before their posts are taken down. But researchers from different backgrounds, from journalism to computer science, have been working to unveil the monitoring process by analyzing the vast data collected using Weibo's open platform.
Study in deletion
A recent study by an independent researcher and four computer scientists in the US showed that censoring occurs within a matter of minutes. Published online earlier this month, their paper, The Velocity of Censorship, is one of many attempts to understand how monitoring works in China's microblogging services.
With over 100 million posts every day, it is difficult for researchers to track deletions as quickly as they occur. The team therefore spent two weeks identifying "sensitive users," accounts that have been censored in the past and are likely to be more closely monitored.
Through the application programming interfaces (APIs) on Sina Weibo's open platform, researchers were able to access user timelines and automatically gather the most recent posts every minute, allowing them to detect deletions within 60 seconds.
Scientists could detect two types of deletion: when the post was completely gone, either deleted by its creator or by Weibo staff; or when staff blocked the post from being seen by other users, generating a "permission denied" error, but the post remained visible to its creator.
Since there is no way to distinguish between self-deletion and deletion by Sina, most of the current research on Weibo censorship has focused on the "permission denied" deletion.
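The minute-by-minute check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the researchers' actual code: the function name, the status strings and the `lookup` callable are all assumptions, standing in for whatever the Weibo API returned when a missing post was requested directly.

```python
from typing import Callable, Dict, Optional, Set


def detect_deletions(
    previous_ids: Set[str],
    current_ids: Set[str],
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Classify posts seen in the previous snapshot but missing from the current one.

    `lookup` is a hypothetical callable that asks the service about a single
    post and returns "ok", "not_found" or "permission_denied".
    """
    deletions: Dict[str, str] = {}
    for post_id in sorted(previous_ids - current_ids):
        status = lookup(post_id)
        if status == "permission_denied":
            # Hidden from other users but still visible to its creator.
            deletions[post_id] = "permission_denied"
        elif status == "not_found":
            # Completely gone: removed by the creator or by staff.
            deletions[post_id] = "gone"
        # Any other status means the post simply dropped out of the
        # "most recent posts" window, so it is not counted as a deletion.
    return deletions
```

Run once a minute against each monitored timeline, a comparison of this kind would flag a deletion no later than about 60 seconds after it happened, which is the resolution the study reports.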
Between July 20 and September 8, 2012, the team gathered 2.38 million posts from the pool of 3,567 sensitive users. They found that 12.5 percent of these posts were deleted, among which the permission-denied deletions accounted for 33 percent, says Dan Wallach, a professor at Rice University's department of computer science who co-authored the paper.
They observed that about 30 percent of deletions happened within 30 minutes, with over 90 percent of deletions coming within a day of submission.
Wallach said he was "impressed with how quickly they operate." In fact, 5 percent of deletions happen in the first eight minutes.
The main challenges the computer scientists faced were detecting the censorship every minute and analyzing the Chinese-language messages, said co-author Crandall.
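To put the percentages in absolute terms, the figures quoted above can be multiplied out. The short Python calculation below uses only the numbers given in this article; because those are rounded, the results are approximate and are not counts reported by the paper.

```python
TOTAL_POSTS = 2_380_000   # posts gathered between July 20 and September 8, 2012
SENSITIVE_USERS = 3_567   # accounts in the monitored pool

# 12.5 percent of all gathered posts were deleted.
deleted = TOTAL_POSTS * 125 // 1000

# 33 percent of those deletions were "permission denied" deletions.
permission_denied = deleted * 33 // 100

# Average number of posts gathered per monitored account.
posts_per_user = round(TOTAL_POSTS / SENSITIVE_USERS)

print(deleted, permission_denied, posts_per_user)
```

On these rounded inputs, roughly 297,500 posts were deleted, of which about 98,000 were permission-denied deletions, from accounts that each contributed around 667 posts on average.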
While previous studies on censorship in China's social media mainly described the censorship and focused on the censored content, most of them were only able to measure the deletions hours or days after they took place, said Crandall.
They also observed in the paper that it would take "1,400 people working at the same time" to read every one of the "70,000 new posts in one minute." "If these workers only worked in eight-hour shifts, 4,200 workers would be required," reads the paper.
While Sina has not disclosed how many employees it has working on Weibo, public records and media reports show the company of 5,000 or so employees has no fewer than 1,200 people devoted to its microblog service. The company has been expanding in the past few years, with most new recruits going to work in Weibo operations. Still, the number 4,200 stated in the US research report sounds unlikely.
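The staffing estimate quoted from the paper follows from simple arithmetic, reproduced here in Python. Only the three input figures come from the article; the variable names and the assumption of round-the-clock coverage in three equal shifts are illustrative.

```python
POSTS_PER_MINUTE = 70_000   # new posts per minute, as quoted from the paper
READERS_AT_ONCE = 1_400     # people reading at the same time
SHIFT_HOURS = 8             # length of one shift

# Reading rate each worker would need to sustain.
posts_per_reader_per_minute = POSTS_PER_MINUTE // READERS_AT_ONCE

# Covering 24 hours with eight-hour shifts takes three shifts.
shifts_per_day = 24 // SHIFT_HOURS
total_workers = READERS_AT_ONCE * shifts_per_day

# Daily volume implied by the per-minute figure.
posts_per_day = POSTS_PER_MINUTE * 60 * 24

print(posts_per_reader_per_minute, total_workers, posts_per_day)
```

Each reader would have to get through 50 posts a minute, three shifts give the 4,200 workers mentioned in the paper, and 70,000 posts a minute works out to about 100.8 million a day, in line with the "over 100 million posts every day" cited earlier.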
Based on their data, Wallach and his collaborators were able to propose several hypotheses as to how Weibo's filtering mechanisms operate. They hypothesize that there are different lists of keywords, each of which triggers a different kind of monitoring behavior.
They believe that after a sensitive post is detected, most of its reposts are also deleted within minutes. Sometimes after a topic or keyword becomes sensitive, workers do a keyword search to delete posts retroactively. Search filters are also set up almost instantly to ban searches for certain words after they become sensitive.
They also found that specific users are targeted as their sensitive posts are deleted more rapidly than others' and that the deletion rate drops in the early hours of the morning.
"China is, in many respects, leading the world in the deployment of Internet censorship technologies, with many other countries looking to follow China's lead," said Wallach. He said that their work is motivated by trying to understand how well automated censorship technology can operate in practice.
Since microblogging began in China in late 2009, authorities and service providers have taken measures to keep up with its growing popularity and with increasingly liberal attitudes on social and political issues coming through in comments.
Even when posts vanish quickly, chances are that many of them will still have been read by thousands of users, if not millions.
In September 2011, Charles Chao, CEO of Sina, told an industry forum that censorship was being stepped up to quash rumors and false news. This came about two months after the deadly bullet train crash in Wenzhou on July 23, which killed at least 40 people. Weibo saw a lot of chatter immediately after the accident, with many people questioning the official handling of the aftermath.
In December 2011, Beijing passed a regulation requiring Weibo users to use their real names when registering. This real-name registration policy was enforced across all microblog platforms including Sina, Sohu and Netease in March 2012.
In May 2012, Sina Weibo rolled out a credit system encouraging users to report false information, abuses and harmful content. Observers saw this as yet another attempt at censorship, while Chao maintained the system was intended to keep order on Weibo.
While the new platform has given people greater freedom of expression, it has also become a hotbed for meaningless wars of words and rumors. For instance, after the earthquake and radiation leaks in Japan in March 2011, rumors on Weibo exaggerated the situation, which led to panic buying of salt in many cities, among other problems.
The government is worried that foreign forces may use the Internet for subversion. After Jon Huntsman, the former US ambassador to China, said in November 2011 that the US should reach out to the Chinese Internet generation to bring about changes that "take China down," there was heated debate among the elite about what role the Internet would play in China's stability.
Fu King-wa is an assistant professor at the Journalism and Media Studies Center at the University of Hong Kong. He and his colleagues have developed a program named WeiboScope that detects deleted posts and tracks hot topics. They published an article in the journal IEEE Internet Computing last week, assessing Weibo censorship using discriminatory keyword analysis and trying to determine the effect of the real-name registration policy.
"Internet censorship is a unique socio-technical phenomenon in China, which is a result of complex interplays between technology, institutional practices, social norms, and human behaviors," said Fu.
Millions of posts
Fu's research focuses on analyzing keywords that discriminate between censored and uncensored posts written by the same users. He and his colleagues collected 111 million posts between January 1 and June 30, 2012.
Their analysis shows that among the top 30 keywords during that period, some related to highly sensitive figures such as former Chongqing Party chief Bo Xilai and Chen Guangcheng, a blind self-educated lawyer and advocate, while others were less obviously sensitive, such as the annual two sessions and US Ambassador Gary Locke.
Fu said he finds it very interesting to see variations on themes that Web users have created to avoid censorship, such as using "tomato" to refer to Chongqing or "head nurse" to refer to Bo's right-hand man Wang Lijun. These are often rhymes or puns based on Chinese meanings, and they have proven to have a high survival rate because they often avoid detection by censors.
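One simple way to picture what it means for a keyword to "discriminate" between censored and uncensored posts is to compare how often it appears in each group. The Python sketch below does this with a smoothed log-odds ratio. It is an illustrative stand-in, not the method used in Fu's article: the function name, the scoring formula and the assumption that posts have already been split into words are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def discriminating_terms(
    censored: List[List[str]],
    uncensored: List[List[str]],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank terms by how much more often they occur in censored posts.

    Each post is a list of already-segmented words. The score is a
    log-odds ratio with add-one smoothing; higher means the term is
    more characteristic of censored posts.
    """
    # Count the number of posts in each group that contain each term.
    censored_df = Counter(term for post in censored for term in set(post))
    uncensored_df = Counter(term for post in uncensored for term in set(post))

    scores: Dict[str, float] = {}
    for term in set(censored_df) | set(uncensored_df):
        p_c = (censored_df[term] + 1) / (len(censored) + 2)
        p_u = (uncensored_df[term] + 1) / (len(uncensored) + 2)
        scores[term] = math.log(p_c / (1 - p_c)) - math.log(p_u / (1 - p_u))

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]
```

With toy data in which "tomato" appears only in deleted posts, it comes out as the top-ranked term. Real analysis would also need Chinese word segmentation and far larger samples, which this sketch leaves out.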
Many research projects rely on different algorithms and computational methodology to decipher the ever-expanding list of sensitive keywords on Weibo.
Since 2007, Crandall has been part of a team of scientists working on the ConceptDoppler project, which monitors Internet censorship around the world.
Crandall says research on censorship in countries like China is relevant because "Internet censorship is a global trend" and it can "give us a window into the future of what Internet censorship will look like around the world in five to 10 years."
The US has been trying to extend copyright law into the online realm, but many users and websites have protested against such legislation, concerned that it will pose a threat to Internet freedom.
Many scholars are speculating about the actual impact of Weibo, but empirical data to support their theories is lacking, said Fu, adding that he is keen to explore this in future research.