I'm fortunate to be funded by the Knight-Mozilla OpenNews Fellowship program to attend a conference on China and the New Internet World organized by the Oxford Internet Institute, where I will give a presentation on China's news censorship. I've uploaded the full paper and the slides; please feel free to download them for more information. I also have more data and preliminary findings that are unpublished, and I'd love to share and discuss them. My email address is songyan at msu dot edu.
Prior and Ongoing Research on Internet Censorship
Internet censorship has attracted much attention from academics and research institutes. For example, the OpenNet Initiative (ONI) has been testing the availability of websites in 74 countries and rating government control of content related to politics, social issues, Internet tools, and conflict/security (Palfrey, 2010). The Open Internet Tools Project (OpenITP) surveyed circumvention tool users living in China to understand how they bypass the Great Firewall, in the hope of building better tools to serve Internet users in China and in other countries under censorship (Robinson et al., 2013).
Among the empirical studies focused on online media, Bamman et al.'s (2012) work claimed to be "the first large-scale analysis of political content censorship", investigating messages deleted from Sina Weibo, a Chinese equivalent of Twitter. They found that 16.25% of posts were deleted after publication and identified some characteristics related to deletion, including 295 sensitive keywords and outlying provinces such as Tibet and Qinghai. Beyond Sina Weibo and on an even larger scale, King et al. (2013) collected data from nearly 1,400 Chinese social media platforms and analyzed the deleted messages with the aid of linguistic software. In contrast to the earlier presumption that harsh criticism of the government is the censors' target, King et al. found that it is ongoing and potential collective action that the state aims to prevent and suppress.
Research Methods in a Nutshell
To the best of our knowledge, however, censorship practices in online news media have never been studied, let alone investigated extensively through computational approaches. Our study may therefore be the first empirical attempt to systematically examine news articles deleted from Chinese cyberspace.
We developed scripts to collect news articles published on NetEase and Sina, two major news aggregators headquartered in China. Meanwhile, we continuously checked whether these articles remained available, and we marked an article as deleted once its link was found to be broken. To make sure that a story was really deleted because of its content rather than for editorial or technical reasons, we searched each website for articles with the same title under a different link. Only when no such duplicate was available did we count the story as deleted.
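The check described above can be sketched roughly as follows. This is a minimal illustration, not the scripts used in the study: the `Article` fields and the `is_available` callable (a stand-in for whatever request decides that a link is broken) are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Article:
    title: str
    url: str


def is_deleted(article: Article,
               site_articles: Iterable[Article],
               is_available: Callable[[str], bool]) -> bool:
    """Return True only if the link is broken and no live duplicate exists.

    `is_available` is a hypothetical checker (for example, an HTTP request
    that returns False for a broken link). It is passed in so the logic
    can be exercised without network access.
    """
    if is_available(article.url):
        return False  # the original link still works
    for other in site_articles:
        # A duplicate is the same title published under a different link.
        if other.title == article.title and other.url != article.url:
            if is_available(other.url):
                return False  # moved or republished, so not counted as deleted
    return True
```

Injecting the availability check keeps the decision rule separate from the crawling details, which differ from one website to another.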
After collecting thousands of deleted news stories, we ran a regression over these data to detect patterns associated with deletion. The technique we adopted is ReLogit (King and Zeng, 2001a and 2001b), a logistic regression method for rare events data. It was developed by political scientists to analyze rare events such as wars and coups. It is an appropriate tool for our study because the overall deletion rates across the two websites were under 1%, as summarized below.
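For readers unfamiliar with the approach, the underlying model is an ordinary logit, and the rare-events method adjusts the maximum likelihood coefficients for the bias that appears when the outcome is very rare. The notation below is introduced here for illustration only: $Y_i = 1$ if article $i$ was deleted, $x_i$ is its vector of coded features, $\hat{\beta}$ is the usual logit estimate, and $\tilde{\beta}$ is the corrected estimate. The exact form of the bias term is given by King and Zeng and is not reproduced here.

```latex
\Pr(Y_i = 1 \mid x_i) = \pi_i = \frac{1}{1 + e^{-x_i \beta}},
\qquad
\tilde{\beta} = \hat{\beta} - \widehat{\operatorname{bias}}(\hat{\beta})
```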
Findings and Conclusions
During the course of our study, on each website, about two articles were deleted per day and the overall deletion rate was 0.05% on NetEase and 0.13% on Sina Beijing.
Several similar patterns have been found across the two news portals:
- Domestic news had a significantly higher chance of being deleted than international news: twice as likely for NetEase, and about six times for Sina Beijing.
- News covering Beijing had twice the chance of deletion compared to news covering other places in China.
- Tibet as a subject matter had little relation with deletion.
- National, compared to local, news was significantly associated with deletion for both websites: For NetEase, one and a half times as likely to be deleted, and for Sina Beijing one third times as likely to be deleted.
- Nature of events was another strong indicator. Compared with neutral stories, positive news on NetEase had one third the chance of being deleted whereas negative news had nearly four times the chance; on Sina Beijing, negative news had three times the chance of being deleted.
- Five of the 13 coded news topics were strongly associated with deletion: politics, business, foreign affairs, food and drugs, and military, although the strengths varied across the categories and the websites.
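A minimal sketch of how figures such as "twice as likely" are commonly read off a logistic regression, assuming they are odds ratios (the post does not say how they were derived). The coefficient below is made up for illustration and is not an estimate from the paper; the 0.05% baseline is used only to show that at such low rates an odds ratio and a risk ratio are nearly the same.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Convert a logit coefficient into an odds ratio."""
    return math.exp(coefficient)


def risk_ratio(or_value: float, baseline_rate: float) -> float:
    """Convert an odds ratio into a risk ratio, given the baseline event rate."""
    return or_value / (1.0 - baseline_rate + baseline_rate * or_value)


# Hypothetical coefficient, chosen so that the odds ratio is 2.
beta = math.log(2)
ratio = odds_ratio(beta)

# With a rare outcome (baseline 0.05%), the two ratios are almost identical.
print(round(ratio, 3), round(risk_ratio(ratio, 0.0005), 3))
```

Because deletion is so rare, reading an odds ratio as "times as likely" introduces very little distortion here.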
From this evidence, we reached the following conclusions:
- The two Chinese news portals deleted news with similar patterns.
- These similarities point to systematic control, the quintessential component of the definition of censorship (Peleg, 1993).
- Hence, for the first time, we have confirmed and quantified online news censorship in China.
Taboo Words
Beyond news deletion, I've been examining comment deletions as well. I've created some word clouds with the help of Wordle and highlighted the keywords most commonly found in deleted comments. They're not included in the paper or the slides.
These keywords align with our general understanding of taboo topics, such as land acquisition, death tolls, social unrest, food safety, pollution, and poor working conditions.
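The word frequencies behind a word cloud can be computed along the following lines. This is a sketch under assumptions: the comments are taken to be already split into words (Chinese text needs a segmentation step first, which is not shown), and the function name and stopword handling are illustrative rather than what was actually used.

```python
from collections import Counter
from typing import Iterable, List


def keyword_frequencies(tokenized_comments: Iterable[List[str]],
                        stopwords: frozenset = frozenset()) -> Counter:
    """Count how often each word appears across all deleted comments."""
    counts = Counter()
    for tokens in tokenized_comments:
        # Skip common function words so that topical keywords stand out.
        counts.update(t for t in tokens if t not in stopwords)
    return counts
```

The resulting counts can then be used as word weights in a word-cloud tool.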
Comments Prohibited and Suppressed
A second research topic of mine is how comments are manipulated and what patterns are associated with the manipulation. Several types of manipulation have been observed: disabling the commenting function, screening and filtering submitted comments before publication (i.e., pre-censorship), and deleting comments after publication (i.e., post-censorship). This topic isn't included in the paper or the slides.
To make this topic easier to follow, I'll first describe the general practice of Chinese news portals. Most of the time, news portals welcome and encourage comments because interaction boosts web traffic. However, a small portion of news stories have their commenting feature disabled. There are two ways to implement this. On NetEase, a notice saying "commenting is disabled" is placed under the story and the comment button is unavailable. Sina takes a more subtle approach: it shows no such notice and users can submit comments as usual, but the comments are never displayed on the website. These are pre-censorship techniques. As for post-censorship, both websites simply remove comments quietly after publication. A third technique, unlike passive pre- or post-censorship, is to proactively hire Internet commentators, the so-called 50 Cent Party, to propagate orthodox ideas endorsed by the government.
The following time-series chart demonstrates the first type of comment manipulation, which is to prohibit comments. In this way, party organs attempt to impose official opinions through one-way communication on issues such as North Korea, outlying provinces, disputed territories, and major criminal cases.
More subtly, Sina "allows" comments but never shows some of them on the website. I worked out how to query the API for the number of pre-censored comments and drew the following chart, which shows the news stories that have no comments at all even though their commenting function is "available".
The third time-series chart shows the number of comment deletions on a weekly basis. The topics found in the deleted comments are fairly well aligned with those found in deleted news stories.
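Once per-story counts are in hand, picking out the fully suppressed stories is a simple filter. The sketch below assumes a record with hypothetical `submitted` and `displayed` fields; these names are made up for illustration and do not reflect Sina's actual API, which is not described here.

```python
from typing import Iterable, List, NamedTuple


class CommentCounts(NamedTuple):
    story_id: str
    submitted: int  # hypothetical: comments reported as received
    displayed: int  # hypothetical: comments visible on the page


def fully_suppressed(stories: Iterable[CommentCounts]) -> List[str]:
    """Return IDs of stories that received comments but display none."""
    return [s.story_id for s in stories
            if s.submitted > 0 and s.displayed == 0]
```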
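Weekly totals like those in the chart can be produced by bucketing deletion dates by ISO week. This is a minimal sketch that assumes each deleted comment has a known deletion date; the function name is illustrative.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def weekly_counts(deletion_dates: Iterable[date]) -> Dict[Tuple[int, int], int]:
    """Count deletions per (ISO year, ISO week), in chronological order."""
    counts = Counter()
    for d in deletion_dates:
        iso = d.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))
```

Keying on the ISO year as well as the week number keeps weeks that straddle a new year from being merged or split incorrectly.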
This study was funded by the Google Policy Fellowship 2012 and was a collaboration between the Quello Center for Telecom Management and Law at MSU and the Center for Communication Research at the City University of Hong Kong. Please send your comments and questions to songyan at msu dot edu. Thank you for reading this post.