Thursday, November 14, 2013

Psychology of Sharing on Social Media: Attention, Emotion and Reaction

I am very glad to see my Boston Globe/Facebook study well received by curious readers and featured by several organizations, such as the Harvard Nieman Journalism Lab, Chartbeat, Social Fresh, and ISHP Consulting. Meanwhile, I've been giving talks on this research at different places, including the Boston Globe, Mozilla Festival in London, Spiegel Online in Hamburg and Hacks/Hackers Berlin. If you find this research interesting and want to further the discussion, please buzz me on Twitter @sonya2song or drop me a line at sonya2song#gmail. Please also feel free to download the slides (last updated on December 9, 2013) developed for my presentations.

In the previous study, I presented a data analysis that examined how users read and share Boston Globe posts on its Facebook Page. In this extended analysis, I’ve added a qualitative analysis focused on content, cognition and emotion. My goal is to help newsrooms promote their stories more effectively on social media and attract more attention there.

To achieve this goal, I’ve been digging into the psychology literature for inspiration. To my delight, I’ve discovered some theories and findings that carry over to the social media environment:

  • Two modes of thinking, fast and slow, attract different types of attention.
  • Sharing on social media is
    • Charged with emotions,
    • Constrained by self-image management, and
    • Constrained by concerns about relationships with others.

Again, this report is based on the three key metrics featured by Facebook Insights: reach, engaged users, and talking about this. According to Facebook, reach is defined as “the number of unique people who have seen your post”; engaged users as “the number of unique people who have clicked on your post”; and talking about this as “the number of unique people who have created a story from your Page post. Stories are created when someone likes, comments on or shares your posts; answers a question you posted; or responds to your event”. These metrics are counted as absolute numbers of unique visitors in various ways and reflect user behavior from passive consumption to active interaction.

-    *    -    *    -    *    -    *    -    *    -    *    -    *    -    *    -


In the proto-analysis of Boston Globe traffic on Facebook, I reported findings on image size and the “BREAKING” label. The general pattern is that illustrating a post with an image is associated with higher traffic than using no image, as is using a large image rather than a thumbnail. This pattern holds across the three key Facebook metrics (Figure 2). In addition, the “BREAKING” label alone is associated with higher reach, although not with engaged users or talking about this (Figure 1). In fact, not only BREAKING NEWS but also other uppercase phrases are associated with higher reach, including WEATHER WATCH, MAJOR UPDATE, BIG PICTURE, NOW LIVE, etc.

As a hardworking journalist, you may tell me it’s upsetting to know that readers are attracted to this kind of superficial stuff like BIG PICTURES and BREAKING NEWS. But the good news for you is that the attention triggered by primitive tricks is fairly cheap. To gain more engaged attention, sophisticated messages would be a better choice, which we’ll discuss in the section on cognitive strain and System 2.

Figure 1: "Breaking" is associated with higher "reach"

Figure 2: Larger images are associated with higher traffic

As a less than positive example, MIT Technology Review shows how little attention a Page can gain. Looking at its Facebook Page, we can see a lot of big T’s, clearly the magazine’s logo. The stories are evidently shared as links, with the logo automatically extracted by Facebook. As a result, these stories lack an interesting, or even a relevant, visual companion. The repeated T’s may also have made fans blind to this symbol. The sad part is that, even though the Review produces many thrilling stories, its Facebook presence is far from compelling; note the small numbers of shares and likes in Figure 3.

Figure 3: Facebook Page of MIT Technology Review

Wednesday, August 21, 2013

Cataloguing Internet Censorship

Note: this blog post is a republication of my recent contribution to China Outlook with permission. China Outlook is “an online, subscription-only newsletter that specialises in writing and research about China’s future. Based in Hong Kong, it is editorially independent.” 

The composition in this photo is a very visual allegory for the attitude of many citizens towards what the government wants them to believe. The photo was taken outside a disco club named “Propaganda” at Wudaokou, Beijing, in 2005. The sleeper is most likely a migrant worker from the surrounding countryside. He was using his shoes as a pillow, the only comfort he could afford, while resting on the hard concrete stairs. Photo credit: the author of this blog.

Just before 6 am on 26 June 2013, rioting broke out in Lukqan Township in the Xinjiang Uyghur Autonomous Region, in northwest China, which is home to millions of Uyghur Muslims. At least 24 people were killed by suspected Islamists, who set about them with swords and knives.

About seven hours later the state-run Xinhua News Agency broke the news on its English news wire service, followed closely by numerous Chinese news portals that covered the story with a Chinese translation. 

A few hours later, Chinese speakers living in the United States first heard the news on the BBC and CNN, both of which quoted the original report from Xinhua. But when they began to look on Chinese websites, they could find hardly a trace of the story. The censors had been at work. 

For those living in China, and for China observers, such censorship of important events is not unusual. China has never been far from the bottom of the Press Freedom Index published by Reporters Without Borders and is presently in 173rd position, with just six countries ranked lower. Foreign websites are routinely blocked and Chinese websites are under continual, close scrutiny.

According to the Harvard Berkman Center for Internet and Society, China devotes “substantial technical, financial, and human resources” to developing the apparatus of censorship and has instituted “by far the most intricate filtering regime in the world.” Since censorship is a common practice in China, censored information has become an alternative perspective that we should not neglect when seeking to understand this country.

Censorship is a crude tool at the best of times, and the censored material often carries crucial information for people both inside and outside China, even if it is too inconvenient for the Chinese authorities. The outbreak of SARS in 2002-3, for instance, was censored from Chinese media for five months, presumably to avoid spoiling the harmonious atmosphere created for the 16th National Congress of the Communist Party. However, the decision had consequences: it allowed the virus to spread irreversibly across continents until a worldwide epidemic emerged.

So too with other subjects which, like SARS, are not only inconvenient but also crucial; indeed, it would hardly make sense to devote huge resources in human labour and computing power to monitoring and eliminating subject matter that was merely trivial.

At the same time, the authorities’ decision to censor information that is inconvenient, even if it is important, provides an opportunity to observe China from the standpoint of what it discards, rather than what it consumes. And that is precisely what a number of researchers outside China have now begun to do – namely, to examine and assess news stories that have been censored from the Chinese media. 

Of course, the analysis of deleted stories will never provide a full picture of media control. In many cases, journalists familiar with a particular regime will know not to write certain kinds of stories. This is a form of pre-censorship: journalists in a newsroom are often privy to information that they know it would be foolish to circulate. Articles that appear and then just as rapidly disappear are a different matter. In these cases, the material has been published but is subsequently judged unsuitable and removed.

But how can this censored information be collected before it vanishes? In recent years, scholars and institutes have been trying to uncover information censored from news portals and social media in China. A common technique involves two steps: collect and check. First, information published online in China is collected using big data techniques; second, each collected item is repeatedly and continuously checked for availability. Once a link appears broken, it is red-flagged as a suspected deletion.
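
The collect-and-check procedure can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the code used by any of the projects mentioned: `monitor` and `http_is_available` are hypothetical names, the URLs are assumed to have been collected already, and a failed request is treated only as a red flag, not as proof of censorship.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def http_is_available(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical helper: True if the URL responds without an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):
        # HTTPError/URLError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def monitor(urls: Iterable[str], is_available: Callable[[str], bool],
            rounds: int, interval_s: float = 0.0) -> Dict[str, int]:
    """Re-check every collected URL `rounds` times.

    Returns {url: index of the first round in which it was unreachable}.
    Flagged URLs are not re-checked; they await manual verification.
    """
    urls = list(urls)
    first_missing: Dict[str, int] = {}
    for round_no in range(rounds):
        for url in urls:
            if url not in first_missing and not is_available(url):
                first_missing[url] = round_no  # red flag: suspected deletion
        if round_no < rounds - 1:
            time.sleep(interval_s)  # wait before the next check pass
    return first_missing
```

A real run might look like `monitor(collected_urls, http_is_available, rounds=24, interval_s=3600)`, i.e., hourly checks for a day; the schedule is an arbitrary example. Passing the availability check in as a function keeps the loop testable without network access.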

Articles are occasionally removed altogether for editorial reasons, but in practice this is rare. Editorial changes usually take the form of corrections, which, as on the New York Times website, can typically be identified through the use of italics or some similar device.

Even stronger evidence against explanations other than censorship can be obtained by comparing deletions across a variety of news media to see whether they cover similar topics. If they do, censorship is a strong possibility, because similar deletions reflect systematic control of content, which in turn is a good indication of regulated behaviour.
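
One simple way to quantify "similar topics", chosen here for illustration and not necessarily what any cited study did, is the cosine similarity between the topic counts of deleted articles at two outlets. A value near 1 means the deletions fall on the same topics in similar proportions. The function name and the topic labels below are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def topic_similarity(deleted_a: Iterable[str], deleted_b: Iterable[str]) -> float:
    """Cosine similarity of the topic counts of two outlets' deleted articles."""
    a, b = Counter(deleted_a), Counter(deleted_b)
    dot = sum(a[t] * b[t] for t in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no deletions at one outlet: nothing to compare
    return dot / (norm_a * norm_b)


# Hypothetical topic labels of deleted articles at two portals
portal_a = ["politics", "politics", "business", "food and drugs"]
portal_b = ["politics", "business", "business", "food and drugs"]
print(round(topic_similarity(portal_a, portal_b), 3))  # 0.833
```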

One recent censorship study conducted jointly by Michigan State University and the City University of Hong Kong focused on NetEase and Sina, two major news portals in China. 

From November 2011 to October 2012, the researchers found that, on average, two articles were deleted from each website per day and that deletions from the two websites followed similar patterns. In particular, domestic news had a significantly higher probability of being deleted than international news: twice as likely for NetEase, and six times as likely for Sina. Beijing stories were twice as likely to be deleted as news covering other places in China. Surprisingly, very few articles on Tibet appear to have been deleted, a fact that the researchers put down to pre-censorship. Compared with neutral stories, positive news on NetEase had one third the probability of being deleted, whereas negative news had nearly four times the probability; on Sina, negative news had three times the probability of being deleted.

From a list of 13 news topics, five were strongly associated with deletions: politics, business, foreign affairs, food and drugs, and military. These topics frequently included sensitive keywords or phrases, such as land acquisition, death toll, social unrest, poor working environment, food safety, and disputed territories. 

These findings are in line with sociological theory on censorship, which suggests that eliminating improper political news helps maintain ideological purity, removing military news reflects concern over national security, banning news on disputed territories reflects the protection of national interests, and expurgating news on unsafe food is one way of maintaining social order.

Like all modern states, China is wrestling with the impact of the growth of social media. It is aware of just how quickly online media can amplify the impact of events and invite participation, as seen in the Arab Spring. In his well-received book Rewire, Ethan Zuckerman, director of the MIT Center for Civic Media, narrates how this movement started with a family’s protest against government corruption in Tunisia, spread beyond one town, and eventually reached over a dozen countries.

Over a decade before the Arab Spring, China’s leadership had foreseen the potential threat of online media and started developing censorial strategies and tools. What it aims to constrain is the mobilizing power of online media, as indicated by a study conducted by Gary King and his colleagues at Harvard University. 

From the messages deleted from nearly 1,400 Chinese social media platforms, they observed that the state aims to prevent and suppress ongoing and potential collective action. This contrasts with the widely held view that the Chinese censors first and foremost target harsh criticism of the state. Hence, social media are censored to prevent mobilization, while news media are censored to eliminate possible triggers for such mobilization. That is why international news was deleted much less often than domestic news from NetEase and Sina: remote events are not relevant enough to provoke strong reactions among citizens.

China is not the only country to censor social media. Censorship exists in all societies and all forms of media. For example, there is presently a growing international debate on how easily pornography can be accessed online and whether this is a danger to children. Websites regarded as promoting Islamic fundamentalism are routinely banned in certain countries. In China, whilst censorship is pervasive, the debate over who controls online access to information, and what its limits are, has barely begun.

Thursday, August 8, 2013

Q&A on Censorship with the Oxford Internet Institute

After presenting my study on China's censorship of online news at the Oxford Internet Institute (OII), I had a great talk with David Sutcliffe, the editor of the OII Policy and Internet Blog, and we discussed the following questions. The full conversation is published in the blog post titled Uncovering the patterns and practice of censorship in Chinese news sites.
  1. How much work has been done on censorship of online news in China? What are the methodological challenges and important questions associated with this line of enquiry?
  2. You found that party organs, i.e., news organizations tightly affiliated with the Chinese Communist Party, published a considerable amount of deleted news. Was this surprising?
  3. How sensitive are citizens to the fact that some topics are actively avoided in the news media? And how easy is it for people to keep abreast of these topics (e.g., the “three Ts” of Tibet, Taiwan, and Tiananmen) from other information sources?
  4. Is censorship of domestic news (such as food scares) more geared towards “avoiding panics and maintaining social order”, or just avoiding political embarrassment? For example, do you see censorship of environmental issues and (avoidable) disasters?
  5. You plotted a map to show the geographic distribution of news deletion: what does the pattern show?
  6. What do you think explains the much higher levels of censorship reported by others for social media than for news media? How does geographic distribution of deletion differ between the two?
  7. Can you tell if the censorship process mostly relies on searching for sensitive keywords, or on more semantic analysis of the actual content? I.e., can you (or the censors) distinguish sensitive “opinions” as well as sensitive topics?
  8. It must be a cause of considerable anxiety for journalists and editors to have their material removed. Does censorship lead to sanctions? Or is the censorship more of an annoyance that must be negotiated?
  9. What do you think explains the lack of censorship in the overseas portal? (Could there be a certain value for the government in having some news items accessible to an external audience, but unavailable to the internal one?)

Tuesday, July 23, 2013

Why I Love OpenNews Fellowship and Why it's a Great Opportunity for Graduate Students

Knight Blog: Knight-Mozilla fellows strive for global impact in journalism. Photo credit: Knight-Mozilla Fellows
I have always been an intellectual drifter, and the OpenNews fellowship has been the best reward for my adventures. When I first saw the call for 2013 applicants, I was surprised and excited to learn that such an opportunity had been created for people just like me.

My background is highly mixed. I am currently a doctoral candidate in an interdisciplinary program at Michigan State University. At MSU, I have been studying media economics with my advisor Steve Wildman, a world-renowned scholar and Chief Economist at the FCC, and taking courses in psychology, communication, and large-scale data analysis. Before MSU, I studied computer science and journalism and worked in both industries.

If you are also a graduate student, you will probably enjoy the Knight-Mozilla fellowship just like I do because it provides opportunities that you may not easily find in academia.

You'll get to work on real-world problems. Through my work experience and academic training, I gained a better understanding of how people consume media content, how they behave on the Internet, and how content providers could better cater to consumers’ demands and therefore develop sustainable business models. I have been able to contribute my expertise on these topics to the Boston Globe, my newsroom host. With the support of the staff here, I conducted an empirical study of the Boston Globe's Facebook Page. When I presented my findings to colleagues, some people responded, “Thank you for sharing your findings. We didn’t know those things!” It is thrilling and satisfying to find truths and share knowledge in a practical setting.

As a graduate student, you may often have the privilege and support to work on problems like this only during summer internships. The Knight-Mozilla Fellowship offers more, because it lasts beyond a summer and allows you to fully immerse yourself in a world-class newsroom. If you share the belief that research should have real-world impact, definitely apply for this fellowship program.

You'll have the support you need to work quickly. As you may have experienced, funding is an issue at a number of universities. You will be surprised how much support this Fellowship provides, including a generous research budget and travel funding. Moreover, a frustrating side of academia is that research results may take months or years to get published. In contrast, as OpenNews Fellows, we can organize our own seminars or attend workshops to reach a larger audience. With this support, we can shout louder to the world about the amazing things we have created or found.

You'll have the freedom and flexibility to follow where your curiosity leads. Although Fellows often lend a helping hand to our hosts, we are not obliged to commit to any task in the newsrooms, because all our funding comes from OpenNews. This independence lets us pursue our own interests without being bound by the routine work that regular employees have to undertake. Meanwhile, we are encouraged to work with other Fellows and organizations. Right now, I am working with two other Fellows, Stijn Debrouwere and Brian Abelson, on measuring news impact, and I am contributing my knowledge to a ProPublica project related to Internet policy. We not only collaborate remotely but also reunite in person on different continents to put our heads together and hack on something.

Certainly, there are more opportunities and privileges for you to discover in this fellowship program. If you are an adventurer like me, I encourage you to step out of your ivory tower and join us to explore this fast-spinning world where technology meets news.

Sunday, July 14, 2013

Proto-analysis of Boston Globe Traffic on Facebook

Update on 7/18/2013: This post includes a fair amount of explanation of statistics and key metrics. If you're already familiar with them, please refer to the neat summary published by the Nieman Journalism Lab.

Last week, I gave a short talk at the Boston Globe, presenting my preliminary analysis of how Boston Globe articles were received through its Facebook Page. Through this analysis, I hoped to answer two questions. What types of stories do Boston Globe staff share on the social media platform? In turn, how do different types of shared stories affect Facebook users’ reading and sharing? By answering these two questions, I aimed to find out how well the staff’s intentions were aligned with readers’ interests, as measured by three metrics offered by Facebook, and whether there were gaps between intentions and perceptions that would signal room for improvement.


 Highlights of the Study

  • I examined 215 stories shared in two weeks on the Facebook Page of the Boston Globe.
  • I found several attributes correlated with attention:
    • Image size (none, thumbnail, single-column, and double-column)
    • With or without a “breaking” label in the caption
    • Time of sharing (hour and weekday)
    • News topic defined by editors (business, metro, sports, etc.)
    • Related to the Boston Marathon bombing or not
  • There were gaps between staff’s efforts and Facebook users’ reading and sharing.


Facebook Insights and its Metrics


I exported the data through Facebook Insights, a built-in feature for Page administrators, to a spreadsheet file and later analyzed them in R, an open-source statistical tool. I kept the dataset fairly small to save time, especially since I cleaned data and labeled some of the variables manually, as automation was infeasible for them. In total, I examined 215 stories shared from May 7 to 21 this year.

My analysis was completely dependent on the three metrics Facebook Insights features: reach, engaged users, and talking about this. According to Facebook, reach is defined as “the number of unique people who have seen your post”; engaged users as “the number of unique people who have clicked on your post”; and talking about this as “the number of unique people who have created a story from your Page post. Stories are created when someone likes, comments on or shares your posts; answers a question you posted; or responds to your event”. These metrics are counted as absolute numbers of unique visitors in various ways and reflect user behavior from passive reading to proactive sharing.
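
To make the three definitions concrete, here is a small sketch that computes them from a hypothetical event log of (user, action) pairs. Facebook computes these numbers internally, so the log format, the action names and `post_metrics` are my assumptions; the point is only that each metric counts unique people, not actions. I also assume that liking, commenting or sharing involves a click, so those users count as engaged too.

```python
from typing import Dict, Iterable, Tuple

# Actions that create a "story" from a Page post, per the quoted definition
STORY_ACTIONS = {"like", "comment", "share"}


def post_metrics(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count unique users per metric from (user_id, action) events."""
    saw, clicked, talked = set(), set(), set()
    for user, action in events:
        if action == "view":
            saw.add(user)
        elif action == "click":
            clicked.add(user)
        elif action in STORY_ACTIONS:
            clicked.add(user)  # assumption: a like/comment/share is also a click
            talked.add(user)
    return {
        "reach": len(saw),
        "engaged_users": len(clicked),
        "talking_about_this": len(talked),
    }


events = [("u1", "view"), ("u1", "view"), ("u2", "view"),
          ("u2", "click"), ("u3", "view"), ("u3", "like"), ("u3", "share")]
print(post_metrics(events))
# {'reach': 3, 'engaged_users': 2, 'talking_about_this': 1}
```

Note that u1 viewing twice still adds only one to reach, and u3 liking and sharing still adds only one to talking about this.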

The next section discusses statistical details that may be unfamiliar to some readers. If you prefer, skip ahead to the section on findings and implications.


Independent Variables, Dependent Variables, and Negative Binomial Regression

The statistical tool I used for this analysis is negative binomial regression, and I want to explain the two terms, regression and negative binomial, to justify my choice of method. Regression is a statistical process for estimating the relationships among variables. Variables serve different functions in an analysis: some are labeled independent variables and others dependent variables. Dependent variables measure the outcomes we expect to increase or decrease, such as life expectancy, happiness, and crime rate. Independent variables measure the factors that affect, predict or are associated with the outcome, such as educational level, blood pressure, police numbers, etc. Independent and dependent variables are by no means predetermined; instead, they are assigned according to the research question. For instance, we can estimate a graduate’s income from her educational level, or estimate how likely someone is to hold a master’s degree given her income.

In my analysis of the Facebook data, I chose the three key metrics, namely reach, engaged users and talking about this, as dependent variables. The independent variables are different aspects of shared posts that may affect these outcomes: news section, image size, “breaking” label, publication hour and weekday. In particular, I created a binary independent variable that marked stories as relevant or irrelevant to the Boston Marathon bombing, because this topic has been a beat followed closely by the Globe staff.

I chose regression because it allows me to assess the association of each independent variable with the dependent variable while holding the other variables constant. This is very important for the analysis. For example, more black women were reported to die of breast cancer than white women. Can we then assume that, biologically, black women face a higher risk of the disease? Maybe not. If we include women’s occupation, education and income in the analysis, we could find that black and white women are not significantly different in developing breast cancer when they are at the same socioeconomic status (SES).

To take news stories as another example, we may observe that story A is read by more people than story B. Can we claim that story A is more interesting than story B? Again, maybe not. We may find that story A was shared at 8 am, when people tend to check Facebook on their commute to work, whereas story B was shared at 11 am, when people are often busy working. Also, story A covers sports and story B covers international relations, while sports news is generally more popular than international news. Therefore, to control for the various aspects of news stories, I need to run a regression for more robust and reliable results.

As for which type of regression is most appropriate, the obvious candidate is Poisson regression, because it handles count data, such as how many times a week people watch TV, how many tornadoes break out in the US each year, and how many people are waiting in line ahead of you at a cashier. Because my data violated a key assumption of Poisson regression (that the mean equals the variance), I chose an alternative called negative binomial regression, which is a good choice for handling the overdispersion (variance exceeding the mean) in my data. For those interested in descriptions of these and other analysis methods, UCLA shares many tutorials on statistical analysis, including negative binomial regression.
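
A quick way to see the overdispersion problem is to compare the mean and the variance of a count variable such as reach. Poisson regression assumes they are equal; the common negative binomial form (often called NB2) instead lets the variance grow as mean + alpha * mean^2. The sketch below uses made-up counts and a method-of-moments estimate of alpha; `dispersion_check` is a hypothetical name. It ignores the independent variables, so it is only a rough first look; a proper check compares fitted Poisson and negative binomial models.

```python
import statistics
from typing import Dict, Sequence


def dispersion_check(counts: Sequence[int]) -> Dict[str, float]:
    """Compare mean and sample variance of a count variable (needs >= 2 values)."""
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance
    # NB2: var = mean + alpha * mean**2  ->  alpha = (var - mean) / mean**2
    alpha = max((var - mean) / mean ** 2, 0.0) if mean > 0 else 0.0
    return {
        "mean": float(mean),
        "variance": float(var),
        "var_to_mean": float(var / mean) if mean > 0 else float("nan"),
        "alpha_mom": float(alpha),  # 0 means no evidence of overdispersion
    }


# Hypothetical reach counts for a handful of posts
print(dispersion_check([0, 10, 20, 50]))
```

A variance-to-mean ratio far above 1, as in this toy example, is the pattern that makes Poisson regression a poor fit.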

Coefficients generated by negative binomial regression are log ratios. To make the findings more comprehensible, in the following section, I present the ratios using the exponentiated coefficients.
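
The conversion is a one-liner. As a worked example, a coefficient of ln(1.6), about 0.47, exponentiates to a ratio of 1.6, i.e., a 60% increase over the baseline; a negative coefficient such as ln(0.5), about -0.69, corresponds to a 50% decrease. The function names below are mine, for illustration only.

```python
import math


def coef_to_ratio(beta: float) -> float:
    """Exponentiate a log-scale regression coefficient into a multiplicative ratio."""
    return math.exp(beta)


def ratio_to_pct_change(ratio: float) -> float:
    """Express a ratio as a percentage change relative to the baseline."""
    return (ratio - 1.0) * 100.0


for beta in (0.47, -0.69):
    ratio = coef_to_ratio(beta)
    print(f"beta={beta:+.2f}  ratio={ratio:.2f}  change={ratio_to_pct_change(ratio):+.0f}%")
```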

Findings and Implications

This study was inspired by Facebook’s report on good practices for media companies. Facebook drew its conclusions from the practices of a sample of news organizations using Facebook Pages. By contrast, my study focused only on the Boston Globe, and my findings were not always consistent with Facebook’s suggestions.

“Breaking” Label

Facebook found that ‘posts that included “breaking” or “breaking news” received a 57% higher engagement over posts that were not identified as breaking news.’ In contrast, I did not find any significant difference in engaging users or going viral. The only significant difference I found was a 60% increase in reach. From this, we can infer that the “breaking” label increased reach without significantly affecting engaged users or talking about this.

Image Size

In terms of illustrative images, four sizes can be observed in posts on Facebook Pages: no image, thumbnail, single-column and double-column. Double-column images do not appear in users’ news feeds and are only available on Facebook Pages, but for research purposes I retained “double-column” as an image size. The following chart shows how image size affected the amount of attention drawn from Facebook users. The ratios are exponentiated coefficients.
  • Quite obviously, illustrating a story with an image was better than with no image.
  • A thumbnail image did not appear to make a significant difference compared with no image.
  • The larger an image was, the more popular a shared story was likely to be.

Marathon Bombing

Stories about the Boston Marathon bombing attracted significantly more attention on Facebook. Across the three key metrics, reach, engaged users, and talking about this, these stories increased the metrics by 31%, 97%, and 64%, respectively. However, when I looked at how users engaged through likes, comments and shares, I realized people didn’t necessarily “like” bombing-related stories. This is not surprising: “liking” a horrible story may create a cognitive conflict for some people, so they don’t feel comfortable doing it. For comments and shares, bombing-related stories performed 90% and 80% better, respectively. Again, the ratios here are exponentiated coefficients.

Sharing Hour and Weekday

Because the data set spanned only two weeks, I don’t consider correlations with the sharing weekday to be reliable. However, it is large enough to compare the 24 hours of the day. The following chart shows when stories were shared by the staff and how they were received by Facebook users. From it, we can see:
  • More stories were shared during business hours.
  • However, across the three metrics, the performance was not great during business hours.
  • The traffic seemed to peak around 8 am and around 11 pm to 2 am EST.
    • West Coast readers may contribute to the after-midnight traffic.

I talked with Joel Abrams at the Boston Globe about why peaks appeared in the early morning and late at night. We came up with two theories for the phenomenon. First, people check Facebook more frequently before and after work, for instance, on their commute or in bed. Second, and somewhat unfortunately, newsrooms share fewer stories during those “idle” hours because social media editors are not at work either. As a result, those hours see fewer new posts and therefore less competition for attention. In the future, we could experiment with sharing stories in the early morning and late at night to see whether this boosts traffic.

News Sections

There are in total 12 news sections predetermined by the Boston Globe staff: art, business, ideas, lifestyle, magazine, metro, news, opinion, slides, specials, sports, and upgrade. (Upgrade posts are advertising that invites people to upgrade their membership to subscribers.) The following chart shows how many stories the staff shared across topics and how different topics were associated with reach, engaged users and talking about this. Between the staff’s shares and the readers’ attention, there were in fact some gaps.

The regression analysis assessed more precisely how different news sections affected stories' performance on Facebook. Art news was taken as the baseline, and the other news sections were compared to it. The results are shown as ratios (e.g., 20% means only one fifth as good as art news, and 300% means three times as good). Please note that the confidence intervals were exponentiated from the regression estimates, which is why the upper bound lies farther from the point estimate than the lower bound does. Now we can sort the news sections by their impact on performance:
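
Why the asymmetry: a Wald interval is symmetric on the log scale, beta ± 1.96 × SE for 95% coverage, and exponentiation stretches the upper side more than the lower side. The coefficient and standard error below are hypothetical, and `ratio_confint` is an illustrative name; statistical software may compute intervals differently (for example, by profile likelihood), so exact bounds can differ.

```python
import math
from typing import Tuple


def ratio_confint(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Wald interval built on the log scale, then exponentiated to the ratio scale."""
    lower = math.exp(beta - z * se)
    point = math.exp(beta)
    upper = math.exp(beta + z * se)
    return lower, point, upper


lo, pt, hi = ratio_confint(beta=0.47, se=0.20)  # hypothetical estimate and SE
print(f"ratio {pt:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
print(f"distance below: {pt - lo:.2f}, distance above: {hi - pt:.2f}")
```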
  • Sorted by the amount shared by staff, high to low are:
    • Metro, sports, news, lifestyle, arts, business, opinion, slides/mag/upgrade, ideas, and special.
  • Sorted by reach, top ones are:
    • Opinion, slides, lifestyle, and business
  • Sorted by engaged users, top ones are:
    • Opinion, metro, lifestyle, and business
  • Sorted by talking about this, top ones are:
    • Slides, opinion, sports, and metro.
  • The misalignment between staff’s shares and readers’ perception may be a starting point for adjustments.

To compare the two dimensions (staff’s posts and readers’ attention), I plotted them together on one scatter plot. In this chart, the horizontal axis represents how many stories the staff shared, and the vertical axis shows how the stories were received by Facebook readers in terms of reach, engaged users, and talking about this. The data were log-transformed so that the data points could be squeezed together for a more sensible view. The units don't really matter here, because what we hope to see is the ratio of outcome to effort, or efficiency. To indicate how efficiently the staff’s efforts turned into reader responses, I roughly grouped the news topics into high, medium and low efficiency and colored the background yellow, grey and white, respectively. It appeared that, given the same number of posts, opinion generated more engagement and photo slides tended to go more viral. Meanwhile, the shared posts of opinion and photo slides were fairly scarce. There is a gap between the number of articles shared by section and the traffic they capture, and this could be a fruitful starting point for future adjustments in article-sharing choices. Specifically, this study suggests that more readers would be engaged if there were more posts of opinion, photo slides, business, and lifestyle.
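
The idea behind this chart can be sketched as follows, assuming (my assumption, not stated above) that the vertical axis shows a section's total for a metric. On log-log axes, the vertical distance between log(total) and log(posts) equals log(total / posts), the average outcome per post, so sections sitting higher above the diagonal got more out of each post. The section names and numbers are made up for illustration and are not the Globe's data.

```python
import math
from typing import Dict, List, Tuple


def efficiency_table(sections: Dict[str, Tuple[int, float]]) -> List[dict]:
    """Map name -> (posts shared, total metric) to rows, most efficient first."""
    rows = []
    for name, (posts, total) in sections.items():
        rows.append({
            "section": name,
            "log_posts": math.log10(posts),  # horizontal axis
            "log_total": math.log10(total),  # vertical axis
            "per_post": total / posts,       # efficiency: outcome per unit of effort
        })
    return sorted(rows, key=lambda r: r["per_post"], reverse=True)


# Hypothetical sections and totals
demo = {"section_a": (60, 300_000.0), "section_b": (8, 120_000.0),
        "section_c": (40, 160_000.0)}
for row in efficiency_table(demo):
    print(f'{row["section"]:10s} {row["per_post"]:>10,.0f} per post')
```

Grouping sections into high, medium and low efficiency would then be a matter of choosing cut-offs on `per_post`.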

Virality or Conversation Rate

The following chart shows a trend, with each dot representing one shared story: the more readers a story reached, the more readers engaged with it and talked about it. The relationship between reach, engaged users, and talking about this appears roughly linear. However, some dots fall well below the trend lines; these are marked with red circles. So why did those stories generate less activity?
The degree of virality, or the so-called conversation rate, helps to identify these underperforming stories. This metric is calculated as the ratio of talking about this to reach. I'll list the least and the most conversational stories and give a quick summary of the patterns observed in their content.
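The conversation rate and the two top-10 lists can be sketched in a few lines. This is a minimal illustration, assuming each post is a record with reach and talking-about-this counts; the field names `reach` and `talking_about`, the post IDs, and the numbers are hypothetical, not the Facebook Insights export format.

```python
def conversation_rate(post):
    """Talking about this divided by reach; 0.0 when a post reached nobody."""
    return post["talking_about"] / post["reach"] if post["reach"] else 0.0


def most_and_least_conversational(posts, n=10):
    """Return the n highest-rate posts and the n lowest-rate posts (lowest first)."""
    ranked = sorted(posts, key=conversation_rate, reverse=True)
    return ranked[:n], ranked[::-1][:n]


posts = [
    {"id": "p1", "reach": 12000, "talking_about": 900},
    {"id": "p2", "reach": 15000, "talking_about": 60},
    {"id": "p3", "reach": 8000, "talking_about": 400},
]
top, bottom = most_and_least_conversational(posts, n=1)
print([p["id"] for p in top], [p["id"] for p in bottom])
```

Dividing by reach is what separates virality from sheer exposure: p2 reached the most readers but has the lowest rate.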

Most conversational stories

  1. Oklahoma City Thunder star Kevin Durant today pledged $1 million to recovery efforts after yesterday's devastating tornado.
  2. Romeo and Juliet, the swans who reside at the Boston Public Garden during the summer (and at Franklin Park Zoo during the winter), returned there today in a sign that the spring season is truly here. See photos:
  3. The lilacs are in full bloom at Arnold Arboretum. This photo was taken yesterday, known officially as Lilac Sunday at the Arboretum. Stop by if you have a chance. Globe staff photo / Yoon S. Byun
  4. Say hello to the CapeFlyer. It had its inaugural run today and is scheduled to have its official debut next weekend, the first time in about 25 years service from Boston to Cape Cod will be offered. Would you ride it?
  5. The Marathon bombing sheared off the right leg of Marc Fucarile (pictured, with his fiancee Jen Regan) in a millisecond. It spared the left, but not by much. Now, he and his family are in a painful waiting game to see if his “good” leg can be saved.
  6. A child was pulled from the rubble of Plaza Towers Elementary School in Moore, Okla., after an EF-4 tornado struck. The tornado, with winds up to 200 mph, was up to a mile wide and left behind large areas of devastation.
  7. “It was one of the greatest moments in Boston sports history,” writes the Globe’s Dan Shaughnessy about the Bruins’ thrilling win over the Maple Leafs. “And then came a miracle… the Bruins scored and scored and scored.”
  8. The Boston Athletic Association is inviting all runners who failed to finish 2013 Boston Marathon to run in next year's race. This affects 5,633 runners.
  9. Brad Marchand scored the Bruins' game-winning goal over the Rangers at 15:40 of overtime. Story: (Photo credit: AP)
  10. After learning she had an 87% chance of developing breast cancer, actress Angelina Jolie underwent a preventative double mastectomy. Jolie shares her story in a powerful The New York Times op-ed today: EPA photo

Least conversational stories

  1. Keith Reddin’s thriller “Almost Blue” at the Charlestown Working Theater, isn’t so much blue as noir
  2. #Recipe for paella-stuffed peppers
  3. New: Matthew Gilbert's Buzzsaw column. As the cult favorite, "Arrested Development," returns with a season-sized “episode dump,” Globe critic Matthew Gilbert asks, does giving viewers too much leave them with nothing to talk about?
  4. Make mom feel even more special with these stylish Mother’s Day gifts.
  5. The Phoenix Suns named 33-year-old Ryan McDonough, formerly of the Boston Celtics, as their new general manager.
  6. Album review: The soundtrack for Baz Luhrmann's film adaptation of "The Great Gatsby," curated by Jay-Z, is a fantastical reimagining of that era, putting ‘20s jazz in the modern context of pop and hip-hop. Oddly enough, the one thing the soundtrack is missing is heart.
  7. Creative restlessness and a sense of adventure are at the heart of Iron & Wine’s latest album, “Ghost on Ghost,” which Sam Beam will celebrate with a show at Berklee Performance Center tonight.
  8. Book review: The beloved author of “The Kite Runner,” Khaled Hosseini, returns to the rugged landscape of his home country, Afghanistan with "And the Mountains Echoed."
  9. Jon Lester gave up six runs in six innings in Chicago as the White Sox defeated the Red Sox, 6-4.
  10. Yahoo is buying Tumblr for $1.1 billion. Do you think this will help rejuvenate the Yahoo brand? Is Tumblr a good investment?
Here is my quick summary of patterns related to the conversational potential of stories.
  • Beautiful and pleasant content, such as photo slides, was the most conversational.
  • Also highly conversational were stories in which a problem has been (or will be) met with a solution:
    • A tie broken by a miraculous win in sports
    • Runners who failed to finish the marathon but were invited back to run it
    • Marathon bombing victims who were given medical care
    • A natural disaster, but children were saved
    • A high chance of cancer, but intervention minimized it
  • The least conversational:
    • Arts-related content (music, movies, books, etc.)
    • Factual information (sports scores, settled business deals, etc.)
  • This pattern of high and low conversation rates is consistent with prior research suggesting that stronger emotional reactions lead to more frequent expression.


Limitations and Future Research

  • Limitations
    • The data set is fairly small (n = 215).
    • Hence, the results are more prone to sampling error and bias.
    • The analysis also did not examine how the frequency of shares affects readers' perceptions (are more shared stories better, worse, or irrelevant?).
  • Future research
    • Time-series data
    • Demographics (gender, age range, location, etc.)
    • Devices (web vs. mobile, platform types, etc.)

Monday, June 10, 2013

Talk on News Censorship

I'm fortunate to be funded by the Knight-Mozilla OpenNews Fellowship program to attend a conference on China and the New Internet World organized by the Oxford Internet Institute, where I will give a presentation on China's news censorship. I've uploaded the full paper and the slides online; please feel free to download them for more information. I also have more unpublished data and preliminary findings that I'd love to share and discuss. My email address is songyan at msu dot edu.

Prior and Ongoing Research on Internet Censorship

Internet censorship has been attracting much attention from academics and institutes. For example, the OpenNet Initiative (ONI) has been constantly testing the availability of websites in 74 countries and rating government control of content related to politics, social issues, Internet tools, and conflict/security (Palfrey, 2010). The Open Internet Tool Project (OpenITP) surveyed circumvention tool users living in China to understand how they bypass the Great Firewall, in hopes of building better tools that serve Internet users in China and in other regimes that censor the Internet (Robinson et al., 2013).

Among the empirical studies focused on online media, Bamman et al.’s (2012) work claimed to be “the first large–scale analysis of political content censorship”; it investigated messages deleted from Sina Weibo, a Chinese equivalent of Twitter. They found that 16.25% of posts were deleted after publication and identified several characteristics related to post deletions, including 295 sensitive keywords and messages from outlying provinces such as Tibet and Qinghai. Beyond Sina Weibo and on an even larger scale, King et al. (2013) collected data from nearly 1,400 Chinese social media platforms and analyzed the deleted messages with the aid of linguistic software. In contrast to the previous assumption that harsh criticism of the government is the censors' target, King et al. found that what the state aims to prevent and suppress is in fact ongoing and potential collective activity.

Research Methods in a Nutshell

To the best of our knowledge, however, censorial practices in online news media have never been studied, let alone extensively investigated through computational approaches. Therefore, our study may be the first empirical attempt to systematically examine news articles deleted from Chinese cyberspace.

We developed scripts to collect news articles published on NetEase and Sina, two major news aggregators headquartered in China. We also continuously checked whether these articles remained available and marked a news article as deleted once its link was found broken. To make sure that a story was removed because of its content rather than for editorial or technical reasons, we searched across the websites for articles with the same title under a different link. Only when no such duplicate was available did we conclude that a particular story had been deleted.
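The two-step deletion check can be sketched as follows. This is an illustration under stated assumptions, not our actual scripts: `is_link_alive` and `find_urls_by_title` are hypothetical callables standing in for the crawler's HTTP check and the site-wide title search, and the URLs are placeholders. In a real run this check would be repeated on a schedule.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Article:
    url: str
    title: str


def find_deleted(
    articles: Iterable[Article],
    is_link_alive: Callable[[str], bool],
    find_urls_by_title: Callable[[str], List[str]],
) -> List[Article]:
    """Return articles whose link is broken and that have no live copy elsewhere."""
    deleted = []
    for article in articles:
        if is_link_alive(article.url):
            continue  # still online, nothing to do
        # Broken link: was the same title republished under a different URL?
        live_copies = [
            url for url in find_urls_by_title(article.title)
            if url != article.url and is_link_alive(url)
        ]
        if not live_copies:
            deleted.append(article)  # no live duplicate, so count it as deleted
    return deleted


# Toy stand-ins for the crawler and the site search (hypothetical data).
alive = {"https://example.com/a", "https://example.com/b-moved"}
index = {"Story B": ["https://example.com/b", "https://example.com/b-moved"]}
articles = [
    Article("https://example.com/a", "Story A"),
    Article("https://example.com/b", "Story B"),
    Article("https://example.com/c", "Story C"),
]
gone = find_deleted(articles, lambda u: u in alive, lambda t: index.get(t, []))
print([a.title for a in gone])
```

Story B has a broken link but survives under another URL, so only Story C is counted as deleted.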

After collecting thousands of deleted news stories, we ran a regression over these data to detect patterns associated with deletion. The technique we adopted is ReLogit (King and Zeng, 2001a and 2001b), a logistic regression that handles rare events data. Political scientists developed this tool to analyze rare events, such as wars and coups, and it is appropriate for our study because the overall deletion rates on the two websites were well under 1%, as summarized below.
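ReLogit bundles several corrections for rare events. The NumPy sketch below illustrates only one of them: the small-sample bias correction for logit coefficients that King and Zeng describe, in its unweighted form. It is not the ReLogit software and not the model used in this study; it omits prior correction and the adjustments to predicted probabilities, and the toy data are invented.

```python
import numpy as np


def fit_logit(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])  # X'WX
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def rare_events_logit(X, y):
    """Return (MLE, bias-corrected) coefficients, unweighted case:
        bias = (X'WX)^-1 X'W xi,  xi_i = Q_ii * (p_i - 0.5),  Q = X (X'WX)^-1 X'
    """
    beta = fit_logit(X, y)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))  # (X'WX)^-1
    q_diag = np.einsum("ij,jk,ik->i", X, cov, X)  # diagonal of Q
    xi = q_diag * (p - 0.5)
    bias = cov @ (X.T @ (w * xi))
    return beta, beta - bias


# Intercept-only toy data: 10 "deleted" articles out of 1,000.
X = np.ones((1000, 1))
y = np.zeros(1000)
y[:10] = 1.0
mle, corrected = rare_events_logit(X, y)
print(mle, corrected)
```

With rare events the estimated bias of the intercept is negative, so the corrected intercept is slightly higher than the maximum-likelihood one.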

Findings and Conclusions

During the course of our study, on each website, about two articles were deleted per day and the overall deletion rate was 0.05% on NetEase and 0.13% on Sina Beijing.

Several similar patterns were found across the two news portals:
  • Domestic news had a significantly higher chance of being deleted than international news: twice as likely for NetEase, and about six times as likely for Sina Beijing.
  • News covering Beijing had twice the chance of deletion compared with news covering other places in China.
  • Tibet as a subject matter had little relation to deletion.
  • National, compared with local, news was significantly associated with deletion on both websites: for NetEase, one and a half times as likely to be deleted, and for Sina Beijing, one third times as likely to be deleted.
  • The nature of events was another strong indicator. Compared with neutral stories, positive news on NetEase had one third the chance of being deleted, whereas negative news had nearly four times the chance; on Sina Beijing, negative news had three times the chance of being deleted.
  • Five of the 13 coded news topics were strongly associated with news deletion: politics, business, foreign affairs, food and drugs, and military, although the strength of the association varied across categories and websites.
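Phrases such as "twice as likely" are typically read off exponentiated logit coefficients, which are odds ratios rather than ratios of probabilities. Assuming that is how the figures above were derived, the distinction matters little here: when events are as rare as these deletions, odds ratios and risk ratios nearly coincide, whereas for common events they diverge. The probabilities below are illustrative, not estimates from the study.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_ratio(p_group, p_baseline):
    """Ratio of the odds of the event in two groups."""
    return odds(p_group) / odds(p_baseline)


def risk_ratio(p_group, p_baseline):
    """Ratio of the probabilities of the event in two groups."""
    return p_group / p_baseline


# Rare outcome (baseline near the 0.05% deletion rate): OR is close to RR.
rare = (odds_ratio(0.001, 0.0005), risk_ratio(0.001, 0.0005))
# Common outcome: the same risk ratio of 2 gives a much larger odds ratio.
common = (odds_ratio(0.6, 0.3), risk_ratio(0.6, 0.3))
print(rare, common)
```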
From this evidence, we reached the following conclusions:
  • The two Chinese news portals deleted news in similar patterns.
  • These similarities point to a practice of systematic control, the quintessential component of the definition of censorship (Peleg, 1993).
  • Hence, for the first time, we have confirmed and quantified online news censorship in China.

Taboo Words

Beyond news deletion, I've been examining comment deletions as well.  I've created some word clouds with the help of Wordle and highlighted the keywords most commonly found in deleted comments.  They're not included in the paper or the slides. 

These keywords align with our general understanding of taboo topics, such as land acquisition, death toll, social unrest, food safety, pollution, and deplorable working conditions.


Comments Prohibited and Suppressed

A second research topic of mine is how comments are manipulated and what patterns are associated with the manipulation. Various types of manipulation have been observed, including disabling the commenting function, screening and filtering submitted comments before publication (i.e., pre-censorship), and deleting comments after publication (i.e., post-censorship). This topic isn't included in the paper or the slides.

To make this research topic more understandable, I'll first describe the general practice of Chinese news portals. Most of the time, news portals welcome and encourage comments because interactions boost web traffic. However, a small portion of news stories have their commenting feature disabled, and there are two ways to implement this. On NetEase, a notification under the story says "commenting is disabled," and the commenting button is unavailable. Sina takes a subtler approach: it shows no such notification, and users can submit comments as usual, but the comments are never displayed on the website. These are pre-censorship techniques. As for post-censorship, both websites simply remove comments quietly after publication. A third type of manipulation differs from passively pre- or post-censoring comments: proactively hiring Internet commentators, the so-called 50 Cent Party, to propagate orthodox ideas endorsed by the government.

The following time-series chart demonstrates the first type of comment manipulation: prohibiting comments. In this way, party organs attempt to impose official opinions through one-way communication on issues such as North Korea, outlying provinces, disputed territories, major criminal cases, and so on.
More subtly, Sina "allows" comments but never shows some of them on the website. I've figured out how to send parameters to the API to request the number of pre-censored comments, and I drew the following chart showing news stories that have no comments at all even though their commenting function is "available".


The third time-series chart shows the number of comment deletions per week. The topics found in the deleted comments are largely consistent with those of the deleted news stories.
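A weekly series like this can be built by bucketing each deletion by the ISO calendar week in which it was detected. A minimal sketch, assuming the crawler logs one detection date per deleted comment; the dates below are invented for illustration.

```python
from collections import Counter
from datetime import date


def weekly_counts(detection_dates):
    """Count deletions per ISO (year, week), returned in chronological order."""
    counts = Counter(tuple(d.isocalendar()[:2]) for d in detection_dates)
    return sorted(counts.items())


# Hypothetical detection dates for deleted comments.
dates = [date(2012, 7, 2), date(2012, 7, 4), date(2012, 7, 10), date(2012, 7, 15)]
print(weekly_counts(dates))
```

Keying on the ISO (year, week) pair rather than the week number alone keeps weeks from different years apart.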


This study was funded by the Google Policy Fellowship 2012 and was a collaboration between the Quello Center for Telecom Management and Law at MSU and the Center for Communication Research at the City University of Hong Kong. Please send your comments and questions to songyan at msu dot edu. Thank you for reading this post.
