Bodong Chen

Crisscross Landscapes

Notes: Sentiment Analysis in MOOC Discussion Forums



Wen, M., Yang, D., & Rosé, C. P. (2014). Sentiment Analysis in MOOC Discussion Forums: What does it tell us? In EDM2014.

Citekey: @Wen2014a


General comments

What will be a more interesting variable to correlate to than drop out rates? The link between sentiment and dropout behaviors is slippery, because we have no evidence that negative sentiments will cause dropouts. So the main logic in this paper is undermined.

The limitation with data used in this study is also noticeable. Only having log data from one course is not most helpful.

The survival analysis at the individual level is very interesting. Although it does not confirm a general pattern across three courses, it provides an interesting way to model student participation in moocs, as well as it’s relation with other factors. Plus, it provides a kind of negative knowledge that sentiment analysis as applied in this study might not be reliable for studying moocs.

Further, the semantic analysis of topic extraction is potentially very useful for facilitating mooc forum discussion. It could lead to very interesting tool development.”


“We observe a correlation between sentiment ratio measured based on daily forum posts and number of students who drop out each day.” (p. 1)

“we evaluate the impact of sentiment on attrition over time” (p. 1)

“a significant proportion of students remain in the course longer than that but then drop out along the way. This suggests that there are students who are struggling to stay involved. Supporting the participation of these struggling students may be the first low hanging fruit for increasing the success rate of courses.” (p. 1)

“Recent research on social media use has demonstrated that sentiment analysis can reveal a variety of behavioral and affective trends. For example, collective sentiment analysis has been adopted to find relationship between Twitter mood and both consumer confidence and political opinion[20] and stock market fluctuations [5].” (p. 1)

“While the association between sentiment with summative course completion has been evaluated in prior work [24], and while the impact of other linguistic measures and social factors on attrition over time has been published as well[31, 26], this is the first work we know of that has brought this lens to explore what sentiment can tell us about drop out along the way in this type of environment. In particular, we explore this connection across three MOOCs in order to obtain a nuanced view into the ways in which sentiment is functioning similarly and differently or signaling similar and different things across these three courses.” (p. 2)

“Adamopoulos [1] has applied sentiment analysis to collect students’ opinions towards MOOC features such as the course characteristics and university characteristics.” (p. 2)

“Learn to Program: The Fundamentals 3 , offered in August 2013” (p. 2)

“Brinton et al. [6] identified high decline rate and high-volume, noisy discussions as the two most salient features of MOOC forum activities. Ramesh et al. [24] use sentiment and subjectivity of user posts to predict engagement/disengagement. However, neither sentiment nor subjectivity was strongly predictive of engagement in that work.” (p. 2)

“Most of the prior work that addresses this question involved conducting surveys and interviews [25, 2].” (p. 3)

“In the opinion estimation step, for each set of posts that are related to a topic, we define the topic sentiment ratio xt on day t as the ratio of positive versus negative words used in that day’s post set. Positive and negative terms used in this paper are defined by the sentiment lexicon from [14], a word list containing about 2,000 and 4,800 words marked as positive and negative, respectively.” (p. 3)

“In order to derive a more consistent signal, and following the same methodology used in previous work [20], we smooth the sentiment ratio with one of the simplest possible temporal smoothing techniques, a moving average over a window of past k days:” (p. 3)

“4.1.2 Results Part 1. Opinion towards the course” (p. 3)

“The underlying assumption that students dropped out of courses because of sentiment may not be true.”

“To objectively measure students’ opinions towards the course, we count how many students drop out of the course each day based on the students’ login information.” (p. 3)

“To extract students’ aggregate opinions on a topic, we need to first identify posts relating to a topic (post retrieval step). Then we need to estimate the opinion of these posts (opinion estimation step).” (p. 3)

“To decide which topic keywords to use for each course tool of interest, we run a distributional similarity technique called Brown clustering [7, 13] on all three courses’ posts in order to identify clusters of words that occur in similar contexts. It is conceptually similar to Latent Semantic Analysis, but is capable of identifying finer grained clusters of words.” (p. 3)

“In all three courses, Course sentiment ratio is much higher during the last course week.” (p. 5)

“Across all three courses, we observe a trend in the expected direction, namely that higher sentiment ratios are associated with fewer dropouts, which is significant in two out of the three courses(r = -0.25, p < 0.05 for the Teaching course; r = -0.12 for the Fantasy course(Figure 1(b)); r = -0.43, p < 0.01 for the Python course(Figure 1©)).” (p. 5)

“4.2 User-level Sentiment Analysis: Survival Analysis” (p. 5)

“Our goal in this section is to understand how expression of sentiment relates to attrition over time in the MOOCs on a user-level. We apply survival analysis to test if students’ attitudes as expressed in their posts or the ones they are exposed to correlate with dropout from the forum discussion.” (p. 5)

“Part 2. Opinion towards the course tools” (p. 5)

“It is impossible for course instructors to read hundreds or even thousands of potentially related-posts. In this session, we try to extract the most prominent opinions associated with each course tool from these posts.” (p. 6)

“4.2.2 Modeling Survival analysis is a statistical modeling technique used to model the effect of one or more indicator variables at a time point on the probability of an event occurring on the next time point. In our case, we are modeling the effect of certain language behaviors (i.e., expression or exposure to expression of sentiment) on probability that a student drops out of the forum participation on the next time point. Survival models are a form of proportional odds logistic regression, and they are known to provide less biased estimates than simpler techniques (e.g., standard least squares linear regression) that do not take into account the potentially truncated nature of time-to-event data (e.g., users who had not yet ceased their participation at the time of the analysis but might at some point subsequently).” (p. 6)

“As we see in Table 4, not only do we see differential effects across courses, we also see differential effects between behavior and exposure.” (p. 6)

“Intuitively, we might expect that positive sentiment indicates that students are enjoying or benefitting from a course whereas negative sentiment might indicate that a student is frustrated with a course. The results from our quantitative analysis are not consistent with our intuition.” (p. 7)

“6. LIMITATIONS” (p. 7)

“The sentiment lexicon we utilize is designed for predicting sentiment polarity of product reviews.” (p. 7)

“The negative word use in this course is actually a sign of engagement because messages with negative words are more likely to be describing science fantasy related literature or even posting their own essay for suggestions.” (p. 7)

“Does this just simply implied the wrong algorithm was used?

Maybe an important implication for future research of forum discussion.”

“Users who post messages like this are more in the role of forum content consumer. They may only be browsing the forum to look for answers for their particular problems without the intention of contributing content, such as solving the other’s problems.” (p. 7)