Bodong Chen

Crisscross Landscapes

Notes: Blikstein - 2013 - Multimodal learning analytics



Citekey: @Blikstein2013

Blikstein, P. (2013). Multimodal learning analytics. In Proceedings of the Third International Conference on Learning Analytics and Knowledge - LAK ’13 (p. 102). New York, New York, USA: ACM Press. doi:10.11452460296.2460316


Paulo Blikstein’s intro to a series of his preliminary work on multimodal learning analytics. I think he is credited as the creator of this line of work.

I presented a series of proof-of-concept studies for what I termed “multimodal learning analytics:” a set of techniques that can be used to collect multiple sources of data in highfrequency (video, logs, audio, gestures, biosensors), synchronize and code the data, and examine learning in realistic, ecologically valid, social, mixed-media learning environments.


Automated, fine-grained data collection and analysis could help resolve this tension in two ways. First, they could give researchers tools to examine student-centered learning in unprecedented scale and detail. Second, these techniques could improve the scalability of these pedagogies since they make both assessment and formative feedback, which are more complex and laborious in such environments, feasible. (p. 1)

At the same time, in the well-established field of multimodal interaction, new data collection and sensing technologies are making it possible to capture massive amounts of data in all fields of human activity. These techniques include logs of computer activities, wearable cameras, wearable sensors, biosensors (e.g., skin conductivity, heartbeat, and EEG), gesture sensing, infrared imaging, and eye tracking. (p. 1)

multimodal data collection and analysis techniques (“multimodal learning analytics”) could bring novel methods to understand what happens when students can generate unique solution paths to problems, interact with peers, and act in both the digital and the physical worlds. (p. 1)

Rus et al. [20] makes extensive use of text analytics within a computer-based application for science learning, using expert-generated answers as a baseline. Beck and Sison [6] have demonstrated a method to assess reading proficiency combining speech recognition and probabilistic monitoring. D’Mello et al. [8] designed an application that could use spoken dialogue to recognize the states (p. 1)

of boredom, frustration, flow, and confusion. Other researchers [1, 3] have been using machine learning in many contexts to measure students’ learning and affect, in particular, focusing on their trajectory within a learning environment [2]. (p. 2)

I will describe examples of work in multimodal learning analytics, in three different areas: analysis of students learning to program, learning to build mechanical structures, and inventing machines. (p. 2)


2.1 Programmingmodes (p. 2)

We researched the creation of computer programs using 10,000 code snapshots from 74 students enrolled in an introductory undergraduate programming course, using two methods for the discovery of patterns in the data. The first method was a greatly enhanced version of the pattern detection used in the previous study, using machine-learning techniques such as cluster analysis and dynamic time warping. (p. 2)

All code updates were characterized based on their size and type, by identifying the number of lines and characters added, removed and modified between successive snapshots. This data was then used to construct a sequence of multidimensional vectors. This process was repeated for each assignment that the student completed. Using this sequence of vectors, the progress of each student was compared over the course of the class, by comparing their sequence of updates on a given assignment to their sequence of updates on the previous assignments. Students were then clustered based on the similarity of their update sequences. Six clusters emerged from the data. (p. 2)

Every time student ran, saved, or compiled their code, a snapshot of their code was stored. The analysis consisted in counting the amount and pattern of changes (additions/deletions) in the code, by calculating the number of characters and lines of code changed during the three-week assignment, as well as the frequency of compilation. (p. 2)

“Copy and pasters:” characterized by a step-shaped growth curve, alternating plateaus with few code changes (looking for code / adapting the code), and rapid growth in code size (pasting the external code) – mostly observed in novice coders. (p. 2)

“Self-sufficients:” characterized by a linear curve of steady increase in code size and almost no use of external code – mostly observed in more expert programmers. (p. 2)

2.2 Programmingstyles (p. 2)

The results of this work confirmed that indeed it was possible to cluster students in terms of robust behaviors that are dependent on expertise, and that those behaviors were correlated with their grades in the assignments and exams [19]. (p. 3)

A second study using text mining was designed to investigate students’ identity as engineers and scientists, and their level of identification with these professions [22]. (p. 3)

  1. TEXTMINING (p. 3)

3.1 Constructioncompetence (p. 3)

Prosodic analysis uses the pitch, intensity and duration of speech to infer intentionality and emotion. Linguistic analysis looks at several features such as pauses, filled pauses, and restarts, and can infer elements such as certainty. Sentiment analysis [17], using databases of terms such as the Linguistic Inquiry and Word Count (LIWC) and the Harvard Inquirer, infers sentiment from words or groups of words found in text. Content word analysis determines the knowledge contained in the text by using web-mined lexicons from chemistry, mathematics, computer science, material science and general science. Finally, dependency parsing and n-gram analysis can be used to inspect latent structures or meaning in the text by looking at group of words, sentences, and their relationship [7]. (p. 3)

The interest here was to examine if students who did well in the robotics workshop would employ different vocabulary and linguistic structures. Indeed, the data showed that when students were describing a project they were proud of (as measured by the facilitators’ field notes), they would use “I” or “my,” in sentences such as, “my original design was…” When the same student was describing a project that they were not proud of, they would instead use “it,” in sentences such as, “it’d be programmed to turn on…” in place of “I programmed it to turn on.” (p. 3)

Counting word frequencies, we were able to determine that, for the group as a whole, students identified as high achievers in the workshop used “I” more than seven times more frequently than low-achievers. Again, here we can observe how even simple automated n-gram counting within transcripts can reveal meaningful elements of students’ affect. (p. 3)


Of particular interest to this study is the presence of several features that accurately predicted expertise based on certainty. (p. 3)

These initial results confirm the work of Beck [6], which indicates that increasing expertise tends to increase student selfconfidence. (p. 3)

Additionally, novices exhibited a much higher frequency of disfluencies (average of 1.06 per normalized time interval) than experts (0.5), and experts had much longer and more frequent pauses. (p. 3)

In this paper, I presented a series of proof-of-concept studies for what I termed “multimodal learning analytics:” a set of techniques that can be used to collect multiple sources of data in highfrequency (video, logs, audio, gestures, biosensors), synchronize and code the data, and examine learning in realistic, ecologically valid, social, mixed-media learning environments. (p. 4)