The AERA Annual Meeting is probably the largest gathering of educational researchers and practitioners in the world. This year it has around 13,000 participants, several hundred of whom are using Twitter as a "backchannel" for communication at the conference. On Twitter, hashtags are commonly used to aggregate tweets on the same topic. This year, three hashtags are in use: #AERA2012, #AERA, and #AERA12, with #AERA2012 the most popular. Since I cannot make it to the annual meeting this year, following the tweets gathered under these hashtags gives me a chance to "participate" virtually.
However, following such a volume of tweets from hundreds of people is a challenge. To make sure I could go back and review those tweets, I created a public archive of tweets containing the #AERA2012 hashtag using "Twitter Archive Google Spreadsheet – TAGS v3.0" developed by Martin Hawksey. During the first few days of the conference, participants produced more than 4,000 tweets, around 2,500 of which are unique. Even if you are ambitious enough to read all of them, it is always nice to put the whole bunch of tweets into a magic box and let it tell you the main things people are talking about.
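The gap between the raw count and the unique count comes mostly from retweets. A minimal sketch of how one might collapse them (the `text` column name is an assumption about the TAGS CSV export, and the normalisation rule is simplified):

```python
import csv
import io

def unique_tweets(rows):
    """Collapse retweets and exact duplicates by normalised text.

    Strips a leading 'RT @user:' prefix and lowercases, so a tweet
    and its retweets count only once.
    """
    seen = set()
    unique = []
    for row in rows:
        text = row["text"].strip()
        if text.lower().startswith("rt @"):
            # drop the 'RT @user:' prefix before comparing
            text = text.split(":", 1)[-1].strip()
        key = text.lower()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# a tiny stand-in for the archive spreadsheet
sample = io.StringIO(
    "text\n"
    "Great keynote at #AERA2012\n"
    "RT @someone: Great keynote at #AERA2012\n"
    "Wifi is down again #AERA2012\n"
)
rows = list(csv.DictReader(sample))
print(len(unique_tweets(rows)))  # → 2
```

A real pass over the archive would also want to handle "via @user" retweets and shortened-URL variants, but the idea is the same.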
The Overview program is designed to do this trick. Overview is an open-source tool that helps journalists find stories in large amounts of data by cleaning, visualizing, and interactively exploring large document and data sets. Although it was designed for journalists, it has potential uses in many other contexts. I exported the Twitter archive as a CSV file and loaded it into the Overview program. It ran some preprocessing tasks as a first step, and then I could really start to explore the dataset. (For details about how to do this, read this blog post.)
This is what I got at first. The left-side panel presents a topic tree that helps me navigate the whole dataset by topics the program identified (through natural language processing techniques). The right-side panel is a visualization of tweets, with each dot representing an individual tweet. One design principle of the Overview project is to combine the strengths of machine and human to make sense of data, so as a user I am encouraged to go through the dataset and assign tags to each topic; that is what the central panel does.
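The grouping behind the topic tree rests on measuring how similar two tweets are as bags of words. A minimal sketch of that idea, using TF-IDF weighting and cosine similarity (a simplification of whatever Overview actually runs, and the example tweets are invented):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Bag-of-words TF-IDF vectors: words that appear in fewer
    documents get higher weight, so topical words dominate."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(w for doc in tokenized for w in set(doc))
    n = len(docs)
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse word-weight dicts."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "wifi is broken at the venue",
    "the venue wifi keeps dropping",
    "great session on bullying research",
]
vecs = tfidf_vectors(docs)
# the two wifi complaints are more similar to each other
# than either is to the bullying tweet
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # → True
```

Clustering tweets by this similarity, recursively, is roughly what produces a topic tree like the one in the left panel.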
So I went through the dataset by clicking on the topics (and sub-topics) in the topic tree one by one, assigning a tag to tweets on the same topic. The process felt like coding in qualitative data analysis, and what I did was only a first round of coding. After spending two hours on this, here is what I got.
Both the topic tree and the visualization became colorful, because colored tags were created for clusters of tweets. The first round of tagging/coding identified 45 tags. They are rough and need further refinement, but by looking at the top 20 tags you can get a sense of what colleagues were mainly talking about on Twitter during the conference. The results indicate that many people were using Twitter to share session information, invite people to their exhibitions, welcome people in various ways, share personal status (like "I'm at a session about bla and it rocks"), organize tweetups, etc. A number of tweets were about US scholars being denied entry to Canada for the AERA meeting. Some tweets were complaints and discussions about wifi issues and confusion over the multiple hashtags. Only a few of the top 20 tags concerned the content of presentations, such as race issues in education, bullying and cyber-bullying, higher education, and indigenous education. (Note: the tag "no-content" covers tweets rendered as strings of question marks, probably because they were written in another language; I did notice colleagues from Japan tweeting. Apparently Overview still needs improvement in internationalization.) Overall, this preliminary analysis may sound discouraging for someone like me who was eager to participate virtually by following the conference's Twitter stream, simply because only a small portion of the tweets were about the actual content of presentations, and few of them related to my interests.
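Once each tweet carries a tag, producing a "top tags" summary like the one above is just a frequency count. A sketch, with hypothetical (tweet, tag) pairs standing in for the manual coding pass; a real run would read the tag column exported from Overview:

```python
from collections import Counter

# invented examples, echoing the kinds of tags found in the first pass
tagged = [
    ("Come see our booth in the exhibit hall!", "exhibition"),
    ("Room change for session 12.034", "session-info"),
    ("Wifi is down again", "wifi"),
    ("No wifi in the main hall either", "wifi"),
    ("US scholars denied entry at the border", "border-denial"),
]

tag_counts = Counter(tag for _, tag in tagged)
for tag, count in tag_counts.most_common(3):
    print(tag, count)
# the top line is: wifi 2
```

With 45 tags over ~2,500 unique tweets, `most_common(20)` gives exactly the top-20 list discussed above.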
The Overview program can further cluster the visualization by running a force-directed layout (see below). Now you can see where the different clusters/topics sit and navigate manually to make sense of the dataset. You may notice a number of dark dots in the background; my sense is that these are tweets the program had a hard time assigning to any topic. If you have time, you can zoom in to read what they are actually about and get a better understanding of the dataset.
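The intuition behind a force-directed layout is simple: similar documents attract each other like springs, and all documents repel each other like charges, so clusters condense and drift apart. A toy version of that idea (constants and the graph are invented, and real implementations are far more refined):

```python
import math
import random

def force_layout(n, edges, steps=300, seed=42):
    """Naive force-directed layout: edge endpoints attract,
    every pair of nodes repels. Returns [x, y] per node."""
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    for _ in range(steps):
        disp = [[0.0, 0.0] for _ in range(n)]
        # repulsion between every pair of nodes
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d2 = dx * dx + dy * dy + 1e-9
                f = 0.001 / d2
                disp[i][0] += f * dx; disp[i][1] += f * dy
                disp[j][0] -= f * dx; disp[j][1] -= f * dy
        # attraction along edges (similar tweets pull together)
        for i, j in edges:
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            disp[i][0] += 0.05 * dx; disp[i][1] += 0.05 * dy
            disp[j][0] -= 0.05 * dx; disp[j][1] -= 0.05 * dy
        for i in range(n):
            pos[i][0] += disp[i][0]
            pos[i][1] += disp[i][1]
    return pos

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# two clusters: tweets 0-1-2 are mutually similar, so are 3-4
pos = force_layout(5, [(0, 1), (1, 2), (0, 2), (3, 4)])
# a node ends up closer to its own cluster than to the other one
print(dist(pos[0], pos[1]) < dist(pos[0], pos[3]))
```

The isolated dark dots in Overview's view would correspond to nodes with few or no edges: nothing pulls them toward a cluster, so repulsion scatters them across the background.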
To summarize, this blog post presents a simple experiment in using the Overview program to help me understand the "large" corpus of tweets produced by AERA Annual Meeting participants so far. Although Overview is still a prototype and needs further improvement, it provided an interesting and fruitful way to achieve my goal. This little experiment may also offer some insight into the use of Twitter as a backchannel for communication at conferences.