read

References

Citekey: @rose2014what

Rosé, C. P., & Tovares A. (in press). What Sociolinguistics and Machine Learning Have to Say to One Another about Interaction Analysis. In L. Resnick, Asterhan C., & Clarke S. (Eds.), Socializing Intelligence Through Academic Talk and Dialogue. Washington, D.C.: American Educational Research Association.

Notes

This chapter grapples with two seemingly contradictory views towards discourse analysis: a sociolinguist approach that embraces complexity (such as Gee’s intertextuality approach), and a quantitative approach that strives to best discretize discourse behaviors. These two approaches could also be described as qualitative vs. quantitative, or variationist vs. constructive. Using an analogy of tapestry, Rosé and Tovares beautifully bring these two approaches into coherence: the many varied colors of threads (complexity) that are woven together in a tapestry are held together by a substrate of white threads (discretizeable behaviors) that give the tapestry its shape. Thus, machine learning needs to take both directions into consideration to effectively analyze interactions.

More specifically, Rosé and Tovares explicitly tackles the commonly held misconception that conceptualizes machine learning as a black magic box. They remind us to think about (1) what machine learning is capable of modeling in terms of interaction analysis, and (2) how much information we can lose from the continuous signal and still see the culture change we are looking for.

One thing we sometimes forget is that “a trained model will only be able to capture patterns within the space of models that are expressible within the provided feature space.” A significant challenge is to find a balance between finding powerful (rich) features and being realistic richness of features we could possibly extract from normally small sizes of training data. In our search for an effective machine learning technique to analyze interactions, “we need to focus our questions of the data in such a way that they can be answered with categories that have the potential to generalize as much as possible without losing the change we want to measure.” In other words, we would need to balance the “search for an effective computational model must balance the search for discretizations of the data that have predictive validity with respect to these external measures with a search of the space of possible representations of the data that would render those behavior categorizations as learnable using machine learning technology.”

Highlights

this specific chapter is meant to offer an inter-disciplinary perspective on how to measure progress within a classroom community towards appropriation of discourse practices reflecting principles of effective dialogic instruction (p. 2)

Academically Productive Talk (Resnick, Michaels, & O’Connor, 2010) (p. 2)

this chapter explores the tension between the drive to preserve complexity in analysis of social interaction that is a hallmark of interactional sociolinguistics and the drive to find well defined, stable patterns in data that would facilitate efforts towards using machine learning to model social processes in communication, which is a current research objective within the field of computational linguistics. (p. 2)

to start by investigating what is the nature of the social processes of interest and let these insights drive the search for meaningful patterns in data. (p. 3)

Since the recent turn of the century, we have already seen success at using machine learning in analysis of group discussions, especially in the field of computer supported collaborative learning (Soller & Lesgold, 2000; Erkens & Janssen, 2008; Rosé et al., 2008; McLaren et al., 2007; Mu, Stegmann, Mayfield, Rosé, & Fischer, 2012). Work in machine learning applied to interaction analysis in this community has been argued to offer two primary kinds of contributions. On the one hand, it has the potential to greatly speed up analysis of discussion data that is collected for research purposes. (p. 3)

The second kind of contribution is enabling real time analysis of discussion for learning, which enables automatic interventions that support facilitators, teachers, or students in their groups (Kumar, Rosé, Wang, Joshi, & Robinson, 2007; Chaudhuri, Kumar, Howley, Rosé, 2009; Ai, Kumar, Nguyen, Nagasunder, & Rosé, 2010; Kumar, Beuth, & Rosé, 2011). (p. 4)

Despite these early successes, the tension between the qualitative and the quantitative aims remain. (p. 4)

Even if we believe that reality offers us no discrete categories of behavior, it is always possible to discretize it if we are willing to lose some information and complexity in the process. The question then becomes one of deciding how much and which information it is safe to lose. (p. 5)

First, we will explore what machine learning is capable of modeling in terms of interaction analysis. (p. 5)

Second, we must consider the danger that the change as it happens is visible only within the part of the signal that we will lose in the discretization process. Thus, we will grapple with the question of how much information we can lose from the continuous signal and still see the culture change we are looking for. (p. 5)

Machine Learning: Toward Conceptual Tools for Effective Use (p. 5)

What we will see is that in order to make machine learning work for us, we cannot treat it as black magic. Rather, we must think realistically about what it is capable of doing, and we must understand what we must provide to it in order for it to do what it does best. (p. 5)

In other words, more attention should be given to the problem of representing data appropriately. (p. 6)

In the language technologies community there are strands of work that can be seen as related to the interaction analyses we are interested in, such as dialogue act tagging (Dielman & Renals, 2008), sentiment analysis (Pang & Lee, 2008), age prediction (Goswami, Sarkar, & Rustagi, 2009) and gender prediction (Garera & Yarowsky, 2009). (p. 6)

We will now take a step back and think more mechanically about how machine learning works. Machine learning algorithms are designed with the goal of finding mappings between sets of input features and output categories. (p. 7)

Machine learning algorithms find stable patterns within these feature vector representations by examining collections of hand-coded “training examples” for each output category, then using statistical techniques to find characteristics that exemplify each category and distinguish it from the other categories. The goal of such an algorithm is to learn general rules from these examples, which can then be applied effectively to new data. (p. 7)

In order for this to work well, the set of input features must be sufficiently expressive, and the training examples must be representative. It is important to pause and consider what representative means. (p. 7)

When datasets that models are tested on contain new subpopulations not represented within the training data, the new data can appear very different. (p. 8)

In recent years, a variety of manual and automatic feature engineering techniques have been developed in order to construct feature spaces that are adept at capturing interesting language variation without overfitting to content based variation, with the hope of leading to more generalizable models. Here we consider two issues in comparing across the previously used approaches, namely informativity (i.e., representational power) and feature space size. (p. 9)

We need powerful features to represent style in a generalizable way and in order to avoid throwing away information we need to pick up on style. However, we also need feature spaces that are small enough to make the search for the ideal model feasible with the amount of training data we can provide. (p. 9)

The bottom line is that no matter how powerful the machine learning algorithm, the trained model will only be able to capture patterns within the space of models that are expressible within the provided feature space. If the information we need to pick up on is thrown away at the representation stage, no algorithm, no matter how powerful, will be able to learn an appropriate model. (p. 9)

Thus, there is always a tension between the preservation of complexity and the attainment of computational feasibility. (p. 10)

The answer is that we need more insight into the language we are modeling so we can be strategic in our introduction of complexity into the representation of the data. (p. 10)

Operationalization of Culture Change (p. 10)

When we consider what operationalization to employ for modeling culture change in the classroom, we must think on two levels and consider trade offs in terms what our analytic approach allows us to see and conclude. The first consideration is what if any categories of interaction we can use to model this change, and the second is what features of language are associated with these categories. (p. 10)

As classroom culture changes over time, is it just the distribution of categories of behavior that change, or does the repertoire of categories change? Furthermore, as the culture changes, does the manifestation of the same categories change? (p. 10)

We use as an example a construct from Systemic Functional Linguistics (SFL) known as Heteroglossia (Martin & White, 2005), which we have operationalized for our own purposes in earlier work (Howley, Mayfield, & Rosé, in press). (p. 11)

The notion of Heteroglossia is linked to the analytic question of culture, virtues, etc. in knowledge building analytics. (p. 11)

The Heteroglossia dimension reflects the extent to which a speaker shows openness to the existence of other perspectives apart from the one that is reflected in the propositional content of the assertion being made. (p. 11)

With this in mind, let us now explain our current operationalization of Heteroglossia. (p. 11)

We describe it as identifying wording (p. 11)

choices that do or don‟t treat other perspectives than what is expressed in the propositional content of the assertion as open for consideration within the continuing discussion. (p. 12)

This creates a rather simple divide in possible coding terms for contributions: Heteroglossic-Expand (HE) phrases tend to make allowances for alternative views and opinions (such as “She claimed that glucose will move through the semi-permeable membrane.”) Heteroglossic-Contract (HC) phrases attempt to thwart other positions (such as “The experiment demonstrated that glucose will move through the semi-permeable membrane.”) Monoglossic (M) phrases make no mention of other views and viewpoints (such as “Glucose will move through the semi-permeable membrane.”) Non-Assertion (NA) phrases do not assert any propositional content. This includes questions, such as “Will glucose move through a semi-permeable membrane?” or fixed expressions lacking propositional content such as “Okay.” (p. 12)

James Gee‟s approach to discourse analysis, in which it is also referred to as intertextuality (Gee, 2010). Gee‟s approach fits much more squarely within the highly contextual style of interactional sociolinguistics. (p. 13)

Bakhtin‟s original formulation of the idea (Bakhtin, 1981). (p. 13)

We reconcile the two views with the image of a tapestry. The many varied colors of threads that are woven together in a tapestry are held together by a substrate of white threads that give the tapestry its shape. The colored threads give the tapestry its beauty, but without the white threads, the tapestry would have no structure. Thus, the two synergize perfectly. Neither could perform their function without the other. Nevertheless, the two types of thread may have different properties and thus the two conceptions of Heteroglossia may need to be computationalized differently. (p. 13)

Where does this leave us in terms of questions related to whether it is valid to discretize behavior into categories that can be precisely defined? We see here that there is not one answer to this question. It depends on what we want to know about how perspectives are treated within an interaction. (p. 14)

Moving Forward (p. 14)

In this chapter we have argued that in order to understand the culture shift that ideally should come with the introduction of training in discussion facilitation techniques designed to socialize intelligence, we need to develop instrumentation than can be used to detect subtle shifts in behavior that occur over long periods of time. (p. 14)

We have grappled with two questions that must be addressed in order to apply machine learning techniques effectively to build such instrumentation. First, we have explored what machine learning is capable of modeling in terms of interaction analysis. Second, we have considered the danger that the change as it happens is visible only within the part of the signal that we will lose when we scope down the analysis problem to precise operationalizations that can be modeled with a feasible amount of computational complexity. If we want to use machine learning as a tool, we need to focus our questions of the data in such a way that they can be answered with categories that have the potential to generalize as much as possible without losing the change we want to measure. (p. 14)

Our search for an effective computational model must balance the search for discretizations of the data that have predictive validity with respect to these external measures with a search of the space of possible representations of the data that would render those behavior categorizations as learnable using machine learning technology. (p. 15)

References (p. 16)

Ai, H., Sionti, M., Wang, Y. C., & Rosé, C. P. (2010, June). Finding transactive contributions in whole group classroom discussions. In Proceedings of the International Conference of the 9th International Conference of the Learning Sciences (ICLS), Volume 1 (pp. 976983). Chicago, IL: International Society of the Learning Sciences. (p. 16)

Dielmann, A., & Renals, S. (2008). Recognition of dialogue acts in multiparty meetings using a switching DBN. IEEE Transactions on Audio, Speech, and Language Processing, 16(7), 1303-1314. (p. 17)

Dönmez, P., Rosé, C. P., Stegmann, K., Weinberger, A., & Fischer, F. (2005). Supporting CSCL with automatic corpus analysis technology. In Proceedings of the 2005 conference on Computer Support for Collaborative Learning: Learning 2005: The next 10 years! (pp. 125-134). International Society of the Learning Sciences. (p. 17)

Erkens, G., & Janssen, J. (2008). Automatic coding of dialogue acts in collaboration protocols. International Journal of Computer-Supported Collaborative Learning, 3(4), 447-470. (p. 17)

Gee, J. P. (2010). An introduction to discourse analysis: Theory and method (3rd ed.). New York, NY: Routledge. (p. 17)

Gweon, G., Agarwal, P., Udani, M., Raj, B., & Rosé, C. P. (2011). The automatic assessment of knowledge integration processes in project teams. In Proceedings of Computer Supported Collaborative Learning (pp. 462-469). Hong Kong, China. (p. 18)

Howley, I., Mayfield, E., & Rosé, C. P. (in press). Linguistic analysis methods for studying small groups. In C. Hmelo-Silver, A. O‟Donnell, C. Chan, & C. Chin (Eds.), International handbook of collaborative learning. Abingdon, Oxford, UK: Taylor and Francis, Inc. (p. 18)

Kumar, R., Beuth, J., & Rosé, C. P. (2011). Conversational strategies that support idea generation productivity in groups. In Proceedings of the 9th International Computer Supported Collaborative Learning Conference, Volume 1: Long Papers (pp. 398-405). (p. 19)

Mayfield, E., & Rosé, C. P. (in press). LightSIDE: Open source machine learning for text accessible to non-experts. In M. D. Shermis & J. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions. Routledge Academic. (p. 19)

Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135. (p. 20)

Blog Logo

Bodong Chen


Published

Image

Crisscross Landscapes

Bodong Chen, University of Minnesota

Back to Home