Citekey: @schwartz13measuring

Schwartz, D. L., & Arena, D. (2013). Measuring What Matters Most: Choice-Based Assessments for the Digital Age. Cambridge, Massachusetts; London, England: The MIT Press.



Erna Yackel and Paul Cobb (1996, 473) nicely summarize the perspective of many: “The development of intellectual and social autonomy is a major goal in the current educational reform movement, more generally, and in the reform movement in mathematics education, in particular. In this regard, the reform is in agreement with Piaget that the main purpose of education is autonomy.” (p. 28)

Table 3.1 provides a sampling of opposite positions. The left side shows the terms adopted by reform-minded educators, and the right side displays the terms they use to label the alternatives. We assume that the preference for greater choice reflects a desire for students to develop agency. Choice is what creates the possibility of agency, and therefore many educational researchers prefer high-choice environments for learning. (p. 29)

One possible consequence of a museum experience is that it makes the content of the exhibits “sticky.” The museum experience might create a small interest that helps future related information stick to that interest. (p. 78)

To find out with a choice-based assessment, the visitors could receive a URL to play a game that has a lot of computer characters talking simultaneously, much like a party. At a select time, one of the characters among all the other voices could say a sentence that included the words “cosmic rays.” Would the brains of the people who had gone to the exhibit “choose” to hear the sentence more than people who had not gone to the exhibit? (The analog is when people somehow hear their name at a party, even though all the other words in the crowd were just an unattended sound track until their name was spoken.) If people did hear the character say “cosmic rays,” would they then choose to engage this character in the game? (p. 79)

7 Standards for Twenty-First-Century Learning Choices (p. 81)

In education, the decision about what to assess is largely driven by content standards. Standards create the possibility of accountability. The standards adopted by an educational system influence the textbooks written for that system, the daily instruction, and the assessments. Because of their importance, standards are often developed by high-powered committees. This can involve a good deal of negotiation about what is valuable for an educated citizen. A concern with many standards is that negotiations yield laundry lists. An analogy comes from college English departments (and others, of course). All faculty members believe that their particular area of expertise should be considered mandatory for the students, whether or not it fits with everything else being offered. This produces something that is less coherent than one might want. It leads to the risk of a mile-wide, inch-deep curriculum. Designers of assessments have to chase standards, coherent or not. If standards do not have a set of overarching principles, the designers of assessments also do not have a set of principles for translating the standards into assessments. This can lead to unhappy results. (p. 81)

One approach is to develop a model of what it means to be an educated person. Rather than a laundry list of competencies, the target of instruction and assessment would be more holistic. There is a common joke about developmental psychology textbooks: each chapter decomposes an aspect of child development to the point where there are no actual children in the textbook. It is an analog of the blind people each feeling a different part of an elephant and never being able to grasp the whole of the beast. A frame for standards could present a holistic model that helps keep in mind the recipients of education when designing for the many discrete competencies. (p. 84)

Giyoo Hatano and Kayoko Inagaki (1986) distinguished two types of expertise that are relevant to choice-based assessments: routine expertise and adaptive expertise. For recurrent situations of low variability, people can develop routine expertise: a set of rapid, consistent, and error-free routines. (p. 84)

The dependence of these experts on routines led Hatano and his colleagues to propose adaptive expertise, which differs from the routine expertise of the abacus masters. Adaptive expertise is more appropriate for situations of high variability. Rather than replicating efficient routines, adaptive experts vary their behaviors and understanding in response to a changing environment. Hatano and Inagaki enumerated some of the characteristics of adaptive expertise that differentiate it from routine expertise: the ability to verbalize the principles underlying one’s skills; the ability to judge conventional and unconventional versions of skills as appropriate; and the ability to modify or invent skills according to local constraints. Others have added to this list (e.g., deliberate practice, Ericsson, Krampe, and Tesch-Römer 1993; prospective adaptation, Martin and Schwartz 2009). (p. 85)

Schwartz, Bransford, and Sears (2005) built on Hatano’s work to create the simple learning framework in figure 7.2. (p. 85)

The horizontal dimension emphasizes efficiency at specific tasks. Efficiency is especially important for recurrent situations, where it is better to rapidly and accurately remember as well as apply a solution, instead of figuring it out over and over again. The trajectory along this dimension leads to routine expertise, which comprises a set of efficient skills and retrieval patterns that are high on accuracy and speed with low variability in execution. (p. 86)

The vertical dimension in figure 7.2 stresses innovation experiences that involve handling novelty and variation. If students will have to adapt to a changing future, then they will need experiences that prepare them to invent new ways of understanding and doing things. Training the mastery of a given skill will not be sufficient. People need to be prepared to adapt—that is, learn something new. (p. 87)

A proposal embodied in figure 7.2 is that simply having students engage in innovative experiences is not sufficient to put them on a trajectory to adaptive expertise. Innovation without a strong body of understanding and efficient skills leads to inappropriate invention. (p. 87)

The optimal adaptability corridor (OAC) illustrates the goal of integrating experiences that support both efficiency and innovation. Many discussions in education pit these against each other—for example, discovery learning versus training (Tobias and Duffy 2009). This is a mistake, because there are different processes associated with efficiency and innovation, and therefore they do not displace one another. People need a balance of both. (p. 87)

Adaptive expertise seems highly relevant to recent discussions about the skills and competencies needed for the twenty-first century. Proposals range from increased creativity to improved social skills as well as many others. The catalyst for generating lists of twenty-first-century skills comes from a realization that times have changed and will continue to do so. The lists are responsive to a vision of a future filled with rapid changes in work, communication, global interdependence, technology, and ideally learning. In this future, individuals’ abilities to adapt to changes along with their abilities to innovate those rapid changes will largely be a function of their abilities to make effective learning choices. (p. 91)

Lists of twenty-first-century skills often focus on innovation (the vertical dimension from figure 7.2), which has been lacking in many standards. These skills have not been well operationalized into assessments, however, in part because of the hegemony of knowledge-based assessments. Choice-based assessments suit the gist of twenty-first-century skills because they embody an inherently dynamic perspective that matches the realization that people will need to continue to learn and adapt. (p. 91)

The list of twenty-first-century skills and competencies needs to be actionable if it is meant to do more than sort students (p. 91)

Oftentimes, there are confusions about which of the competencies can be taught. For example, many people propose that creativity is an innate personality trait rather than a learnable skill (e.g., Barron and Harrington 1981; Gough 1979). This is the wrong way to think about these competencies. The question instead is whether people choose to engage in them. In his review of creativity, Robert Sternberg (2006, 97) states, “Creativity is in large part a decision that anyone can make but that few people actually do make because they find the costs to be too high.” From this perspective, people can learn to make the choice to be creative, assuming there are environments that support this choice. (p. 92)

The Nobel physicist Leon Lederman (with Teresi 1993) provides a useful analogy for understanding the challenge of finding invisible constructs—in his case for subnuclear particles. He describes aliens (the Twilo) who come to earth and happen on a soccer match. They cannot see the colors black and white, which means they cannot see the soccer ball. They can see the players running, the goalie falling to the ground, and the crowd cheering. Their task is to figure out what organizes all these behaviors—that is, they need to infer the existence of a ball. It takes an inspired leap to posit an invisible construct. Moreover, once one posits the construct, it is necessary to decide which evidence is relevant and could confirm or falsify the existence of the construct. (p. 103)

The task of test makers is not that different. A person, researcher, or group of people may propose a construct such as “scientific identification.” Because it is a new idea, the purported properties of the construct are poorly understood; it is not clear who has it or how much; and the measures that would reveal its existence are also unknown. It is a hard endeavor. (p. 103)

Reliability is important. Through an unlucky coincidence (or maybe not a coincidence), however, the methodological demand of reliability coincides with a tendency of people to take an essentialist perspective that reifies assessments into stable traits or essences of a person—individual properties that do not change. Ray McDermott (1993) ironically describes how disability constructs “acquire” children, which in turn defines the children going forward, both inwardly and outwardly. Reified individual properties can range from disabilities to mastered knowledge to personality types to intelligence. The combination of the need for a stable assessment and the simplicity of thinking in terms of stable traits can yield useful scientific advances. (p. 106)

Oftentimes, the psychometrics of assessment bogs down in proof to the detriment of improvement. In an invited paper written to his peers via the Journal of Educational and Behavioral Statistics, Wainer (2010, 12–13) declared, “The psychometrics of today is both more extensive and better than we need. . . . If we want to improve the practice of testing, there is much more bang for the buck to be had in improving tests than improving test theory.” (p. 117)

Test evaluation frequently depends on inferential statistics such as t-tests and F-tests, which are designed to (dis)prove a hypothesis. The strict inferential statistics for establishing constructs, reliability, and validity can choke innovation. If we let go of inferential statistics and dreams of proof, we can embrace a new set of data-mining tools for handling behavioral data. These tools are exploratory and meant to aid human induction rather than (dis)prove hypotheses. They detect patterns within large data sets, and then it is up to subsequent research to determine if these patterns are valid and reliable. Choice-based assessments are prime candidates for data mining, because they collect large amounts of data by recording each click a student makes while learning. (p. 117)

Data mining provides a new set of tools for handling complex behaviors. Portions of the psychometric community have embraced the challenge of handling rich data. Witness the enthusiastic preface to Automated Scoring of Complex Tasks in Computer-Based Testing (Williamson, Mislevy, and Bejar 2006, 2): (p. 118)

The technological advances of the past decade are a catalyst for change in educational measurement. They allow increased flexibility, complexity, interactivity and realism of computer-administered assessment tasks, including multimedia components. Coupled with capabilities for internet delivery and its implications for large-scale on-demand administration, the potential wealth of data that can be tracked and recorded from such administrations appears capable of revolutionizing assessment. Such a revolution relies, in part, on the promise of a standardized automated analytical approach to measuring previously elusive constructs and complex problem-solving behavior. Of course, this promise depends on the ability of the measurement profession to address new challenges in the practice of educational measurement posed by such an approach to assessment. (p. 119)

The students’ choices of whether and how to use the resources along with their accuracy on embedded questions are fed into machine-learning algorithms to produce student problem-solving models. Student performances can be characterized in terms of strategies such as guessing (quick, incorrect results with little resource use), perseverating (combing through resources without achieving correct results), plodding (the inefficient use of resources leading to correct results), and expert performance (using only the most useful resources to achieve correct results). The IMMEX Project provides a strong illustration of how it is possible to capture and catalog learning choices in an open environment. (p. 121)
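The IMMEX-style categories above can be sketched as a small rule-based classifier over logged choices. This is only an illustrative sketch: the thresholds, field names, and `Attempt` record are hypothetical assumptions, and the actual project uses machine-learning algorithms rather than hand-written rules.

```python
# Minimal sketch: mapping one logged problem-solving attempt to a coarse
# strategy label (guessing, perseverating, plodding, expert), in the
# spirit of the categories described above. All thresholds are
# illustrative assumptions, not the project's actual algorithm.

from dataclasses import dataclass

@dataclass
class Attempt:
    resources_viewed: int   # how many help resources the student opened
    seconds_elapsed: float  # time spent on the problem
    correct: bool           # whether the final answer was right

def classify_strategy(a: Attempt) -> str:
    """Map one logged attempt to a coarse strategy label."""
    quick = a.seconds_elapsed < 60
    heavy_resource_use = a.resources_viewed > 10
    if not a.correct and quick and a.resources_viewed <= 1:
        return "guessing"       # fast, wrong, little resource use
    if not a.correct and heavy_resource_use:
        return "perseverating"  # combs resources without success
    if a.correct and heavy_resource_use:
        return "plodding"       # inefficient use of resources, but correct
    if a.correct:
        return "expert"         # correct with targeted resource use
    return "unclassified"

print(classify_strategy(Attempt(0, 30, False)))   # guessing
print(classify_strategy(Attempt(3, 400, True)))   # expert
```

In a real system the labels would come from models fit to many students' click streams, but the sketch shows how strategy categories are grounded in observable choices rather than answers alone.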

The ability to identify informative patterns of choice will depend on advancements in data mining, which are proliferating quickly. One new technique (Li and Biswas 2002) looks for hidden Markov models (HMM; Rabiner 1989). Automated HMM analysis finds recurrent patterns of choices. (p. 121)
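A full HMM analysis infers hidden states behind observed choices; as a simplified stepping stone, one can estimate a fully observed first-order Markov chain of transition probabilities from logged choice sequences. The sketch below makes that simplification explicit, and the choice labels (`read_hint`, `attempt`) are hypothetical.

```python
# Simplified sketch of mining recurrent choice patterns. Where automated
# HMM analysis posits hidden states, this fully observed first-order
# Markov chain just estimates P(next choice | current choice) by counting
# transitions in the logs. Choice labels are hypothetical.

from collections import Counter, defaultdict

def transition_probs(sequences):
    """Estimate P(next_choice | current_choice) from choice logs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }

logs = [
    ["read_hint", "attempt", "attempt", "read_hint", "attempt"],
    ["attempt", "read_hint", "attempt"],
]
probs = transition_probs(logs)
# probs["attempt"]["read_hint"] is the estimated chance a student seeks
# a hint right after an attempt -- a recurrent pattern in these logs.
```

Recurrent transition patterns like these are the observable raw material; the HMM's contribution is to explain them via a smaller set of hidden learner states.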

Decide what to do once an assessment has detected that a student is making poor choices. How do we help students make correct choices and then see if they learn from that instruction? Deciding the best actions we can take to help students learn to make good choices is beyond the purview of this book, but if we want an assessment to yield actionable information, then the assessment should help students, educators, parents, or even a well-programmed computer consider candidate actions. Moreover, assessments that drive instructional actions … (p. 126)

If an assessment helps improve learning, then we know the assessment is measuring something useful and in a useful way. (p. 144)

If an assessment supplies feedback that a student is making poor choices, and it also includes successful provisions for improving those choices, then it gains a tremendous amount of credibility. (p. 144)

In a computer learning environment, an ideal choice-based assessment would adapt to the choices the students make, so it would support their abilities to make better choices. We will call this a choice-adaptive learning environment. The term adaptive, as used here, does not refer to current versions of computer-adaptive testing. In computer-adaptive testing, the term adaptive indicates the ability of the computer to efficiently home in on a student’s level of knowledge by constantly recalibrating question difficulty based on the student’s performance so far. Computer-adaptive testing is about shortening overall test-taking time. A choice-adaptive environment instead adapts to students’ choices to help guide them to better ones. Instruction and assessment would be seamlessly coupled, providing important guidance to learners, educators, and policymakers. (p. 144)
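The recalibration loop of computer-adaptive testing can be sketched as a simple up/down staircase: a correct answer raises the difficulty of the next item, a wrong answer lowers it, so the test quickly brackets the student's level. This is an illustrative toy, not a real IRT-based CAT engine, and the `respond` function is a hypothetical stand-in for a student.

```python
# Illustrative sketch of the item-selection loop in computer-adaptive
# testing: each response recalibrates the difficulty of the next
# question. A simple up/down staircase, not a real IRT-based engine.

def adaptive_test(respond, start=5, lowest=1, highest=10, n_items=8):
    """Return the difficulty trajectory; respond(d) -> True if correct."""
    difficulty, trajectory = start, []
    for _ in range(n_items):
        trajectory.append(difficulty)
        if respond(difficulty):
            difficulty = min(highest, difficulty + 1)  # harder next item
        else:
            difficulty = max(lowest, difficulty - 1)   # easier next item
    return trajectory

# A hypothetical student who answers correctly below difficulty 7:
# the trajectory quickly oscillates around that level.
student = lambda d: d < 7
print(adaptive_test(student))
```

Note the contrast with the book's proposal: this loop adapts item difficulty to estimate knowledge efficiently, whereas a choice-adaptive environment would adapt its responses to the student's choices in order to improve them.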

Assessment is inextricably linked to questions of fairness. Three outstanding issues are the content of the assessment, the use of the results, and respect for the persons being assessed. Regarding the content of an assessment, educational assessment entails a commitment to elevating some aspects of experience and individuals over others. This in turn raises questions of what measures are fair to include or exclude. As a concrete example, one of us (Schwartz) taught in a remote little Alaskan village for many years (remote, that is, from the vantage point of city dwellers). The village did not have radio or television; it was also five hundred air miles from the nearest road. One year, the students received a reading test that used the word curb. The village had no curbs, and most of the students had never seen one. On that item, they surely did poorly compared to city dwellers. This example fits into a larger discussion about whether assessments should take into account students’ opportunity to learn (Moss et al. 2008) or should be treated more like driving tests, where the public only cares whether a person is competent to a standard. (p. 149)

Knowledge is an engine of human performance, but it is still up to the driver to choose where to steer the car. With choices, we are telling people where they should steer their cars. Choice-based assessments go beyond measuring enablement to measuring action itself. This potentially intrudes on personal freedom, because people should be allowed to steer their own car. Of course, knowledge-based assessments also steer the car (as we will discuss below). But choice makes the intrusion more salient. This elevation of ethical concern is notable, since it shows that choices are closer to what people care about and therefore what we should be assessing. (p. 151)

Should a treatment of the United States, say, include the Anglo-Saxon version of America, the Native American version, the African-American version, or perhaps the Soviet one? Notably, these types of issues do not show up much in science or mathematics curricula (e.g., Should we teach intelligent design?). With the introduction of choice as an outcome, educators in the science, technology, engineering, and mathematics disciplines would no longer be falsely protected from these normative questions by believing that whatever they teach is safely objective. If they were teaching choices, then they could not ignore the socially constructed reality of educational outcomes. They would have to consider whether asking students to develop interest and persistence has equal standing to learning the objectively known quadratic equation. (p. 153)

How might we decide what choices to emphasize while minimizing the potential for unfairness? It is important to remember that the choices we care about are learning-relevant ones; not all choices are diagnostic from the perspective of learning. A useful first step is to take an empirical approach. Here, the goal would be to determine which choices have the largest influence on learning. (p. 153)

Providing students with the belief they are making a choice does not skirt the paradox because the outcomes of those choices and how those outcomes shape future choices are still enforced by authority. In educational settings, the consequences of making a given choice are largely orchestrated by that setting. This is how educational settings can create a protected learning environment: by controlling the consequences attached to different choices. This orchestration also makes it so that some choices are better than others. (p. 155)

B. F. Skinner (1986), a US behaviorist, famously proposed that education should unambiguously shape student behavior by establishing clear reinforcement contingencies for those things that need to be taught. School is not life, he contended. After school, students can choose—or more properly, believe they choose—whatever actions they would like. Skinner’s arguments are compelling, although the science behind them has been superseded by new findings and theories. But at a higher level, Skinner missed something fundamental. The narrative of choice and agency is basic to contemporary discourse and people’s self-conception (Bandura 1989). Whether or not the narrative is correct, agency is a socially constructed reality, and our civil society depends on it. Students need to experience and reflect on the agency of choice, even if they are guided toward some choices over others. So whether or not one likes the idea of providing “faux” choices for students, it is important for them to experience choice making. In Callan’s paradox, we would be satisfied if children had a chance to entertain the possibility of not being Catholic, even if that was not really an option until they were older. If the goal is to prepare people to choose, then the existence of choices needs to suffuse learning. (p. 156)

People have a natural tendency toward wishful thinking. Wishful thinking about assessments—that they are impartial measures, much like a yardstick is an impartial measure of height—lets people avoid the clutter of ethical questions and treat assessments as objective measures. But the truth is that an educational assessment is not a yardstick. It does not simply measure a learning outcome. Assessment elevates some aspects of experience over others, and it actively shapes what people consider important. Assessments do not merely test reality; they also create it. (p. 160)

For many, assessments are a lighthouse in the fog of education—a clear guide by which to make safe decisions. But in reality, assessments create the fog. Current assessments perpetuate beliefs that the proper outcomes of learning are static facts and routine skills—stuff that is easy to score as right or wrong. Interest, curiosity, identification, self-efficacy, belonging, and all the other goals of informal learning cannot even sit at the assessment table, because these outcomes are too far removed from current beliefs about what is really important. Assessments seem to be built on the presupposition that people will never need to learn anything new after the test, because current assessments miss so many aspects of what it means to be prepared for future learning. These frozen-moment assessments have influenced what people think counts as useful learning, which then shows up in curricula, standards, instructional technologies, and people’s pursuits. If the fog were lifted, we would see that most of the stakeholders in education care first and foremost about people’s abilities to make good choices. Making good choices depends on what people know, but it also depends on much more, including interest, persistence, and a host of twenty-first-century soft skills that are critical to learning. Where we can anticipate a stable future—decoding letters into words is likely to be a stable demand for the next fifty years—then knowledge- and skill-based assessments make sense. In relation to those aspects of the future that are less stable, though, people will need to choose whether, what, when, and how to learn. Hence, it is important (p. 164)

to focus on choices that influence learning, and assessments should measure those choices. Choice is the critical outcome of learning, not knowledge. Knowledge is an enabler; choice is the outcome. (p. 165)

Assessing choices during learning has a number of attractive properties. Foremost, choice-based assessments are process oriented. They examine learning choices in action rather than only the end products. This process focus makes it possible to connect the learning behaviors during the assessment to processes that occur in a learning environment. Second, the assessments reveal what students are prepared to learn, so they are prospective as opposed to retrospective. Third, choice resonates with the rest of the social sciences that examine the movements of people, money, and ideas. Fourth, choices do not lend themselves to simplistic reifications whereby things like people’s knowledge or personality traits are misinterpreted as independent of context and immune to change. Fifth, choices can measure a much greater range of learning outcomes than fact retrieval and procedural application. We have demonstrated several, including persistence after failure, critical thinking, attending to some ideas over others, creating a general solution, creative design, reading to learn, use of help, inductive strategies, and the uptake of feedback. There are many more to be had. Sixth, learning choices are a good candidate for inclusion in standards, which currently define what knowledge students should have but stay strangely silent about the processes of learning themselves. (p. 165)


Crisscross Landscapes

Bodong Chen, University of Minnesota
