Bodong Chen

Crisscross Landscapes

Notes: Designing for Productive Failure



Citekey: @kapur2012

Kapur, M., & Bielaczyc, K. (2012). Designing for Productive Failure. Journal of the Learning Sciences, 21(1), 45–83. doi:10.1080/10508406.2011.591717


the design principles undergirding productive failure (PF; M. Kapur, 2008). (p. 3)

(a) PF, in which students collaboratively solved complex problems on average speed without any instructional support or scaffolds up until a teacher-led consolidation; or (b) direct instruction (DI), in which the teacher provided strong instructional support, scaffolding, and feedback (p. 3)

Findings suggested that although PF students generated a diversity of linked representations and methods for solving the complex problems, they were ultimately unsuccessful in their problem-solving efforts. Yet despite seemingly failing in their problem-solving efforts, PF students significantly outperformed DI students on the well-structured and complex problems on the posttest. (p. 3)

greater representation flexibility (p. 3)

instructional structure is designed to constrain or reduce the degrees of freedom in problem-solving activities (Wood, Bruner, & Ross, 1976), thereby increasing the likelihood of novices achieving performance success. Indeed, a vast body of research supports the efficacy of such an approach. (p. 4)

In contrast, the role of failure in learning and problem solving, much as it is intuitively compelling, remains largely underdetermined and underresearched by comparison (Clifford, 1984; Schmidt & Bjork, 1992). (p. 4)

our work is grounded in the belief that engaging novices to try, and even fail, at tasks that are beyond their skills and abilities can, under certain conditions, be productive for developing deeper understandings. (p. 4)


Several scholars and research programs have spoken to the role of failure in learning and problem solving (Clifford, 1984). (p. 4)

Schmidt and Bjork (1992) reviewed methods used in the training of motor and verbal skills. (p. 4)

the notion of “desirable difficulties” (p. 4)

research on impasse-driven learning (VanLehn, Siler, Murray, Yamauchi, & Baggett, 2003) in coached problem-solving situations provides strong evidence for the role of failure in learning. Successful learning of a principle (e.g., a concept, a physical law) was associated with events when students reached an impasse during problem solving. (p. 5)

Impasse means a situation of deadlock. Quite relevant to promisingness, or its opposite side. (p. 5)

impasse-driven learning (p. 5)

VanLehn et al.’s (2003) findings suggest that it may well be more productive to delay that structure up until the student reaches an impasse—a form of failure—and is subsequently unable to generate an adequate way forward. (p. 5)

What is different for promisingness is that it asks students themselves to find the way out, rather than merely allowing them to experience such an impasse in preparation for later structure or direct instruction. This is very different. (p. 5)

Schwartz and Martin (2004). In a sequence of design experiments on the teaching of descriptive statistics to intellectually advanced students, Schwartz and Martin demonstrated an existence proof for the hidden efficacy of invention activities when such activities preceded DI (e.g., lectures), despite such activities failing to produce canonical conceptions and solutions during the invention phase. (p. 5)

Kapur’s (2008) work on PF adds further weight to the role of failure in learning and problem solving. (p. 5)

Kapur examined students solving complex, ill-structured problems without the provision of any external support structures. (p. 5)

Kapur (2008) argued that delaying the structure received by students from the ill-structured groups (who solved ill-structured problems collaboratively followed by well-structured problems individually) helped them discern how to structure an ill-structured problem, thereby facilitating a spontaneous transfer of problem-solving skills. (p. 6)

a growing body of research that emphasizes the need to understand conditions under which delaying structure during instruction can enhance learning (e.g., diSessa, Hammer, Sherin, & Kolpakowski, 1991; Lesh & Doerr, 2003; Slamecka & Graf, 1978). (p. 6)

These studies, however, indicate more than simply a delay of instructional structure. They also underscore the presence of desirable difficulties and productive learner activity in solving problems. (p. 6)

It is this interest in what is present, that is, the features of productive learner activity (even if it results in “failure”), that forms the core of our work. (p. 6)


The literature provides insight into why providing instructional structure too early in the problem-solving process can be problematic. (p. 6)

First, students often do not have the necessary prior knowledge differentiation to be able to discern and understand the affordances of domain-specific representations and methods given during DI (e.g., Schwartz & Martin, 2004; for a similar argument applied to perceptual learning, see Garner, 1974; Gibson & Gibson, 1955). Second, when concepts, representations, and methods are presented in a well-assembled, structured manner during DI, students may not understand why those concepts, representations, and methods are assembled in the way that they are (Anderson, 2000; Chi, Glaser, & Farr, 1988; diSessa et al., 1991; Schwartz & Bransford, 1998). (p. 6)

designing for PF requires engaging students in a learning design that embodies four core interdependent mechanisms: (a) activation and differentiation of prior knowledge in relation to the targeted concepts, (b) attention to critical conceptual features of the targeted concepts, (c) explanation and elaboration of these features, and (d) organization and assembly of the critical conceptual features into the targeted concepts. (p. 7)

What does promisingness require? Probably a favorable epistemic belief that ideas are improvable. Then an understanding of the current state of the art of community knowledge. And an activation when presented with a promising idea. Add more… (p. 7)

four core interdependent mechanisms (p. 7)

What mechanisms could I design for promisingness? (p. 7)

a design comprising two phases: a generation and exploration phase (Phase 1) followed by a consolidation phase (Phase 2). Phase 1 affords opportunity for students to generate and explore the affordances and constraints of multiple representations and solution methods (RSMs). Phase 2 affords opportunity for organizing and assembling the relevant student-generated RSMs into canonical RSMs. (p. 7)

The designs of both phases involved decisions concerning the creation of the activities, the participation structures, and the social surround (see Figure 1). These decisions were guided by the following core design principles to embody the aforementioned mechanisms:

1. Create problem-solving contexts that involve working on complex problems that challenge but do not frustrate, rely on prior mathematical resources, and admit multiple RSMs (mechanisms a and b);
2. Provide opportunities for explanation and elaboration (mechanisms b and c); and
3. Provide opportunities to compare and contrast the affordances and constraints of failed or suboptimal RSMs and the assembly of canonical RSMs (mechanisms b–d). (p. 7)

FIGURE 1 The three layers of the productive failure design: the activity, the participation structures, and the social surround. (p. 7)

These three levels might also inform my design. (p. 7)

Phase 1: Generation and Exploration of RSMs (p. 8)

The overall design goal of Phase 1 was to afford opportunities for students to generate and explore a wide variety of RSMs for solving novel, complex problems. (p. 8)

Designing the activity: “sweet-spot” calibration of complex problems. (p. 8)

Complexity of the problems (p. 8)

complex problem scenarios afford multiple RSMs and often require students to make and justify assumptions (Jonassen, 2000; Spiro, Feltovich, Jacobson, & Coulson, 1992; Voss, 1988). (p. 8)

Prior mathematical resources of students. (p. 8)

Affective draw of the problem scenario. (p. 9)

more engaged and interested in the problem when it was presented in the form of a narrative with dialogue. (p. 9)

a “comic strip” format would have more appeal (e.g., Kapur & Lee, 2009). (p. 9)

Designing the participation structures: enabling collaboration. (p. 9)

Because collaborative problem solving has been found to be an enabling mechanism that allows students to share, elaborate, critique, explain, and evaluate shared work (Chi et al., 1988; Scardamalia & Bereiter, 2003), small-group collaboration was used as the participation structure during Phase 1. (p. 9)

in our work, the grouping of students into small groups was not based simply on randomization but on leveraging teachers’ understandings of the social dynamics (p. 9)

to maximize the likelihood that group members would work well together in their assigned groups (E. G. Cohen, Lotan, Abram, Scarloss, & Schultz, 2002). (p. 10)

Designing the social surround: creating a safe space to explore. (p. 10)

As described, the role of the teacher was not to provide any cognitive or content-related support but mainly to manage the classroom and provide affective support as part of setting the appropriate expectations and norms for problem solving. (p. 10)

Phase 2: Consolidation and Knowledge Assembly (p. 10)

The overall design goal of Phase 2 was to afford opportunities for students to compare and contrast the affordances and constraints of failed or suboptimal RSMs and the assembly of canonical RSMs. (p. 10)

Designing the activity: examining student-generated and canonical RSMs. (p. 11)

Back to the issue of efficiency: will PF take significantly more time than DI? (p. 11)

Designing the participation structures: enhancing engagement. (p. 11)

For teachers largely and self-admittedly accustomed to a DI mode, these facilitation strategies are not easily developed or adopted. Hence, a professional development program was carried out to develop the teachers’ facilitation skills and strategies. (p. 11)

Designing the social surround: creating a safe space to explore. (p. 11)

in PF, teachers set the expectations that the discussion of student-generated RSMs was not to assess them as correct or incorrect. Instead, the expectation set was that the process of coming up with RSMs is an important part of mathematical practice (Thomas & Brown, 2007) and that understanding why and under what conditions some RSMs are better than others is important for developing mathematical understanding (diSessa & Sherin, 2000). (p. 11)


The experiments were conducted with Grade 7 students at three mainstream, coeducational, public schools in Singapore. (p. 11)

With the PSLE Math grade and total score as the two dependent variables, a multivariate analysis of variance revealed a significant multivariate effect among the three schools, F(4, 594) = 437.82, p < .001. (p. 12)

Comparing PF With DI (p. 13)

describe these designs next and articulate our hypotheses for comparing them (p. 13)

This cycle of lecture, practice/homework, and feedback then repeated itself over the course of the same number of periods as in the PF condition. Thus, the amount of instructional time was held constant for the two conditions. (p. 13)

We hypothesized that the PF design would afford students greater opportunities to activate and differentiate their prior knowledge; attend to, explain, and elaborate upon the critical conceptual features of the concept of average speed; and (p. 13)

understand the assembly of these features into the canonical RSMs. Consequently, PF students would be able to construct deeper conceptual understanding of the concept of average speed compared to students in a DI design. A deeper conceptual understanding should result in better performance in solving problems on average speed on the posttest (p. 14)

In addition, we also expected that because PF students would have generated and explored a variety of RSMs, they would also demonstrate better representational flexibility in solving problems on average speed on the posttest (Ainsworth, Bibby, & Wood, 2002; Lesh, 1999). (p. 14)

A quasi-experimental, pre/post design was used in all three schools. (p. 14)

Data Sources and Analytical Procedures (p. 16)

Process measures for the PF condition. (p. 16)

The group work artifacts were examined to determine the number of PF groups that were able to solve the complex problems successfully. (p. 16)

Group/individual performance. (p. 16)

Group RSM diversity. (p. 16)

The set of RSMs identified in the group work artifacts was used to chunk the group discussion into smaller episodes. (p. 16)

clear transitions in the discussions when a group moved from one RSM (e.g., ratios, trial and error) to another (e.g., algebra). (p. 16)

A total of nine different RSMs emerged from this analysis. RSM diversity was defined as the number of different RSMs generated by a group. (p. 17)

Process measures for the DI condition. (p. 20)

homework assignment provided a proxy measure for student performance in the DI condition. (p. 20)

The 5-item posttest comprised three well-structured problem items similar to those on the pretest, one complex problem item, and one graphical representation item (see Appendix C for an example of each). (p. 20)

Outcome measures for the PF and DI conditions. (p. 20)

The three types of items formed the three dependent variables in a multivariate analysis of covariance (MANCOVA), with pretest score as the covariate. (p. 20)

These debriefing sessions were captured in audio and transcribed. Data from these sessions are used only as corroborating evidence to support the discussion of our findings. (p. 20)

RESULTS (p. 21)

Pretest (p. 21)

no significant difference between the PF and DI classes (p. 21)

Process (p. 21)

With regard to RSM diversity, findings suggest that PF groups in all three schools were able to generate multiple RSMs for solving complex problems (p. 21)

An analysis of covariance revealed a significant difference among schools on RSM diversity, F(2, 181) = 3.51, p = .032, partial η2 = .04. (p. 21)
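As a sanity check on the reported statistic, partial η² can be recovered from an F value and its degrees of freedom via η²p = (F × df_effect) / (F × df_effect + df_error). A minimal sketch using only the figures reported above (the function name is my own, not from the paper):

```python
# Recover partial eta-squared from a reported F statistic:
#   eta_p^2 = (F * df_effect) / (F * df_effect + df_error)

def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta-squared for an (AN)COVA effect, computed from F and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Reported ANCOVA for school differences in RSM diversity: F(2, 181) = 3.51
eta = partial_eta_squared(3.51, 2, 181)
print(round(eta, 2))  # -> 0.04, consistent with the reported partial eta-squared
```

The computed value (≈ .037) rounds to the .04 reported, so the statistic is internally consistent.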

With regard to group and individual performance on the complex problems, findings suggest that in spite of generating multiple RSMs, students were ultimately unable to solve the problems successfully either in groups or individually. (p. 21)

Posttest (p. 22)

The interaction between prior knowledge (covariate) and experimental condition (PF vs. DI) was not significant (p. 22)

MANCOVA (p. 23)

The multivariate main effect of experimental condition was significant in all three schools. (p. 23)

This analysis also suggested a variance within the PF condition (especially in Schools B and C), which forms the next focus of our investigation (p. 24)


we focus on unpacking variation in the generation and exploration phase (p. 24)

First we examine whether the diversity of RSMs generated by each group relates to the subsequent posttest performance by members of that group. Based on this relationship, we begin to unpack the actual interactions occurring among group members in generating such RSMs. (p. 24)

an examination of the relationship between group RSM diversity and learning outcomes as measured on the posttest across all PF groups (p. 24)

More specifically, a comparison of the effect sizes suggests that the effect of RSM diversity was about 9 times stronger than the pretest and 13 times stronger than the school. (p. 25)

Contrasting-Case Analysis (p. 25)

the role of a collaborative activity structure (p. 25)

The purpose of the following contrasting-case analysis is to use discussion excerpts from two groups with contrasting levels of RSM diversity and illustrate how these groups additionally differed in their exploration of the RSMs they generated and how this difference in exploration potentially influenced opportunities to attend to critical features of the problem. (p. 25)

Two groups, one with high diversity (hereinafter referred to as Group HD) and another with low diversity (hereinafter referred to as Group LD) from School A that contrasted in their RSM diversity, were selected. (p. 25)

One thing I wish to know is whether these two groups performed differently on the pretest. If so, it may weaken the authors’ interpretation of the effect of variation within PF. (p. 25)

Selection of contrasting-case groups. (p. 25)

Our strategy is to present the contrasting excerpts from the two groups followed by an analysis of the contrast. Each excerpt is also accompanied with interpretive comments for each utterance, the mechanisms (a, b, or c) invoked, and the collaboration moves (e.g., proposal, question, evaluation, explanation) made. (p. 26)

Exploring an RSM. (p. 26)

However, the two groups seemed to be quite different in terms of the mechanisms invoked, which influenced their understanding of the affordances and constraints of the method (p. 26)

Because Group LD’s discussion seemed to be focused mainly on computational features of the guess and check method, there was little evidence that members attended to the critical features of the targeted concept; that is, mechanism b was rarely invoked. In contrast, Group HD’s discussion was at a more conceptual level; that is, the group worked out the ratios of the walking and biking speeds and seemed to have realized that the ratio method does not work when the walking and biking speed ratios are different. (p. 31)

In Group HD, solution proposals were met with questions, clarification and agreement, followed by evaluation and more questions, leading to explanations and evaluation and then more explanation until shared understanding was established (Utterances 4–13). (p. 31)

In contrast, the collaborative pattern in Group LD was mainly one of solution proposal, followed by question, explanation and computation, with disagreements or alternative viewpoints not being taken up substantively for discussion (p. 31)

Summary. (p. 31)

First, the quantitative analysis of RSM diversity in PF groups shows that the greater the number of RSMs generated by a group, the better the posttest performance of the group members (mechanism a). (p. 31)

Second, the qualitative contrasting-case analysis serves to illustrate how two groups that (p. 31)

differed in the number of RSMs they generated additionally seemed to differ in their collaborative understanding of the problem and exploration of the RSMs and how this difference in exploration potentially influenced opportunities to attend to, explain, and elaborate upon the critical features of the concept of average speed (mechanisms b and c). (p. 32)


First, we found that compared to DI, PF seems to engender deeper conceptual understanding without compromising performance on well-structured problems. (p. 33)

Second, although we found a significant difference among schools in terms of their students’ ability to generate RSMs for solving the novel, complex problems, this difference among the schools had a notably smaller effect size (η2 = .04) than preexisting differences in general ability (η2 = .85) and mathematical ability (η2 = .44) (p. 33)

Third, we found that RSM diversity was correlated with learning outcomes (p. 33)

Explaining PF (p. 33)


conditions that maximize performance in the shorter term may not necessarily be the ones that maximize learning in the longer term (Clifford, 1984; Schmidt & Bjork, 1992). (p. 36)

Four possibilities for design emerge. First is the possibility of designing conditions that maximize performance in the shorter term and that also maximize learning in the longer term. Let us call such design efforts designing for productive success. (p. 36)

However, there is also the concomitant possibility of designing conditions that may well not maximize performance in the shorter term but in fact maximize learning in the longer term. Let us call such design efforts designing for productive failure. (p. 36)

unproductive success—an illusion of performance without learning—as well as unproductive failure. (p. 36)


Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18(1), 32–41. (p. 37)

Clifford, M. M. (1984). Thoughts on a theory of constructive failure. Educational Psychologist, 19(2), 108–120. (p. 37)

Kapur, M. (2008). Productive failure. Cognition and Instruction, 26, 379–424. (p. 38)

Kapur, M. (2009). Productive failure in mathematical problem solving. Instructional Science, 38, 523–550. doi:10.1007/s11251-009-9093-x (p. 38)

Kapur, M. (2010). A further study of productive failure in mathematical problem solving: Unpacking the design components. Instructional Science, 39, 561–579. doi:10.1007/s11251-010-9144-3 (p. 38)

Lobato, J. (2003). How design experiments can inform a rethinking of transfer and vice versa. Educational Researcher, 32(1), 17–20. (p. 38)

Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217. (p. 39)

Schwartz, D. L., & Martin, T. (2004). Inventing to prepare for future learning: The hidden efficiency of encouraging original student production in statistics instruction. Cognition and Instruction, 22, 129–184. (p. 39)

Spiro, R. J., Feltovich, P. J., Jacobson, M. J., & Coulson, R. L. (1992). Cognitive flexibility, constructivism, and hypertext. In T. M. Duffy & D. H. Jonassen (Eds.), Constructivism and the technology of instruction: A conversation (pp. 57–76). Hillsdale, NJ: Erlbaum. (p. 39)

Tobias, S., & Duffy, T. M. (2010). Constructivist instruction: Success or failure? New York, NY: Routledge. (p. 39)

VanLehn, K., Siler, S., Murray, C., Yamauchi, T., & Baggett, W. B. (2003). Why do only some events cause learning during human tutoring? Cognition and Instruction, 21, 209–249. (p. 39)