Bodong Chen

Crisscross Landscapes

Notes: National Academies of Sciences, Engineering, and Medicine. (2018). Envisioning the Data Science Discipline: The Undergraduate Perspective: Interim Report



Citekey: @National_Academies_of_Sciences2018-yf

National Academies of Sciences, Engineering, and Medicine. (2018). Envisioning the Data Science Discipline: The Undergraduate Perspective: Interim Report. Washington, DC: The National Academies Press.






The need to manage, analyze, and extract knowledge from data is pervasive across industry, government, and academia. (p. 1)

The nation’s ability to make use of these data depends on the availability of an educated workforce with necessary expertise. (p. 1)

new models for delivering education (p. 1)

Fueled by the explosion of data, jobs that involve data science have proliferated and an array of data science programs at the undergraduate and graduate levels have been established. (p. 1)

Future data science programs will need to incorporate a variety of skills. Strong analytic skills are needed to work with large, complex data sets. Oral and written communication skills are also necessary to engage with diverse audiences about real-world problems, to work in teams, and to participate in effective problem solving for both technical and ethical dilemmas encountered in uses of data science. (p. 1)

Using real data will expose students to the messiness they will confront when solving real-world problems. Selecting applications with broad impact will make instruction more compelling, helping to attract and retain students. (p. 1)
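
As a small illustration of the "messiness" the report refers to, here is a minimal Python sketch. It is not from the report: the records, the field names (`city`, `temp_f`), and the list of missing-value markers are all invented for this example. It shows the kind of cleanup (inconsistent capitalization, stray whitespace, several spellings of "missing") that students rarely meet in textbook data sets.

```python
# Hypothetical raw records, invented for illustration only.
RAW_ROWS = [
    {"city": " Boston", "temp_f": "71.5"},
    {"city": "boston ", "temp_f": "N/A"},
    {"city": "Austin", "temp_f": "98"},
    {"city": "AUSTIN", "temp_f": ""},
    {"city": "Austin", "temp_f": "96.0"},
]

# Assumed spellings of "no value"; real data sets usually have more.
MISSING = {"", "n/a", "na", "null"}


def clean_row(row):
    """Normalize the city name and parse the temperature, or return None."""
    city = row["city"].strip().title()
    raw = row["temp_f"].strip()
    if raw.lower() in MISSING:
        return None
    return {"city": city, "temp_f": float(raw)}


def mean_temp_by_city(rows):
    """Average the usable readings per city and count the dropped rows."""
    readings = {}
    dropped = 0
    for row in rows:
        cleaned = clean_row(row)
        if cleaned is None:
            dropped += 1
            continue
        readings.setdefault(cleaned["city"], []).append(cleaned["temp_f"])
    means = {city: sum(vals) / len(vals) for city, vals in readings.items()}
    return means, dropped


means, dropped = mean_temp_by_city(RAW_ROWS)
print(means, "rows dropped:", dropped)
```

Reporting the number of dropped rows alongside the averages is a deliberate choice here: it keeps the cost of the cleanup visible rather than hiding it.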

Critical curricular topics include mathematical foundations, computational thinking, statistical thinking, principles of effective data management, techniques for data description and curation, data modeling approaches, effective communication skills, reproducibility challenges and current best practices, exposure to ethical dilemmas and problem-solving skills, and a range of domain-specific topics. (p. 1)

“data acumen,” which enables data scientists to make good judgments and decisions with data. (p. 1)

data science is inherently concerned with understanding and addressing real-world problems and challenges (p. 1)

Specifically, this report lays out some of the information and comments that the committee has gathered and heard during the first half of its study, offers perspectives on the current state of data science education, and poses some questions that may shape the way data science education evolves in the future. (p. 2)

BOX S.2 Questions for Public Input
The committee has identified the following themes, which are discussed throughout this interim report, and is soliciting input on the open questions. Please visit the following webpage to provide input: (p. 3)

Building Data Acumen in a Data Science Curriculum
● Which key components should be included in data science curriculum, both now and in the future?
● How could these components be prioritized or best conveyed for differing types of data science programs?
● How can opportunities to enhance data acumen (i.e., the ability to make good judgments and decisions with data) be integrated into data science educational programs?
● How can data acumen be measured or evaluated?
(p. 3)

Oral and Written Communication Skills and Teamwork (p. 3)

Assessment and Evaluation (p. 4)

A new generation of tool developers and tool users will require the ability to make good judgments and decisions with data and use tools responsibly and effectively (referred to as “data acumen” throughout this report). (p. 5)

Data scientists have the potential to help address critical real-world challenges. (p. 6)

Current data science courses, programs, and degrees are highly variable in part because emerging educational approaches start from different institutional contexts, aim to reach students in different communities, address different challenges, and achieve different goals. (p. 7)

programs are taking a cross-disciplinary approach—for example, integrating statistics and computer science concepts into the undergraduate data science degree program (p. 7)

Chapter 3 explores the role of innovative curriculum development and provides some considerations for institutions. (p. 7)

K-12 objectives (p. 7)

A useful historical analogy can be made with the Lewis and Clark expedition (1804–1806), which laid the foundation for great discoveries about a newly expanded nation through systematic data collection and analysis. (p. 8)

2 Acquiring Data Science Skills and Knowledge (p. 10)

What skills are needed to be successful in the workplace and in society? Is data science a fundamental skill that all students should have some exposure to? How can data literacy be improved? In what skills, methods, and technologies should future data scientists be trained, given the wide variety of potential applications? (p. 10)

Finding 2.1: A critical component of data science education is to guide students to develop data acumen. This requires exposure to key concepts in data science, real-world data and problems that can reinforce the limitations of tools, and ethical considerations that permeate many applications. Key concepts related to developing data acumen include the following:
● Mathematical foundations,
● Computational thinking,
● Statistical thinking,
● Data management,
● Data description and curation,
● Data modeling,
● Ethical problem solving,
● Communication and reproducibility, and
● Domain-specific considerations.
(p. 11)

3 Data Science Education in the Future (p. 19)


Although full degree-granting programs in data science may not be available yet in many settings outside top-tier academic institutions, graduates will still need to come from community colleges, minority-serving institutions, and smaller colleges and universities in order to fill the pipeline of data talent. (p. 19)

The process whereby the goals are achieved can be varied. In terms of delivering content, flipped courses, hybrid courses, independent studies, experiential learning, modular courses, hackathons, data dives, and just-in-time learning are all viable options for students (p. 20)

A first step in establishing a new curriculum is to consider relevant experiences from other disciplines (e.g., the digital humanities) that have recently emerged from a period of reorganization and innovation. (p. 20)

Footnote 2: A “data dive” is an event in which organizations, often nonprofits, present a data-driven problem to a group with data science expertise to solve in a limited amount of time. (p. 20)

BOX 3.1 Examples of Current Data Science Programs (p. 21)

By this, they mean changing teaching from a lecture-based format to one that has both inquiry-based and modular-learning components and that treats students as scientists who “develop hypotheses, design and conduct experiments, collect and interpret data, and write about their results” (Handelsman et al., 2004). (p. 23)

Hoey (2008) suggests that the key measures include the following:
● Knowledge of concepts in the discipline;
● Ability to conduct independent research;
● Ability to use appropriate technologies;
● Ability to work with others, especially in teams; and
● Ability to teach others.
(p. 23)

4 Broad Participation in Data Science (p. 26)

Data science programs have the potential to attract broad participation, including diverse members from different disciplines (including the humanities, social sciences, and the arts) and from populations that are underrepresented in other similar science, technology, engineering, and mathematics (STEM) fields (see Box 4.1). (p. 26)

● CS for All is a program that aims to provide all U.S. students the opportunity to participate in computer science and computational thinking education in their schools at the K–12 levels. (p. 26)

Footnote 3: National Science Foundation, “CS for All,” accessed August 21, 2017. (p. 26)

A “pipeline” metaphor has been a standard means to consider the flow of students through a STEM curriculum, with “leakage” used to indicate that some students step out of this path and potentially move to others. It has been argued that this metaphor should be replaced by a “watershed” in which there are multiple flow pathways by which students may enter a degree program dependent upon their own backgrounds. For inherently interdisciplinary degree programs with multiple potential routes for student success, such a metaphor structures a more open, collaborative approach toward building programs that attract diverse students than a fixed pipeline metaphor. (p. 27)

K-12 OBJECTIVES (p. 29)

Elementary, middle, and high schools play an important role in developing data science education and preparing students to thrive in a modern workforce. (p. 29)

Some of the practices called for in these standards include analyzing and interpreting data; using mathematics and computational thinking; and obtaining, evaluating, and communicating information. Embedded in these practices are such skills as being able to (1) identify significant features and patterns in data through tabulation, graphical interpretation, visualization, and statistical analysis; (2) make and test predictions through constructing simulations and recognizing, expressing, and applying quantitative relationships; and (3) communicate orally or in writing using tables, diagrams, graphs, and equations (NRC, 2012, pp. 49-53). Through the adoption of these national standards, data scientists may be positioned to play a role in curriculum development by working with curriculum designers to ensure alignment between the practices highlighted above and the requisite skills that are needed upon entry into data science programs. (p. 29)
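
The three skills listed above (finding patterns through tabulation, making and testing a prediction through simulation, and communicating with a table) can be sketched in a few lines of Python. This is only an illustrative classroom-style exercise, not something taken from the report or the standards. The dice scenario, the function names, the fixed seed, and the trial count are all assumptions chosen for the example.

```python
import random
from collections import Counter


def roll_two_dice(n_trials, seed=0):
    """Simulate n_trials rolls of two fair dice and return the sums."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_trials)]


def tabulate(sums):
    """Count how often each possible sum (2 through 12) occurred."""
    counts = Counter(sums)
    return {total: counts.get(total, 0) for total in range(2, 13)}


def format_table(table, n_trials):
    """Render the counts as a plain-text table for communication."""
    lines = ["sum  count  share"]
    for total, count in table.items():
        lines.append(f"{total:>3}  {count:>5}  {count / n_trials:.3f}")
    return "\n".join(lines)


N_TRIALS = 10_000
sums = roll_two_dice(N_TRIALS)
table = tabulate(sums)

# Prediction to test: 7 is the most common sum, with a share near 6/36.
most_common_sum = max(table, key=table.get)
share_of_seven = table[7] / N_TRIALS

print(format_table(table, N_TRIALS))
print("most common sum:", most_common_sum)
print("share of 7:", round(share_of_seven, 3), "predicted:", round(6 / 36, 3))
```

Comparing the simulated share with the predicted 6/36 gives students a concrete check on a quantitative relationship, and the printed table covers the "communicate using tables" part of the practice.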


In addition to efforts that could be achieved in formal educational spaces, there are outreach efforts to students in more informal spaces, including year-long afterschool programs, summer camps, high school internship programs, competitions, and websites designed to foster motivation and interest (p. 29)