Authentic Assessment and Instruction


Patricia G. Avery

Unless you’ve been sleeping through the past 10 years of educational reform, you are familiar with the term “authentic assessment.” Chances are you’ve read about it, attended staff development workshops on it, or work in a school district or state that has embedded authentic assessments in its graduation requirements.

Despite the increasing use of authentic assessments in K-12 classrooms, very little empirical research has explored the nature of authentic, performance-based assessments, or their relationship to instruction and student learning.1 In this study, we examine how student demographics, student engagement and teacher instruction influenced student performance on one authentic assessment task. Much of the design and implementation of the study is based on the work of Fred Newmann and his associates at the University of Wisconsin.2 The following is a brief description of their work.


Authentic Achievement

Fred Newmann and his associates at the University of Wisconsin propose an integrated conception of authentic intellectual achievement based on three criteria: students construct knowledge, through disciplined inquiry, to produce work that has value beyond the classroom. Students construct knowledge when they synthesize, evaluate, or analyze data in ways that require more than mere memorization or replication. They engage in disciplined inquiry when they use methods and skills similar to those of the academician or professional, such as the ethnographic methods used by anthropologists or the search for verification and triangulation by historians. And finally, instruction and assessment are more meaningful to students when they reflect or simulate problems, issues, or situations one might encounter in the world outside the classroom.

These principles of authenticity are not new. Versions of the same themes are found in the writings and work of Maria Montessori and John Dewey, in the Eight-Year Study of the 1930s, and in federally funded curricula of the 1960s (e.g., the “New Math,” the “New Social Studies”), among others. What is particularly powerful, however, is the integrated conceptual framework put forth by Newmann, together with a strong research base.

According to Newmann, the criteria for authenticity may be reflected in three areas: assessment tasks, instruction, and student performance. Within a given area, there are different but parallel indicators of authenticity. For example, as shown in Table 1, an authentic assessment task requires student construction of knowledge, which is gauged by the degree to which the task requires students to organize information and consider alternatives.

During instruction, however, one indicator of student construction of knowledge is the degree to which students are engaged in higher order thinking. And lastly, the quality of analytical skills students demonstrate in their performance or work reflects their ability to construct meaningful knowledge.

As part of an extensive study of authentic intellectual achievement in restructured elementary, middle, and high schools, Newmann and his associates developed standards and scoring criteria for each area of authentic intellectual achievement. Researchers observed math and social studies classes at each school, examined the assessments teachers used in these classes, and evaluated samples of student work. Instruction, assessment tasks, and samples of student work were scored according to level of authenticity. In most classrooms, regardless of grade level or subject area, researchers found low levels of authentic pedagogy (instruction and assessment); the researchers also found authentic pedagogy to be a strong predictor of authentic student performance. That is, the more authentic the assessment task and the instruction (pedagogy) according to Newmann’s criteria, the more likely it is that students will produce high quality, authentic work. Put another way, if the task is meaningful, requires higher order thinking and disciplined inquiry, and the instruction reflects these characteristics, students have the opportunity to demonstrate authentic performance. If the task requires low level thinking, and instruction focuses on basic skills, students do not have the opportunity to produce high quality work. In Newmann’s study, assessment and instruction accounted for 35 percent of the between-classroom differences in students’ authentic academic performance.

The design of Newmann’s study did not permit a strong test of the relationship between student performance, instruction, and assessment because the assessment tasks and the curriculum varied by teacher. Students who were given low-level assessment tasks did not have an opportunity to exhibit high performance. Suppose, however, that students from different U.S. history classrooms were given the same high-quality, authentic assessment task. Because all students would have an opportunity to demonstrate a high level of performance, and all students would be exposed to the same curriculum, we could focus our attention on the relationship between student performance and instruction. This is precisely the aim of the present study. It asks two questions: What is the relationship among student demographic variables, student engagement, authentic instruction, and authentic student performance? To what degree do student demographics, student engagement, and level of authentic instruction account for differences in student performance?



The Study

In February 1998, five U.S. history teachers in one urban high school used the same authentic assessment task to conclude a four-week unit on immigration. The school, teachers and students are described in Table 2. The authentic assessment task was the culminating project for the unit.


The Immigration Assessment Task

The assessment task is composed of a multi-layered series of performances.3 The “final product” is an essay that requires students to synthesize their findings and evaluate their significance. Initially, students are required to collect information on their family’s migration to the state—why they moved, the difficulties they encountered, and the move’s effect on family and cultural traditions. Students share this information in class, organize the information on charts, and then, with additional data from other classes, search for patterns. For example, what are the main reasons families migrate? Do some groups encounter more difficulties than do others? Which cultural traditions are maintained, which are adapted, and which are relinquished? The students then compare their findings to the patterns described in their textbook. Why are some class patterns not reflected in the textbook? Why are some immigration trends described in the textbook not found in the class data? The final essay requires students to summarize and speculate on the significance of their findings.

We believe this task reflects a fairly high level of authenticity in terms of Newmann’s criteria. For example, successful completion of the task requires students to organize and synthesize original class data on immigration (student construction of knowledge), to consider different perspectives on the immigrant experience (student construction of knowledge), to compare class and textbook data (disciplinary content and processes), and to address the significance of an issue that is part of their everyday lives (value beyond the classroom). Although students’ conclusions must be supported with evidence, there is no single “right answer” to the task.


Dependent and Independent Variables

The dependent variable in this study is authentic student performance; the independent variables include sex, race/ethnicity, socioeconomic status, student engagement, and authentic instruction. Descriptions and measurement or coding of each variable are presented in Table 3.



Procedures

A year prior to our study, one of the teachers used an earlier draft of the Immigration Assessment Task in her class; because of her enthusiasm for the task, the other four U.S. history teachers decided to incorporate the task into their units on immigration the following year. The teacher experienced with the task met with the other teachers several times before February 1998 to discuss how she approached the task and instruction. Throughout February, all of the teachers shared ideas and resources.

Two observers visited each teacher’s class twice during the month and rated instruction according to Newmann’s criteria for authentic instruction. The five-member rating team rotated pairings so that each observer worked with each of the other four raters at some point during the study. Samples of the rating criteria are as follows: for the teaching of higher order thinking, a maximum score of 5 was awarded if almost all students, almost all of the time, were performing higher order thinking; at the other end of the scale, a score of 1 was awarded if students were engaged only in lower order thinking operations. For the provision of deep knowledge, a score of 5 was given if during the lesson almost all students did at least one of the following: “sustained a focus on a significant topic; or demonstrated their understanding of the problematic nature of information and/or ideas; or demonstrated complex understanding by arriving at a reasoned, supported conclusion; or explained how they solved a complex problem.” Conversely, a score of 1 was awarded if “knowledge is very thin because it does not deal with significant topics or ideas; teachers and students are involved in the coverage of simple information which they are to remember.”

At the end of the month, students completed a survey of demographic information and a measure of student engagement. The essays the students wrote as part of the assessment task were typed, and names were replaced with identification numbers. Two raters scored the student work according to Newmann’s criteria for authentic student performance.



Results

Table 4 addresses our first question regarding relationships among variables. The strongest correlation in the table is between authentic instruction and authentic student performance (r = .686). Simply put, when instruction is more authentic, student performance tends to be more authentic. This is a much stronger relationship than any other shown in the table. A more modest relationship exists between authentic student performance and student engagement (r = .369). When student engagement is high, the authenticity of student performance tends to be high. With the exception of the relationship between socioeconomic status and ethnicity, all other relationships in the table are low or negligible.

The performance scores of students in the authentic task were also compared with their scores on a more traditional 10-item multiple choice test dealing with immigration. As might be expected, there is a statistically significant (p = .01) positive correlation between performance on the two tests. Because the magnitude of the correlation (r = .235) is small, however, it seems clear that the authentic assessment task taps dimensions of student understanding and performance that are not tapped by traditional multiple choice tests.
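The correlations reported above are Pearson coefficients. As a minimal sketch of how such a coefficient is computed (using made-up paired scores for illustration, not the study’s data), consider:

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    mx, my = mean(xs), mean(ys)
    # Sample covariance of the paired scores
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    # Standardize by the two sample standard deviations
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical paired scores: authentic-task essay vs. multiple-choice test
authentic = [12, 15, 9, 18, 11, 14, 16, 10]
mc_test = [6, 7, 5, 8, 7, 6, 9, 4]
print(round(pearson_r(authentic, mc_test), 3))
```

A coefficient near 1 or −1 indicates a strong linear relationship; values such as r = .235 indicate that the two measures share only a small proportion (r², about 5.5 percent) of their variance, which is why the authentic task appears to tap dimensions the multiple-choice test does not.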

Our second question examines the impact of the independent variables on authentic student performance. To address this question, we regressed students’ authentic performance scores on student demographics, student engagement, and authentic instruction (see Table 5).

Together, the independent variables account for 54 percent of the differences in students’ performance scores. This is a fairly high percentage of explained variance. The last column in Table 5 indicates the percentage of explained variance attributed to each variable. The demographic variables have very little impact on authentic student performance. Student engagement accounts for 7 percent of the explained variance. It is the authenticity of instruction, however, that is the best predictor of authentic student performance. Authenticity of instruction accounts for 40 percent of the difference in students’ performance scores.
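The kind of analysis summarized in Table 5 can be sketched in a few lines. The snippet below uses synthetic data (not the study’s dataset, and with made-up effect sizes) to show how nested ordinary-least-squares regressions yield an R², and how the increment in R² when a predictor is added estimates that predictor’s share of explained variance:

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the predictor columns in X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - (resid ** 2).sum() / ss_tot

rng = np.random.default_rng(0)
n = 120
engagement = rng.normal(size=n)
instruction = rng.normal(size=n)
# Synthetic outcome driven mostly by instruction, loosely echoing the pattern above
performance = 0.3 * engagement + 0.8 * instruction + rng.normal(scale=0.6, size=n)

r2_eng = r_squared(engagement.reshape(-1, 1), performance)
r2_full = r_squared(np.column_stack([engagement, instruction]), performance)
print(f"engagement alone: {r2_eng:.2f}; added by instruction: {r2_full - r2_eng:.2f}")
```

In this hierarchical style of analysis, the order in which predictors are entered matters; the study entered demographics first, then engagement, then instruction, so the 40 percent attributed to instruction is variance unexplained by the earlier blocks.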



Discussion

In this study, students’ demographic characteristics (sex, race/ethnicity, and socioeconomic status) are not significant predictors of student performance. This is reassuring—we should not be able to predict the quality of performance based on a student’s sex, ethnicity, or class. Moreover, as educators we have no control over students’ demographic characteristics. We can, however, change the way in which we teach and assess students.

The results indicate that instruction has a strong effect on student performance on assessment tasks; students who receive a higher level of authentic instruction are more likely to demonstrate a higher level of authentic performance. We should not be surprised: instruction and assessment are, as they say, “two sides of the same coin.” Nor is the small but significant impact of student engagement on student performance unexpected. The more students invest in academic work, the higher their level of authentic performance.

Our results are similar to those of Newmann and his colleagues. In their study, student characteristics have somewhat more impact on authentic performance than in ours, but the effects are still very small. More important, classroom-level authentic pedagogy (assessment and instruction) has a strong impact on differences in students’ authentic academic performance.

The study has important implications for the movement toward more authentic, performance-based assessment. Regardless of the quality of the curriculum materials or the assessment tasks, the quality of instruction is critical to students’ performance. Authentic assessments require authentic teaching, and the literature on ways in which instruction may need to change to address authentic assessment is sparse. Studies indicate that teachers using alternative assessments experience a great deal of difficulty making substantive changes to their instructional practices.4 As Eva Baker, an evaluation and assessment specialist, puts it: “By embracing alternative assessments, educators are beginning rather than ending a complex process.”5 The current haphazard, short-term approaches to staff development need to be replaced by substantial, in-depth, and continuing opportunities for teachers to develop authentic learning environments.

It is the integration of authentic assessment, instruction and student performance that makes Newmann’s framework so powerful. Staff development workshops usually focus on a new curriculum, teaching technique, management concern, or on special needs students. Methods textbooks for social studies teacher preparation programs generally include separate chapters on curriculum, teaching techniques and assessment. Although the present study is limited in scope, together with other studies, it suggests that professional development should emphasize explicit connections across assessment, instruction and student performance. Teachers may focus on critical thinking and inquiry in their classrooms, but if they give low-level, basic skills tests, their students do not have the opportunity to demonstrate more authentic work. Conversely, teachers may design excellent authentic assessments, but if their daily instruction focuses on rote memorization and close-ended questions, many of their students are unlikely to produce authentic work. Newmann’s more holistic conception of authentic intellectual achievement provides an excellent starting point for discussion among educators.

The use of authentic assessments is rapidly growing at the classroom and state levels. In Minnesota, the class of 2002 will be required to complete 24 assessment “packages” across the content areas. Each package is a complex series of tasks that require “real world” skills and problem solving. Whether students are required to debate a public issue, analyze primary historical documents, or survey community attitudes, success on the tasks requires a higher level of thinking and involvement than does the typical multiple choice test. As we begin to include more authentic assessments across the curriculum, we need to examine our teaching practices, as well as the effects of authentic assessment and instruction on student achievement.



Notes

1. Definitions of “authentic” and “performance-based” assessments vary. Although the terms are sometimes used interchangeably in the literature, performance-based assessments generally refer to tasks that require students to show their understanding through exhibitions, demonstrations, essays, debates, oral presentations, etc. See Robert J. Marzano, Debra Pickering, and Jay McTighe, Assessing Student Outcomes: Performance Assessment Using the Dimensions of Learning Model (Alexandria, VA: Association for Supervision and Curriculum Development, 1994). These tasks may or may not be considered authentic; the term “authentic” is usually reserved for those tasks that require students to use knowledge and skills in the way in which they might be used in the “real world” outside the classroom. See Grant Wiggins, Educative Assessment: Designing Assessments to Inform and Improve Student Performance (San Francisco: Jossey-Bass, 1996). As described in the article, Newmann and his colleagues (1996) define authentic assessment more narrowly. Although they concur that learning is more powerful if tasks are clearly related to real world problems, issues, or concerns, they believe that authentic academic tasks also require students to construct knowledge through disciplined inquiry.

2. Fred M. Newmann, ed., Student Engagement and Achievement in American Secondary Schools (New York: Teachers College Press, 1992); Fred M. Newmann, Walter G. Secada, and Gary G. Wehlage, A Guide to Authentic Instruction and Assessment: Vision, Standards and Scoring (Madison, WI: Wisconsin Center for Education Research, 1995); Fred M. Newmann and associates, Authentic Achievement: Restructuring Schools for Intellectual Quality (San Francisco: Jossey-Bass, 1996).

3. Dana Carmichael-Tanaka, a secondary social studies teacher in the Minneapolis Public Schools and a graduate student in the College of Education and Human Development at the University of Minnesota, wrote the assessment task. Readers interested in obtaining the task may write to her at 1251 Gibbs Avenue, St. Paul, MN 55108.

4. Pamela Aschbacher, “Helping Educators to Develop and Use Alternative Assessments: Barriers and Facilitators,” Educational Policy 8 (1994): 202-223.

5. Eva L. Baker, “Making Performance Assessment Work: The Road Ahead,” Educational Leadership 51 (1994): 58.


Patricia G. Avery is an associate professor of curriculum and instruction in the University of Minnesota’s Department of Curriculum and Instruction. This research was funded by a small grant from the Center for Applied Research in Educational Improvement (CAREI) at the University of Minnesota. Development of the assessment task described in the study was funded by the Fund for the Advancement of Social Studies Education.

©1999 National Council for the Social Studies. All rights reserved.