The Issue of Subjectivity in Authentic Social Studies Assessment


Pat Nickell

The decade of the nineties will undoubtedly go down in the annals of education history as one of systemic reform. Critical among the many types of reform undertaken has been the challenge to change the way we measure student learning. The late 1980s gave us a number of highly vocal advocates for more “authentic” assessment, or for tasks that are a better measure of what students can do with what they learn, rather than simply what they can commit to memory for the short term. Much was written about the value of revamping measures of student learning to solve problems that have plagued “traditional” standardized tests for nearly three-quarters of a century.1

As states began developing standards for learning, the stage was set for new assessment systems as well. It seemed only reasonable that if states were going to develop specific standards for school districts to use in ratcheting up curriculum, then student achievement should be measured against these same standards.

However, whether or not new state assessments have been developed, teachers are seeking information and guidance that, we must assume, are affecting their practice. Consider the number of teachers attending voluntary workshops on the design and scoring of authentic assessment for classroom practice, the number attending related sessions at professional association meetings, and the number of training programs and teacher handbooks being produced. While multiple-choice and other fixed-response formats for testing will not likely go away, other forms of student-generated demonstrations of learning—such as portfolios, open-ended questions, and individual and group performance tasks—are increasingly familiar to today’s practitioners.

Still at issue is scoring. Most of us who have spent significant time in classrooms have had the experience of dealing with a student and/or parent who disagreed with our evaluation of the student’s work. We may even have changed a grade or two under the charge that our original grade had been “subjective.” As teachers increase their use of new forms of assessment involving open-ended questions and student-generated performances, charges of subjectivity in a climate of teacher mistrust are inevitable. In various training programs on classroom assessment design, teachers have voiced this concern to me repeatedly. The purpose of this article is to suggest a way to reduce subjectivity in scoring classroom assessments.


Scoring Criteria

In assessing authentic work in social studies, there should be an internal element of the assessment prompt, whatever form it takes, that will make scoring simpler and significantly reduce the risk of subjectivity. That element is what this author terms “scoring criteria.” The term has been used in other literature to refer both to components specific to a task2 and to broad, fixed categories that describe levels of student proficiency.3 For our purposes here, however, the term will refer to specific expectations, made clear in the task instructions, that are used to evaluate student work.

As an assessment item or task is developed, every expectation placed upon the student (with the exception of the additional expectations described below) should be apparent in the instructions. This in itself can make an assessment fairer than “objective tests” built from fixed-response items. First, good tasks relate directly to the objectives of instruction, which may not be the case when textbook or standardized tests are used. Second, students know exactly what is expected and are less likely to misunderstand or misinterpret the task, or simply to go in a direction different from what the teacher had in mind.

There are those who will argue that to spell out exactly what is expected makes the test too obvious or easy, but that is a weak claim. We know when we begin driving a car what is expected of us on the driver’s test and what will cause us to fail. Adults expect an employer to lay out his or her expectations very clearly, and failure to provide this information has been the subject of many a lawsuit. We expect any activity in which our success or failure is to be measured to be accompanied by clear measures of success and failure; when it is not, we become uncomfortable, defensive, or frustrated.

An Assessment Task for High School Geography

Here is an example that might be used in a high school social studies class in geography or even economics. In this example, note that quite a bit of knowledge about the student’s home state is required, as well as decision-making, letter-writing, and persuasion. It is assumed that these were objectives of instruction.



Imagine that, growing up, you spent most of your summers with your uncle who had a cattle farm in {a state other than your own}. You loved working with him on the farm and you decided long ago that you wanted to become a cattle farmer. Since you graduated from college, you have worked for several years with a farm equipment company, trying to save enough money to buy a small herd and begin farming. Two weeks ago, you received a letter stating that your uncle, who recently died, left you 50 head of cattle to help you fulfill your dream. Because you love your own state and want to stay relatively close to home, you want to bring the cattle here, but you need to decide upon the best region in the state for cattle farming so that you will know where to go looking for land. You also want to find an area that offers places not TOO far away for you to enjoy your favorite leisure activities.

Using what you have learned about the various regions of your state, determine which areas you think are well suited to cattle farming. You should consider not only where cattle farming can be successful and why, but also your personal needs and preferences regarding proximity to towns or cities, recreation facilities, bodies of water, etc. Decide on the best area for YOU. Write a letter to your best friend from college, who is now living in Thailand, telling him/her about your good fortune, where you plan to locate, and what factors you used to make your decision. Be sure to include as much information as possible, since all the letters you’ve received from your friend are full of detail. You enjoy those letters and your friend deserves the same. Don’t forget to use personal letter form.


In this example, there are at least four scoring criteria:

1. The student identified a region that is suitable for cattle farming

2. The student described several factors that she/he used to make a decision

3. The student used personal letter form

4. The student was thorough, providing sufficient information to justify his/her decision


Additional Expectations

Before discussing the use of scoring criteria to develop a rubric, it should be mentioned that additional scoring criteria may be assumed if appropriate steps have been taken. Namely, if the teacher has made very clear that all written work is to incorporate correct grammar, complete sentences, and correct spelling, then these may be added scoring criteria. To be most fair, such criteria should be prominently and permanently displayed on an “Always…” chart in the classroom. Then, when a student or parent wishes to take issue with your reducing the score for a spelling error, you have the backing of your chart and your verbal reminders to students.

An “Always…” chart might also require students to:

1. Include all and only pertinent information

2. Calculate correctly

3. Get to the point, stick to the point, and only make a point once

4. Cite any sources they use (using agreed upon format)

5. Read instructions thoroughly before beginning work and again afterwards as a check

6. Edit their work


Developing the Scoring Rubric

Once the scoring criteria appear in the task prompt and are implied by the “Always…” chart, the top level of the rubric is finished. I do not add other criteria, such as “student added personal elements to letter, such as inquiring about Thailand,” because I am looking for the student’s ability to meet my objectives accurately and I don’t want to be distracted from that. Besides, to do so opens the task once again to charges of subjectivity. For the geography task above, the complete set of scoring criteria now reads:

1. The student identified a region that is suitable for cattle farming

2. The student described several factors that she/he used to make a decision

3. The student used correct “personal” letter form

4. The student was persuasive, providing sufficient information to justify his/her decision

5. The student used complete sentences, correct grammar, and correct spelling

6. The student did not repeat points unnecessarily

7. The student’s work does not contain any errors that would have been corrected if carefully edited

The only major difficulty left is deciding how many scoring levels to adopt. Three or four levels are fairly easy to describe and distinguish among, but five have the advantage of corresponding to “ABCDF” grading. Several attempts at developing and using scoring rubrics will serve as the best guide to the number and nature of levels.


Final Cautions

A common mistake, and an easy one to make, occurs when the teacher comes upon a paper that is extraordinary in some way. Perhaps it incorporates some brilliant insights, is extremely creative, or shows evidence of research beyond what was required. The teacher, in order to honor this performance, gives this paper an “A” and shifts less impressive ones downward. Once scoring criteria are established in the prompt, students must be able to trust that to fulfill them is to score at the highest level. An exceptional performance receives praise and is perhaps put on display for others to learn from, but it does not change the rubric.

Another mistake is to assume that students will score on some sort of curve. Thus, the scorer searches desperately for justification to place 5% of papers in the highest scoring level, 15% in the second, 60% in the third, 15% in the fourth, and 5% in the fifth. In a perfect world, if students know what is expected of them and have been doing their work, all students should score at the highest level. Sadly, that will very rarely happen. Even if the teacher verbally reviews the prompt, has students identify the scoring criteria, and writes the criteria on the board, some students will likely not achieve the highest level. But under no circumstances should a bell curve be imposed on this kind of scoring.

A third error is to write a prompt with too few scoring criteria or, worse, one that has a right or wrong answer and nothing more. When a teacher wants to know whether a student knows a particular piece or body of information, fixed-response questions are in order. An elaborate open-ended question or performance task to find out whether the student knows that the Bill of Rights is found in the Constitution is inappropriate. Unless the student is being asked to fulfill four or more requirements, there is not enough in the prompt from which to develop the scoring criteria. Further, the scorer must be able to describe each of these components, or criteria, at four or five levels of success; otherwise the task is going to be very difficult to score. Either additional components need to be added to the prompt or the item should be rewritten as one or more fixed-response or short-answer items. Consider the following prompt, in which several requirements, or scoring criteria, are embedded in the instructions:


Assessing Understanding
of Perspective

Go online or to the public library and examine several newspapers from different parts of the country (they are bookmarked for you on the class computers). Find the political cartoons in each. Now find two or more that are about the same topic. Print them (“current page only”) and attach them to a paper on which you describe each cartoon. In your descriptions, explain what each object or symbol in the cartoon represents, state the overall message of the cartoon, and describe the perspective of the cartoonist as communicated by the cartoon.



Teachers are increasingly trying out new types of assessment in classroom evaluations of student progress. Open-ended questions, performance events, portfolios, multimedia demonstrations, and the like are becoming familiar ways to measure learning. Teachers should not be fearful of charges of subjectivity in scoring such student work. Rather, given careful prompt development that incorporates scoring criteria, teachers should feel confident in their assessments. Incorporating scoring criteria in this way means that there should actually be LESS subjectivity than when using mass-produced tests that decontextualize, randomize, and trivialize bits of information that may or may not have been emphasized in instruction.



1. See, for example, D. A. Archbald and F. M. Newmann, Beyond Standardized Testing (Reston, VA: National Association of Secondary School Principals, 1988); R. Brown, “Testing and Thoughtfulness,” Educational Leadership 46, No. 7 (1989): 31-33; L. Darling-Hammond, “The Implications of Testing Policy for Educational Quality and Equality,” paper prepared for the American Educational Research Association Forum, Washington, D.C. (June 1991); R. A. Denoyer and M. White, “Tests – Fallible Indicators of Educational Quality,” NASSP Bulletin 74 (1990): 49-52; W. Haney and G. Madaus, “Searching for Alternatives to Standardized Tests: Whys, Whats and Whithers,” Phi Delta Kappan 70, No. 9 (1989): 683-687; D. Harrington-Lueker, “Beyond Multiple Choice: The Push to Assess Performance,” The Executive Educator 13, No. 4 (1991): 20-22; K. H. Harris and W. S. Longstreet, “Alternative Testing and the National Agenda for Control,” The Clearing House 64 (1990): 90-93; A. T. Lockwood, “Authentic Assessment,” Focus in Change (Newsletter of the National Center for Effective Schools) 3, No. 1 (1991): 1-13; R. J. Marzano and A. L. Costa, “Question: Do Standardized Tests Measure General Cognitive Skills? Answer: No,” Educational Leadership 45, No. 8 (1988): 66-71; F. M. Newmann, W. G. Secada, and G. G. Wehlage, A Guide to Authentic Instruction and Assessment: Vision, Standards, and Scoring (Madison, WI: Wisconsin Center for Education Research, 1995); K. D. Peterson, “Effective Schools and Authentic Assessment,” The Newsletter of the National Center for Effective Schools 3, No. 1 (1991): 14; L. A. Shepard, “Why We Need Better Assessments,” Educational Leadership 46, No. 7 (1989): 4-9; G. Wiggins, “Teaching to the (Authentic) Test,” Educational Leadership 46, No. 7 (1989a): 41-47; G. Wiggins, “A True Test: Toward More Authentic and Equitable Assessment,” Phi Delta Kappan 70, No. 9 (1989b): 703-713.

2. Newmann et al.

3. Lockwood.


Pat Nickell is an Assistant Professor in the College of Education at the University of Georgia, which she joined after a twenty-seven year career in public schools. She served on the NCSS task forces that developed the national social studies curriculum standards and standards for teaching and learning, and has written and trained in the areas of character education, curriculum development, instruction, and assessment in social studies. In 1996-97 she was President of National Council for the Social Studies.

©1999 National Council for the Social Studies. All rights reserved.