The History and Future of Grades in Higher Education

Grades have long been part of the American educational experience. From the elementary classroom to graduate and professional schools, grades are used to record students’ progress and, with varying degrees of fidelity, measure their mastery of facts, concepts, and a wide range of academically or professionally valuable skills. If grades were used solely to provide confidential formative feedback – to identify areas of strength and weakness in academic achievement as well as to guide future study – they would not have become such a highly institutionalized and controversial metric of student success. Rather, grades are now the currency with which scarce academic resources are purchased. Admission to selective programs of study, access to merit scholarships, professional or technical certification, and ultimately employment are decided in part, often in large part, by the academic transcript. While the grade point average has become the defining metric of the individual learner, the process of assigning grades is increasingly viewed as an unreliable method of measuring student learning.

As a result of this conceptual dissonance, the modern higher education accountability movement has demanded the application of a more systematic and valid process of assessing student learning. Accrediting bodies, governmental commissions, and state legislatures are increasingly requiring universities to define, measure, and report student learning. What value then do grades have in American higher education? Have they outlived their usefulness? Can grades and the process of grading be reconsidered in ways that make them more relevant to all university stakeholders? A review of the history of the development of grading systems in America will illustrate both why and how the measurement of learning has evolved, shed some light on the future of the assessment of student learning, and perhaps illustrate ways in which grades can regain their utility in the assessment of learning in higher education.

Transition from Oral to Written Examination

Higher education in America was initially available exclusively to the sons of socially prominent and wealthy families. Modeled on their European counterparts, American colleges and universities of the 17th and 18th centuries provided courses in the traditional liberal arts as well as exposure to legal and religious studies. Because of the elite nature and small number of enrolled students, instruction occurred through a combination of didactic lectures and elenctic seminars. Students were advanced to further study following satisfactory performance on oral exams administered by the faculty. In an era when patronage was an essential aspect of the economic health of the institution and when children of privilege filled the class rolls, subjectively evaluated oral exams were conveniently flexible methods of evaluation. Higher education was, at that time, a highly personal experience. There existed no standardized curriculum, no specified learning outcomes, and no direct connection of course content to technical or professional education. Rather, instructors developed lessons based upon their personal academic strengths and interests as well as the antecedent knowledge and capacity to learn of their students. Individualized oral examination was a natural and logical outgrowth of this highly individualized mode of instruction.

Two factors marked the decline in the use of oral exams and the subsequent growth in formally structured written exams. First was a desire on the part of instructors and institutions to evaluate students’ mastery of concepts more consistently. Because of the dialectic nature of the oral examination process, the experience of the instructor interacting with each student and the experiences of the students with the instructor were prone to great variability. There was no way to assure either consistency or validity in the evaluation process, nor could the students’ achievements be measured relative to the progress or achievements of their classmates. The desire on the part of instructors and institutions to establish rankings of students’ achievement required that examination procedures become more structured and standardized.

Second, with the onset of the Industrial Revolution, the American higher education system experienced an intense wave of democratic and competitive forces. A defining aspect of that era was a growing incompatibility between the old order dominated by the privileges of class and wealth and an emerging meritocratic order. Not only were oral exams ill-suited to the competitive evaluation of student achievement, they were also subject to inadvertent or intentional manipulation in favor of high-status students. As such, a transition to written exams provided a mechanism by which instructors and institutions could assure uniformity in the examination process – each student was asked the same questions in the same way with the same period of time allotted for composing answers. Additionally, formally structured written exams provided an objective and publicly available record of student achievement. Lastly, written examinations were a more efficient way of measuring student achievement in that an entire class could undergo an examination at the same time while the instructor could mark each exam in much less time than would be required to conduct a series of oral examinations.

Early Subjective Categories

Once the evaluation of student learning moved from a dialogue between the professor and the student to an institutionalized procedure for measuring academic performance, the next step was to establish a system of categorical rankings of achievement. The first institution to document the use of a categorical ranking system – the forerunner of modern letter grades – was Yale University. President Ezra Stiles, in 1785, recorded the grades of students as falling into one of four categories: “Twenty Optimi, sixteen second Optimi, twelve Inferiores (Boni), ten Pejores.” By 1813 Yale had systematized the record-keeping process by creating the Book of Averages and requiring that “The average result of the examination of every student in each class shall be recorded in this book by the Senior Tutor of the class.” Interestingly, although perhaps not surprisingly, the first use of grading categories at Yale resulted in a pronounced skew towards favorable rankings.

By 1817 the College of William and Mary was using a more fully described categorical system consisting of four ranks “No. 1. The first in their respective classes; No. 2. Orderly, correct, and attentive; No. 3. They have made very little improvement; No. 4. They have learnt little or nothing.” These categories are defined by four different types of criteria: rank one is based on achievement relative to the peer group; rank two is defined primarily by behavioral rather than academic performance; rank three considers growth in knowledge over the period of evaluation; while rank four is based upon an absolute measure – a lack of learning. Clearly, categorical ranking systems represented an initial, and in most cases incomplete, effort to bring organization and structure to the evaluation of student learning through a reporting framework that could be made available to students and their parents as well as the faculty and governing boards of the institutions.

As categorical grading structures came to common use in American colleges and universities, there emerged a concurrent desire on the part of employers, parents, alumni, and governing boards to compare the results of these rudimentary evaluations of student learning among institutions. However, the variety of systems employed, the use of highly qualitative and conflicting assessment criteria, and an ill-defined understanding of what distinguishes outstanding from satisfactory and satisfactory from unsatisfactory work limited the validity of institutional comparisons.

Grade Distributions

From their very first recorded use, grades grouped students into categories of achievement. As described earlier, the Yale students of 1785 populated with decreasing frequency four grade categories from Optimi down to Pejores. Over the last one hundred years the meaning and significance of various distributions of grades have become nearly as contentious a topic in higher education as the validity and reliability of the individual grades that make up any particular distribution.

Without detouring too far into a discussion of grading statistics, it is useful to consider the origin and significance of some commonly observed characteristics of grade distributions. When the most frequently occurring graded category (mode) lies at or near one of the ends of the distribution the sample is said to be skewed, as is illustrated by the Yale grades. When the modal category is near the middle (median) of a symmetrical sample the distribution is described as normal, Gaussian, or “bell-shaped”.
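As a rough illustration of these terms, the Yale counts quoted earlier can be summarized in a few lines of Python. This is only a sketch: the numeric scores attached to the four categories (4 for Optimi down to 1 for Pejores) and the `describe` helper are assumptions made for the example, not part of the historical record.

```python
# Counts recorded by Ezra Stiles in 1785, listed from best to worst category.
yale_1785 = {
    "Optimi": 20,
    "second Optimi": 16,
    "Inferiores (Boni)": 12,
    "Pejores": 10,
}

# Assumption: score the categories 4 (best) down to 1 (worst).
scores = {name: 4 - i for i, name in enumerate(yale_1785)}


def describe(counts, scores):
    """Return (n, mode, mean, skewness) for a table of category counts."""
    n = sum(counts.values())
    mode = max(counts, key=counts.get)  # most frequent category
    mean = sum(scores[c] * k for c, k in counts.items()) / n
    var = sum(k * (scores[c] - mean) ** 2 for c, k in counts.items()) / n
    third = sum(k * (scores[c] - mean) ** 3 for c, k in counts.items()) / n
    skewness = third / var ** 1.5  # standardized third moment
    return n, mode, mean, skewness


n, mode, mean, skewness = describe(yale_1785, scores)
print(f"n={n}, mode={mode}, mean={mean:.2f}, skewness={skewness:.2f}")
```

Under this scoring the mode sits at the top category and the skewness comes out negative, meaning the tail of the distribution runs toward the low marks. That is the statistical signature of a sample skewed toward favorable grades.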

In 1906 Northwestern Professor Winfield Hall compiled the grades of 2000 students and found them to be highly skewed towards favorable marks. Hall’s systematic study provided quantitative support for the conclusions reached a few years earlier by Washington and Lee professor LeConte Stevens who found the “tendency to high marking is inherent in human nature. Every professor wishes to be at least as fair, at least as generous, as his conscience may permit; and he is apt to regard his own teaching at least as good as that of his colleagues. Every student wishes credit for the best he has done, and is at least willing to have his short-comings excused. He considers the professor who gives him a high mark to be eminently fair; and the professor who remembers all short-comings is thought to be unsympathetic and inconsiderate.” Beyond the psychological forces that influence the process of evaluating students, one must also consider how grades will be used to either reflect the mastery of academic concepts or to differentiate the levels of learning achieved by different students.

While different instructional and evaluative aspects of a course influence grade distribution, the two general types, skewed and normal, often are associated with two distinct philosophies underlying the evaluation of student learning. When the instructor’s primary concern is to evaluate and document concept mastery, it is common for students to achieve a distribution of grades skewed towards high marks. Conversely, when the instructor establishes a grading system designed to separate the stronger from the weaker students, irrespective of absolute levels of learning achieved, a more symmetrical or “normal” distribution results.

These generalities, of course, call into question the purpose of grading. Is the instructor’s objective to evaluate mastery, differentiate levels of achievement, or to document students’ efforts or commitment to learning? The confusion of purposes illustrated by the William and Mary grade categories from 1817 unfortunately remains a part of the philosophical landscape of higher education nearly two hundred years later.

The Assessment Movement

Despite their widespread use, grades and grading processes are criticized by proponents of the modern accountability movement in higher education. The criticisms are derived from two perceived short-comings. First, grades are at best indirect measures of student learning. A complicating factor is the incorporation of student behaviors – attendance, participation, effort – into grading systems, which causes the resulting final grade to become a measure of motivation rather than of academic achievement. Second, even when behavioral measures are not included, grades are integrated measures of performance over the academic term. Often the final grade is calculated from a set of assignments conducted at various points in time and as such neither adequately measures a student’s absolute knowledge at the end nor academic growth throughout the term. Of course an instructor can devise grading systems that will capture information regarding absolute knowledge (a comprehensive final) or academic growth (pre-test/post-test methodologies), but the flexibility that makes such variation possible underscores the weakness perceived by some – the lack of uniformity in grading process between courses and instructors.
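The point about integrated grades can be made concrete with a small sketch. Everything in it is hypothetical: the two students, their scores, and the 60/40 weighting between assignments and the final exam are invented for illustration and do not come from any real course.

```python
# Hypothetical records: a pre-test, three assignments, and a comprehensive final.
students = {
    "Student A": {"pretest": 40, "assignments": [70, 80, 90], "final": 90},
    "Student B": {"pretest": 70, "assignments": [95, 90, 85], "final": 75},
}


def course_grade(record, assignment_weight=60, final_weight=40):
    """Integrated grade: weighted mix of the assignment average and the final."""
    average = sum(record["assignments"]) / len(record["assignments"])
    return (assignment_weight * average + final_weight * record["final"]) / 100


def growth(record):
    """Pre-test/post-test gain, using the final exam as the post-test."""
    return record["final"] - record["pretest"]


for name, record in students.items():
    print(name, course_grade(record), record["final"], growth(record))
```

Both hypothetical students receive the same integrated course grade, yet they differ in end-of-term knowledge (the final exam score) and differ even more in growth over the term. A single final grade cannot distinguish the two cases.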

The assessment of student learning differs from the assignment of grades in a very significant way: assessment is a part of an integrated cycle of planning, teaching, assessment, and reflection. Each step in that learning cycle is informed by knowledge gained in the other steps. Conversely, the assignment of grades is structured as an outcome of a linear process of planning followed by teaching. As a summative evaluation of student performance, grades do not possess the depth or richness to fully inform curricular change or enhance student learning.

When implemented as a part of an ongoing cycle of learning, assessment provides formative feedback on student achievement, aligned with specific learning outcomes, which can be of great value both to the student and the instructor. Faculty resistance to assessment typically takes one of three forms: a failure to appreciate the summative/formative difference between grades and assessment; a belief that the role of the faculty is primarily one of differentiating, sorting, or gate keeping rather than one of facilitating, mentoring, and guiding; or the perception that the investments in time and energy necessary to conduct meaningful assessment, reflection, and curricular modification are not justified by a marginal enhancement in student learning. Despite lingering resistance on the part of some faculty, assessment is now an essential part of American higher education.

Do Grades have a Future?

With every passing semester grades are assigned, GPAs are calculated, and students are graduated, retained, placed on probation, or dismissed from universities based upon the grades received. Scholarships are conferred or withdrawn; students are admitted to or dropped from honors programs based upon what are at best incomplete measures of learning. At the same time the accountability movement demands universities become more effective and productive. Student success as measured by retention and graduation – both of which are highly dependent on GPA – is an increasingly critical metric in the funding formulas of public universities.

Can the system of grades be reconsidered and reconstructed in ways that will bring it into coordination rather than conflict with the desire to measure and report student learning? The complex variety of grading methodologies in use strongly suggests that a wholesale re-imagination of the meaning and use of grades resulting in consistency and uniformity is unlikely. Alternatively, could the system of grades be abandoned in favor of an integrated process of learning assessment and pass/fail measures of achievement? Despite the historical and cultural entrenchment of grades in American higher education, current trends strongly suggest the future of student evaluation will be one characterized by a fully implemented granular and nuanced system of assessment linked to course and curricular learning outcomes. Students will be measured against a matrix of defined skills, abilities, and achievements, rather than through the use of a GPA calculated to three or four significant figures.

Sapere Aude

2 thoughts on “The History and Future of Grades in Higher Education”

  1. I wish we could do away with grades. I wish we did not use grades as ways to assign scholarships. It distorts student behavior. I wish we could be more concerned with student learning. We need a radically different system.

    Suppose I have a student who has passed intro comm, but for whatever reason has significant problems in this area. I would like to assign them to retake the class. But so as to be non punishing, they should not have to pay for the class.

    Just a thought.

  2. I think that grades are not the best way to communicate success in the classroom. They DO, of course, tell one something about how a student did, but they are so vague. Moreover, an “A” in one course often seems like a marker that is supposed to demonstrate something about the given student’s intelligence/diligence across all disciplines. All too often, students describe themselves as “A-students” and demand to know how they can get an “A” in ten courses but then not get an “A” in their eleventh course, even if they apply the same methods previously used.

    For me, the best way to respond to a student is through a written response, which (sadly!) can take up too much time. As we all know, a rubric can make it easier to communicate how a student has done in a class or on a specific assignment/attempt to achieve the given competency. Often we assign grades to rubrics, too, though, like on a 1-5 scale, but I wonder whether the comments on our rubrics could not simply stand alone? That is, from “You didn’t achieve any of the goals set for this task” to “You achieved ALL of the goals,” which might be especially meaningful, if one knows the goals evaluated, too. Would students then try to achieve the goals, rather than the grade? Would they become students of a new type, the “I achieve ALL the goals in the given task type?”

    It is POSSIBLE that assessment of the new GenEd outcomes will lead to just such an act of communication (as opposed to simply an act of grading) that I mean in the previous paragraph, but for the moment they are not really meant to communicate between instructor and student, as far as I can tell. Rather, they are meant to communicate between course and state, a demonstration that the given course is in fact doing what it claims to do. We will see how it develops.

    It seems to me that there are various universities that have abandoned grades. Hampshire in Massachusetts comes to mind immediately. I do not know the details behind Hampshire’s system, but they may be worth examining.
