Wednesday, June 10, 2009

What's really the problem with grades?

One of the features of blogging for an audience is that commenters are exceptionally good at identifying the omissions in an argument. My whimsical post below on grade inflation is a perfect example, as commenters quickly mentioned what I also believe to be the most important considerations in grading policy: compression and noise.

First, Samson noted the key difference between monetary inflation and grade inflation: with grade inflation, there is an upper bound. In theory, if average grades become too high, we'll brush up against the maximum grade—either an A or A-plus—and be unable to separate students with genuinely different levels of performance.

While I think there is some substance to this argument (it's infinitely more coherent than grizzled professors' rants against "falling academic standards"), there are a few important wrinkles to consider. First, the existence of an upper bound alone isn't enough to make grade compression a problem. Rather, the difficulty arises because we have both an upper bound and a discrete grading scheme. I'm not saying that we should move to continuous grading, which has its own set of difficulties, but I think that this is a useful point for clarifying our intuition. A grading system that allows only A-s, As, and A+s (as inflated as it gets) allows us to communicate exactly the same amount of information as a system that allows only As, Bs, and Cs.

Interestingly enough, Brown uses the latter system, and while I'm convinced that this is suboptimal and leads to genuine difficulties, it hasn't exactly destroyed academics there. And this is equivalent to an extreme case of grade inflation, where all grades except those in the "A" range have been abandoned entirely. In a more realistic scenario, where we retain grade averages similar to those at even the most inflated universities, we can achieve a much better outcome.

Take my own university, where the average grade is a B+. Keeping roughly this average, we can implement the following grade distribution, which I think is perfectly adequate in separating students. It doesn't use anything below a C, and even that is only for the extreme low end:

Bottom 2 percent: Cs
2nd-7th percentile: C+s
7th-20th percentile: B-s
20th-40th percentile: Bs
40th-65th percentile: B+s
65th-85th percentile: A-s
85th-98th percentile: As
Top 2 percent: A+s
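
As a rough check that the distribution above really does keep the average near a B+, here is a minimal sketch. The grade-point values are my assumption (a common 4.0 scale with A+ capped at 4.0); schools differ, and the shares are simply the widths of the percentile bands in the list.

```python
# Grade-point values are an ASSUMPTION: a common 4.0 scale with A+ capped at 4.0.
GRADE_POINTS = {
    "C": 2.0, "C+": 2.3, "B-": 2.7, "B": 3.0,
    "B+": 3.3, "A-": 3.7, "A": 4.0, "A+": 4.0,
}

# Share of students receiving each grade, i.e. the width of each percentile band.
DISTRIBUTION = {
    "C": 0.02, "C+": 0.05, "B-": 0.13, "B": 0.20,
    "B+": 0.25, "A-": 0.20, "A": 0.13, "A+": 0.02,
}


def mean_grade(distribution, grade_points):
    """Return the share-weighted average grade-point value."""
    total_share = sum(distribution.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * grade_points[grade] for grade, share in distribution.items())


average = mean_grade(DISTRIBUTION, GRADE_POINTS)
print(round(average, 2))  # about 3.27, just under a B+ (3.3) on this scale
```

Counting an A+ as 4.3 instead barely moves the result, since only 2 percent of students receive one.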

Can students be effectively placed on a distribution much finer than this? I doubt it. And even if you disagree, the inefficiencies introduced by discretization at this level, relative to a hypothetical finer grade distribution, are essentially rounding errors and will be close to random. This suggests that they will aggregate away in overall GPA, and our "compressed" grading scale barely reduces efficiency at all. For more along these lines, check out mathematician Jordan Ellenberg in Slate, who summarizes the point thus:
A grading scale much too coarse to separate students' performances in a single class (for instance, the system with just two grades) can—if it is not too coarse—be perfectly adequate when we have a whole transcript to look at.
This doesn't mean that grade compression is irrelevant. It's a good reason to avoid any further inflation, and it becomes a serious issue if we're not allowed to append pluses and minuses to letter grades. But it's not a compelling reason to deflate average grades significantly from their current levels, which is what many anti-grade inflation campaigners desire.
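
To make the "rounding errors aggregate away" claim concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: each student has a fixed latent ability, each course adds independent noise, "fine" grading records the exact percentile in each course, and "coarse" grading maps percentiles onto the eight letter grades using the band edges and an assumed 4.0-scale point value. It then asks how well each transcript average tracks ability.

```python
import random
from bisect import bisect_right

# Upper edges of the percentile bands and their assumed grade points (A+ capped at 4.0).
CUTOFFS = [0.02, 0.07, 0.20, 0.40, 0.65, 0.85, 0.98]
POINTS = [2.0, 2.3, 2.7, 3.0, 3.3, 3.7, 4.0, 4.0]


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_students=500, n_courses=32, seed=0):
    """Return (corr of ability with fine average, corr of ability with coarse GPA)."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n_students)]
    fine_totals = [0.0] * n_students
    coarse_totals = [0.0] * n_students
    for _ in range(n_courses):
        # Course score = latent ability plus independent course-level noise.
        scores = [a + rng.gauss(0, 1) for a in ability]
        ranked = sorted(range(n_students), key=lambda i: scores[i])
        for rank, student in enumerate(ranked):
            percentile = (rank + 0.5) / n_students
            fine_totals[student] += percentile
            coarse_totals[student] += POINTS[bisect_right(CUTOFFS, percentile)]
    fine_avg = [t / n_courses for t in fine_totals]
    coarse_gpa = [t / n_courses for t in coarse_totals]
    return pearson(ability, fine_avg), pearson(ability, coarse_gpa)


fine_corr, coarse_corr = simulate()
print(round(fine_corr, 3), round(coarse_corr, 3))
```

Under these assumptions the two correlations should come out close to each other, which is the sense in which the coarse scale loses little over a whole transcript. Swapping in a two-grade scale (one cutoff, two point values) is an easy way to probe Ellenberg's more extreme case.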

The other major concern in grading policy—one that I think is much more serious—is noise, mentioned by commenter tcspears. If grades are "noisy" signals of student learning and ability, they will be less useful, both as incentives for students to work and as screening devices for employers. It's important, however, to clarify the relationship between grade inflation and noise in grading.

They are connected in a historical sense. In the past, changes in grading standards across the board made divergence between professors more likely, and inflation proceeded at a different pace in different subject areas, leaving the humanities with much higher average grades than the natural sciences. This suggests that we should avoid future inflation, insofar as the term refers to a change in grading standards. But it does not imply that we should engage in an across-the-board program of deflation. The entire point, in fact, is that changes in grading norms make it difficult to maintain consistency between professors. In this context, implementing a wholesale shift in the other direction makes little sense as a "solution."

What we really need is standardization. Last year, Duke's student newspaper published a chart (unfortunately missing online) listing the difference between students' major GPA and overall GPA in ten majors: the five with the highest difference and the five with the lowest. The results were astonishing. In the five most "difficult" majors (chemistry, math, physics, biology, and economics) the difference was quite high, at around 0.8 for chemistry and math. Since major GPA is included in overall GPA, this implies that chemistry and math majors at Duke have major GPAs at least an entire grade point below their nonmajor GPAs, assuming major courses make up at least a fifth of their coursework. Meanwhile, students in the "easiest" majors averaged major GPAs higher than their overall GPAs. The gulf in grading norms indicated here is almost incomprehensible: it amounts to more than one grade point, which is the difference between a 2.9 (rejected from every medical school in America) and a 3.9 (enough to make the cut at Harvard Medical School).
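
The step from a 0.8 gap against overall GPA to a full point against nonmajor GPA is just weighted-average algebra. A minimal sketch follows; the major shares are hypothetical, since the chart didn't report what fraction of a student's coursework falls in the major.

```python
def nonmajor_gap(overall_minus_major, major_share):
    """Return (nonmajor GPA - major GPA) implied by (overall GPA - major GPA).

    With s = major_share (fraction of graded coursework taken in the major):
        overall = s * major + (1 - s) * nonmajor
    so  overall - major = (1 - s) * (nonmajor - major).
    """
    if not 0 <= major_share < 1:
        raise ValueError("major_share must be in [0, 1)")
    return overall_minus_major / (1 - major_share)


# Hypothetical major shares; 0.8 is the approximate chemistry/math figure from the chart.
for share in (0.2, 0.3, 0.4):
    print(share, round(nonmajor_gap(0.8, share), 2))
```

So the gap reaches a full grade point once major courses are a fifth of the transcript, and it widens as that share rises.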

Certainly most employers are aware of the general fact that grades are higher in the humanities, and they take this into account in their hiring decisions. But important institutions like medical and law schools rely heavily on crude GPA cutoffs, and the exact idiosyncrasies in each department's grading are impossible to divine. In addition, the problem isn't just variance by department: it's also variance among classes and professors, which no one from outside the institution can possibly hope to know.

The case for intervention is clear. Universities should take grading noise seriously, and strive to improve consistency both across and within departments.

But now we've come full circle. Given this analysis—that the lack of consistent grading norms makes it difficult to identify real achievement—what is the absolute worst thing a professor can do? Well, he can unilaterally deviate from whatever weak norms do exist, implementing a grade distribution far below that of other classes in the university, all under the bizarre misconception that "academic rigor" means "pretending letter grades are exactly the same as when I was a kid." And that, of course, is exactly what Josh Smith did.

Make no mistake: he's part of the problem.


John said...

Sounds like my Arizona State education is more rigorous than I thought - we have not only A's, B's, and C's, but D's and E's (like an F, but more appropriately positioned after D) as well. And upper division physics professors pride themselves on grading on traditional curves centered about the C that Duke apparently calls the minimum. So we have to explain not only a B in quantum mechanics, but also the fact that we earned it at Arizona State...

Crossed said...

@John: Right. Differing standards is a far bigger problem than mere grade inflation. Imagine you're in charge of hiring; you now have to re-curve all the GPAs based on what school the candidate is from--or even what department they're in. *If* you can even get the right amount of information.

Anonymous said...

At the course level, grading is fraught with validity and reliability problems. Except in the most mundane repetitive regurgitation (like math problems), these weaknesses dominate the standardization problem.

Grades signal how well one adapted to the environment. If smart people adapt quicker they have a lower cost to signal. Obviously bad students are also easy to spot.

As someone who hired in a top tier tech firm across two decades and now works in academia, I know we calibrated our expectations at each university based on our prior distribution. We knew the type of person that would excel in different courses and could match relative capabilities to specific professors' grades. We backed that up with in-person interviews. This strategy is just more effective than grade analysis and always will be.

Anonymous said...

Well-argued post! Enjoyed it quite a bit! I shouldn't be so curious but having majored in economics and math at college, I feel entitled to ask: what are the easy majors at Duke? :)

Frank said...

While I like the post and many of the comments, a simple solution of sorts would be to put the average grade of the students in any one class on every student's transcript. Unfortunately, the solution would likely have to be imposed on every institution, for no single one has an incentive to adopt it, though I'm still mulling this over. [Something like 90% of Harvard Undergrads are on the Dean's List--must be a good place, right?]

Matt Rognlie said...

Anonymous at 3:23...

I think I remember the five "easiest" majors according to this metric being drama, classical studies, religion, literature, and music, but don't quote me on it. :)