Friday, July 07, 2006

My ramble about standardized testing

I recently came across a commentary on the SAT written several years ago by a Rutgers English professor. His thesis is plain: the SAT, or at least the SAT verbal section, is not the meaningless "test on test-taking" that its opponents caricature. Instead, Professor Dowling claims, the exam effectively identifies the verbal competency that is so important for successful college work.

First, I should note that I'm not an entirely unbiased observer here. I placed the stamp on my last college application just over six months ago, and I took the SAT six months before that. It's fresh in my memory. My egoistic subconscious will want to claim, since I scored a 2400, that the SAT is indeed a glorious, inerrant measure of intellectual promise. It isn't.

As I mentioned, Dowling focused on the old verbal section of the SAT. This was fortunate. The verbal section, and the "critical reading" section that replaced it, is probably the most effective part of the SAT. It resists attempts at simplification. While on the math or writing sections a prospective examinee can theoretically prepare for almost any problem "type," the variety of passages and vocabulary on the verbal section makes such fake mastery close to impossible. I'll admit that I don't have any hard data to support this, but my experience helping others prepare for the test leaves me with that impression.

Yet as I just indicated, the math and writing sections are quite different from the verbal. For instance, the SAT is fine for students without strong interest (or talent) in mathematics. It takes a reasonably accurate snapshot of their ability, which admission officers may find useful. But these students, in general, are not those who are applying to the most selective universities, for whom the SAT is most important. Indeed, at the high end of the math SAT scale, predictive power is hazy at best.

For high-scoring students, the questions become so easy that the main challenge isn't solving them. It's avoiding trivial arithmetic errors. Since so many students find the math questions so easy, the "scaling" process has been made brutal and turns these trivial arithmetic errors into apparently meaningful differences in score. Make three little miscalculations on an endless exam? You would have scored 800; now you're lucky to have a 730.

The writing section possesses its own set of flaws. Just as many academics proclaim, the SAT essay is a farce that reinforces formulaic writing with claims to a "standardized" scoring process. I suspect that the five-paragraph essay has destroyed more writing talent than any other scholastic artifice—as you can probably tell, I still haven't recovered (a double tragedy since I never had much talent). In tenth grade, I tried to write an essay for history class in a different style. It sought to imitate the clever organization I saw in real essays—bits from the Atlantic and Harper's and the New Republic. What happened?

Smack! B-minus. Back to the five-paragraph shell.

I should mention that the essay is only one-third of the writing score. The rest consists of multiple-choice questions. This seems paradoxical, reflecting some absurd attempt to impose the SAT philosophy: "How will we measure writing? Well, isn't it obvious? Bubble sheets!"

But I suspect that it's actually more effective than the essay. My gripe with the multiple choice section actually stems from a particular variety of question: "identifying the sentence error." Each problem is a sentence: four parts of the sentence are underlined, labeled "A," "B," "C" and "D". A "no error" option follows, corresponding to "E." If an underlined portion contains a grammatical error, students must bubble the corresponding letter. If none do, they bubble "E."

Maybe it's a bit pedantic, but it sounds innocuous, right? Not really. The other parts of the writing section call upon students to improve sentences or paragraphs. They must consider both style and grammar to find the best option, and usually it's not very difficult. But the work of "identifying sentence errors" requires that students set aside any sense of fluid writing and focus simply on identifying technical errors. Now for the obvious question: is there any circumstance, outside of testing, where this is actually useful? Where the fantastically bad phrasing in a sentence doesn't matter but the exact use of pronouns does?

The SAT, needless to say, is not a perfect test. It's probably a decent one—I'll leave that to argument. But as the established test, it plays a vital role in contemporary education. I want to make several additional points.

First, there is precious little evidence that the SAT makes admission to the most elite schools (Ivy Leagues, MIT, Stanford, etc.) more difficult for disadvantaged students. It might even help them. Consider: according to a report by the Century Foundation, 15.3% of Caltech's students are Pell Grant recipients, versus 6.8% for Harvard, 7.4% for Princeton, and 11.7% for Stanford. Caltech also happens to have the most test-focused admissions policy in the country, which results in an average SAT that is usually around 1510. That's higher than Harvard—in fact, higher than any school. How, then, does Caltech admit so many more disadvantaged students?

There are several possible explanations, but my best guess is that the "other factors" considered by elite schools are often quite discriminatory. Harvard, for instance, seeks out high-level involvement in extracurricular activities: the state champion violinists, the debaters who win regional contests, etc. But just who are these students? How does a violin virtuoso start? Private lessons—and does anything scream "affluent middle class" more than expensive lessons at a very early age? How do debaters become so successful? Well, to start, they usually have strong programs at their high schools. These aren't found at your inner-city schools, but they're in strong supply at private academies.

Let's be honest: SAT tutoring isn't the only advantage possessed by rich students. And as someone who never received any help for the test, I'm frankly annoyed at how it's fingered as a mere pawn of wealthy elites. Yes, the SAT can be "gamed" to some extent, but I suspect the system would be far more vulnerable to free-spending parents without it. I want better testing, not conspiratorial neo-Marxist rhetoric about testing in general.

My second point is directed toward the ludicrous use of "score ranges," a practice that plagues standardized testing in general. The initial idea isn't so bad: a test will inevitably be imprecise in its measurements, and a "score range" provided by the testing agency can illustrate the degree of this variance.

But many universities then decide to incorporate "score ranges" into their admissions procedures. I've heard about this with MIT: they have a "750-800" range, a "700-740" range, and so on. Supposedly, this recognizes that minor differences in score don't mean much. But in effect it arbitrarily ignores all incremental differences in score except one (740 to 750), which is blown beyond reasonable proportion. This doesn't actually solve anything; it just sets inefficient cutoff points that distort results and preserve the very weakness they seek to eliminate.
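To make the arbitrariness concrete, here is a minimal Python sketch of a banding scheme like the one described above. The "750-800" and "700-740" cutoffs follow the ranges mentioned in the post; the lower bands are hypothetical extrapolations, not MIT's actual policy.

```python
def band(score):
    """Assign an SAT section score (200-800) to a coarse band.

    The top two cutoffs mirror the "750-800" and "700-740" ranges
    described above; the lower bands are hypothetical.
    """
    if score >= 750:
        return "750-800"
    if score >= 700:
        return "700-740"
    if score >= 650:
        return "650-690"
    return "below 650"

# A 50-point gap inside a band disappears entirely...
print(band(800) == band(750))   # True
# ...while the 10-point gap across the 740/750 boundary becomes a full band.
print(band(750) == band(740))   # False
```

Every distinction inside a band is discarded, while the single distinction at the boundary is magnified: exactly the distortion the banding was supposed to prevent.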

Third, this discussion shouldn't just be about the SAT. We already have tests that are arguably more effective, and we should work to create tests that are even better. FairTest, the leading opponent of standardized tests, has a fact sheet comparing the SAT I, SAT II, and ACT. You would expect the sheet, entitled "Different Tests, Same Flaws," to provide evidence selected for its damning rejection of all three tests. Yet a validity study cited in the SAT II section, from the University of California system—an ideal environment thanks to its relatively wide range of students—actually shows that SAT II results had more predictive power than high school GPA. That's pretty remarkable, given FairTest's blithe assertions that all standardized tests are worthless, soul-destroying shells.

Soul-destroying? Somewhat. Worthless? No. Further evidence of an elite conspiracy to keep poor students out of top colleges? I don't think so.

The main problem with SAT IIs (and AP exams) is that their subject-oriented character leads to potential unfairness. Students at stronger schools, the argument asserts, are far better prepared to handle the tests. This is probably true, but we can work to alleviate this with some imaginative thinking: create a system that automatically informs colleges of the average scores on each test at a particular high school. Why not?

Finally, I want to emphasize the inability of these tests to deal with truly extraordinary talent in a particular area. Even if the SAT perfectly measured ability in each section, we would still have 2200s with far more promise than 2400s. Why? The 2200 might be an extraordinary writer with middling math skills; 800s on the critical reading and writing sections wouldn't begin to describe her brilliance. And the SAT certainly doesn't perfectly measure ability—there are enormous inherent limitations.

I realize that all this writing may sound incoherent: one minute I'm defending standardized tests and the next I'm ridiculing them. But I do have a consistent philosophy, which I'd describe as "realist". Yes, the SAT has flaws; yes, it can and should be better. Yes, there are broad problems with testing in general. My point is that we can't view these issues in isolation. It's not enough to criticize the system—you have to look at the alternatives. The SAT came about as a way to loosen ignorant WASPs' stranglehold on the Ivy League. It hasn't succeeded fully, but we can't just dismiss it.


Zeke Rutherford, Mountaineer of the West said...

As usual, Mr. Rognlie, you make some cogent arguments, but I think you missed an important point with regard to SAT II and AP subject tests. The problem with subject-oriented tests isn't that they give an unfair advantage to students at stronger schools (even though they do). The problem is that they only measure a student's pre-acquired knowledge in one particular academic field. This may be useful in allowing a college admissions office to find out what an applicant already knows, but it is hardly a dependable means of surmising what kind of student he will make for the next four years.

Which brings up the question of that University of California study you mentioned in your post. Having read only the FairTest summation of the results, and not the study itself, I am left wondering about some important details of the collected data. For example: did the SAT II tests account for 15.3% of the variation in every subject field, or only the field specific to that particular test? If all subject fields were taken into account, then I am inclined to regard the correlation as merely a statistical fluke: common sense prevents me from believing that a student's low score on the SAT II biology test has anything to do with his poor performance in an American history seminar. On the other hand, if only one specific field was accounted for, there is an equally plausible explanation: only the most motivated and talented students in the field bother to take the SAT II test, so their high scores are quite likely to correlate with a better-than-expected performance in a college course.

In any event, subject-specific tests simply present far too narrow a picture to be of much help to an admissions officer. Since success in college requires aptitude in more than a few subjects, the most useful standardized test would present the broadest possible portrait of a student's capacity to handle knowledge. What would that test look like? I have no idea. But the good old 1600-point SAT, with its focus on general logical aptitude as well as specific skills in major topics like math and English, was probably as good a bet as any.

Matt Rognlie said...

Indeed, the SAT II tests did account for 15.3% of the variation in every subject field. Given the subject-specific nature of the tests, this appears rather remarkable. But it is entirely possible that the proficiency students display in acquiring knowledge is a powerful indicator of future performance in all areas. In fact, that seems to be what the data is telling us, and the samples are strong enough that it probably isn't a fluke.

I think that the traditional SAT's biggest failing may be its inability to measure "conceptual intelligence": the ability to assimilate new and unfamiliar ideas and conceptual structures. Aptitude-oriented verbal and quantitative tests may tell us very little about a student's ability to grasp, say, the valence bonding system in chemistry (which really has nothing to do with either reading or math skills).

In an ideal world, we could have a non-subject-specific test that would measure conceptual "aptitude." Unfortunately, I can't imagine how such an exam could conceivably be structured. Part of the problem is that real learning takes much longer than the three or four hour time-frame of a test. I doubt that a test could successfully introduce imaginary "concepts" and then test them to measure students' conceptual intelligence.

So we are left with an imperfect way to measure this ability: indirectly, through the relative success or failure of students' actual learning.

Of course, successful learning doesn't solely reflect the inherent ability to learn. It also depends on many other contributing factors. How much material did I miss in high school because I was unsuccessfully flirting with the girl next to me and not paying attention to the class? It's incalculable.

But as the above example illustrates, some extraneous factors in the mix are actually good for predictive power. The tendency to concentrate more on social advancement than learning does not bode well for collegiate success, and SAT IIs are more likely to reflect it.

Achievement tests represent the middle ground between grades and purely aptitude-based exams. They are not arbitrary and irregular like grades, but they manage to evaluate skills and attributes that the traditional SAT hardly touches.

And in the end, studies like the University of California one I cited back this up. Or at least they seem to - I should mention that at the time, SAT IIs included the writing test, which (against all odds) was known to have a remarkable correlation with college success. It's possible that now, with the absorption of the writing test, the standard SAT indeed does have stronger predictive power. Whatever the case, I suspect that SAT IIs still improve the ability to make admission decisions, although they may not be the "best" in isolation.

I'd be open to other ideas too. A few days ago I took a sample LSAT online out of curiosity, and it seemed to incorporate logical aptitude in a way that most standardized tests don't. Maybe some similar questions would be good for undergraduate admissions?

And finally, although it's a moot point, self-selection among SAT II examinees wouldn't itself increase the correlation with field-specific success in the manner you describe. I suppose that the problem here is the difference between the conventional and mathematical definitions of correlation: in conversational terms, "correlation" is simply association, while mathematically it has a very technical definition. The "percent correlation," usually labeled r^2, indicates how much of the variation in one set of data is explained by the other - so if I say that high SAT II scores are 40% correlated with higher grades in a field, the students with high SAT II scores must have measurably outperformed lower scorers. A general shift in the sample, like that from self-selection, won't affect the correlation: only the differences within the samples count.
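The point about a uniform shift can be checked directly. Below is a small Python sketch with made-up score and GPA numbers (none of these figures come from the UC study): adding a constant to every score leaves the Pearson correlation, and hence r^2, exactly unchanged.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

scores = [650, 700, 720, 760, 800]   # hypothetical SAT II scores
gpas = [2.9, 3.1, 3.3, 3.6, 3.8]     # hypothetical course grades

r = pearson_r(scores, gpas)
# Self-selection lifting the whole sample by 50 points...
shifted = [s + 50 for s in scores]
# ...leaves every deviation from the mean, and hence r, unchanged.
print(pearson_r(shifted, gpas) == r)  # True
```

The shift moves the sample mean by the same constant, so each score's deviation from the mean is untouched; only the differences within the sample enter the formula.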