Friday, February 27, 2009

Tradeoffs and the F-22

Needless to say, I don't have any personal expertise on military procurement to add to the recent discussion of the F-22. I do, however, spend a lot of time considering flaws and biases in our thought processes, and I think the debate over fighter jets provides a great example.

The Air Force's thought process here is pretty clear: we need to give our boys the best damn fighter jet we can buy, costs be damned. This is a common tendency -- elevating some objective, like the superiority of our technology or the safety of our pilots, above all else, and declaring it essentially priceless. Such argumentation certainly has its appeal. Risking lives to save money always seems crass, as evinced by some readers' indignant reaction to my previous posts on the topic.

Yet at some level, we will always be forced to make tradeoffs, including tradeoffs that limit safety and human life. We can always find a way to make our pilots and soldiers safer by throwing billions more at them, but unfortunately we do not have infinite resources at our disposal. At the end of the day, the amount of money we're willing to spend on expensive fighters is limited, and rational military policymakers should consider the internal tradeoffs at play: given that we have $X to allocate in aircraft procurement, how can we achieve the greatest effect?

Again, I'm not an expert on military matters or fighter aircraft. My impression, however, is that in an environment where our force structure is likely to be far smaller than in the past, perhaps too small for comfort, two F-35s are better than one F-22. I'm sure some Air Force careerists would indignantly retort that we really need two F-22s, but the rest of us live in a world where tradeoffs actually exist. As long as an F-35 costs less than half as much as an F-22, and is likely to give well above half the benefit, any government with a budget constraint should focus on the cheaper option.
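To make the arithmetic concrete, here is a toy sketch of the comparison; every number in it is a hypothetical placeholder, not an actual procurement figure.

```python
# Toy sketch of the budget-constrained comparison above. All figures are
# hypothetical placeholders, not actual procurement data.
BUDGET = 20_000_000_000  # dollars available for aircraft procurement (illustrative)

options = {
    "F-22": {"unit_cost": 350_000_000, "effectiveness": 1.00},  # baseline per-plane benefit
    "F-35": {"unit_cost": 150_000_000, "effectiveness": 0.75},  # above half the benefit at below half the cost
}

for name, o in options.items():
    fleet = BUDGET // o["unit_cost"]        # how many aircraft the budget buys
    benefit = fleet * o["effectiveness"]    # crude additive measure of total fleet benefit
    print(f"{name}: {fleet} aircraft, total benefit ~ {benefit:.1f}")
```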

These points probably sound bland and obvious. They should be. With a wide array of spending options at our disposal, painful choices must be made. It's just all too easy to forget the obvious when you're emotionally invested in a single outcome.

Defense really is half the game

Every so often, some ornery old basketball hand will remind us all that "defense is half the game," and that when we concentrate mainly on the offensive end of the game, we're ignoring a critical component of team success.

While it sounds reasonable, it isn't necessarily true. Yes, defense is "half the game" in the crudest sense. Inevitably, the average team will spend half its time on defense. But this doesn't mean that defense plays the same role as offense in determining team victories: maybe most teams are close to the average on defense, and offense is where the variance that produces "good" and "bad" teams actually occurs.

In fact, when you think about it, this is a testable hypothesis. Why not just measure the standard deviations of offensive and defensive efficiency in the NBA, and see which turns out to be more important? John Hollinger conveniently provides us with these numbers on his team statistics page.

What's the answer? It turns out that defense really is half the game. In fact, at least in this year's sample, it's a little more than half the game, with a standard deviation of 3.39 compared to 3.18 on the offensive side. 
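For anyone who wants to replicate the exercise, the calculation is a one-liner. A minimal sketch follows; the efficiency figures in it are invented placeholders, not Hollinger's actual numbers.

```python
# Sketch of the comparison described above: which end of the floor varies more
# across teams? Efficiency numbers here are made-up placeholders.
import statistics

teams = {
    # team: (offensive efficiency, defensive efficiency), points per 100 possessions
    "A": (112.3, 104.1),
    "B": (108.7, 109.9),
    "C": (103.2, 101.5),
    "D": (110.5, 107.8),
    "E": (105.9, 112.4),
}

off = [o for o, _ in teams.values()]
dfn = [d for _, d in teams.values()]

print("offensive sd:", round(statistics.stdev(off), 2))
print("defensive sd:", round(statistics.stdev(dfn), 2))
# Whichever side shows the larger spread is doing more to separate good teams from bad ones.
```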

Maybe the New York Times was on to something...

Remedial Statistics for Journalists: An Irregular, Ongoing Series

Matt Zeitlin is worried about the future of journalism:
It’s widely known that lots of journalists are innumerate. And while a lot of people are innumerate, and journalists are just that, people, it becomes a problem when it comes to reporting economic news and data. A popular mistake is reporting nominal dollar figures instead of inflation adjusted figures when comparing wages or sales in two given years. Or you get the horrible mangling of simple statistical comments like significance or margin of error. The point is, journalists need to know math! Or at least some very basic statistics and economics.

One problem, I feel, is one of cultures. Most journalists studied humanities in college, and best I can tell, people who study the humanities are largely the same people who say “Thank God I don’t have to take any math classes again.” And while you don’t actually need to know any college math to be a good journalist, you really do need to know some statistics, or more generally, quantitative methods.

At Medill, Northwestern’s Journalism school, they require students to take some sort of Math for journalists class. And, if a friend of mine’s facebook status — since when does journalism involve math? seriously? — is any indication, there is still a lot more work to do.
My feelings exactly. Most of the time, I feel compelled to restrain myself in commenting on the matter, since I risk sounding like the obnoxious math geek who tells English students they're just not smart enough to take real classes. But now that Matt Zeitlin (a bona fide philosophy major) is talking about it, I might as well enter the fray.

He hits the nail on the head by pointing out that college math isn't necessary to perform competently as a journalist. Most high school math isn't important, for that matter. You don't need to remember the quadratic formula, know how to divide complex numbers, or understand whatever the hell the law of cosines is. If you started disliking math because all the symbols and equations scared you, it doesn't matter: as long as you can grasp basic quantitative concepts, you can be a perfectly capable journalist.

What do you need to learn? Confusing real and nominal statistics, as Matt mentions, is indefensible. You'd think that this would be the first piece of training given to anyone reporting economic news: whatever you do, don't say that wages have increased just because inflation has increased their nominal value! The difference between a mean and a median isn't too difficult either. A lot of smart young bloggers who aren't exactly math geeks have already figured this out -- why can't writers at leading national newspapers?
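Here is a minimal sketch of the adjustment in question, using made-up wage and price-index numbers: deflate each year's nominal figure by a price index before comparing.

```python
# Minimal sketch of the real-vs-nominal adjustment. The index and wage values
# below are illustrative, not official data.
cpi = {1999: 166.6, 2009: 214.5}            # hypothetical price index levels
nominal_wage = {1999: 30_000, 2009: 36_000}

base_year = 2009
real_wage = {yr: w * cpi[base_year] / cpi[yr] for yr, w in nominal_wage.items()}

print(real_wage)  # {1999: ~38626, 2009: 36000} -- the "raise" disappears once you adjust
```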

In the spirit of progress, I therefore present you with the first installment of Remedial Statistics for Journalists. It's hardly complete, and might be better titled A Random Collection of Matt's Pet Peeves, but it's good to let out.

Polling.

If a poll's margin of error is 5% and Obama polls 52%, this result is not "statistically indistinguishable from a tie." You'd think that this would be evident using a little basic logic: if 52% is statistically "indistinguishable" from 53%, which is in turn the same as 54%, and so on, transitivity implies that 52% is indistinguishable from 100%, which is clearly false. Alas, journalists' desire to display what they perceive to be statistical sophistication gets ahead of them here, as a snappy-sounding declaration that two numbers are "statistically indistinguishable" impresses the editor enough to get by. Yes, 52% is very close to 50%, and there's a substantial chance that the real figure is below 50% -- but even with this very narrow margin, it's more likely that the actual figure is above 50%, meaning that the poll result is hardly "equivalent" to a tie.
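To put a rough number on that claim, here is a small sketch under the usual normal approximation, using a 95% margin of error and ignoring the non-sampling error discussed below.

```python
# Rough sketch of the point above: a 52% result with a 5% margin of error is
# more likely than not to reflect genuine majority support. Assumes the standard
# normal approximation; ignores non-sampling error entirely.
from statistics import NormalDist

poll = 0.52
margin_of_error = 0.05
std_error = margin_of_error / 1.96   # the MoE is ~1.96 standard errors at 95%

p_above_half = 1 - NormalDist(mu=poll, sigma=std_error).cdf(0.50)
print(f"P(true support > 50%) ~ {p_above_half:.0%}")  # roughly 78%, not a coin flip
```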

Indeed, the "margin of error" is nothing special at all. It's based on the arbitrary choice of a 95% confidence interval, which in turn only deals with easy-to-quantify sampling error. That's the error you get when you select 1000 voters, in a perfectly random way, from the population; even if your method of selection is perfect, there's inevitably some error resulting from the fact that you haven't sampled the entire population. As it happens, this isn't the most serious error you get in polling, because it can be aggregated away: the average of 4 polls, each with a 5% margin of error, no longer has a 5% margin of error. (Journalists tend to be obtuse here as well, and often talk about how averages of many polls are within the "margin of error," even when the margin of sampling error is much smaller in the aggregated sample.)
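A quick sketch of the aggregation point, assuming four independent polls of equal size:

```python
# Why an average of several polls has a smaller sampling margin of error than any
# single poll. Assumes independent polls of equal size; purely illustrative.
import math

single_moe = 0.05   # each poll: +/- 5 points
n_polls = 4

# Sampling error shrinks with the square root of the total sample size,
# so averaging 4 equal polls halves the margin of error.
combined_moe = single_moe / math.sqrt(n_polls)
print(f"MoE of the 4-poll average: +/- {combined_moe:.1%}")  # +/- 2.5%
```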

The real danger in polling is that the method for selecting the sample is flawed, a danger that lies outside the "margin of error" concept altogether. Maybe you're oversampling elderly white women, or disaffected former construction workers. Indeed, pollsters find this to be such a problem that they implement ad-hoc demographic "weighting" procedures to correct for it. But even if you've pinpointed the percentages of every racial, gender, and age category in the voting pool -- no mean feat, because these percentages fluctuate from year to year -- you're left with the possibility that you'll systematically oversample voters of a particular political persuasion, even if you've compensated using all the obvious demographic markers. In this sense, the margin of sampling error (the one you see on press releases) is actually a lower bound on the true margin of error, and depending on the pollster can be a very serious underestimate indeed.
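Here is a toy sketch of what demographic weighting does, and why it cannot rescue a sample that skews politically within each demographic cell. All the shares below are invented.

```python
# Toy sketch of demographic "weighting": upweight groups the sample missed,
# downweight groups it oversampled. Every number here is invented.
sample_share      = {"18-34": 0.15, "35-64": 0.50, "65+": 0.35}  # who answered the phone
population_share  = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}  # who actually votes (assumed known)
support_in_sample = {"18-34": 0.60, "35-64": 0.50, "65+": 0.40}  # candidate's support by group

raw      = sum(sample_share[g] * support_in_sample[g] for g in sample_share)
weighted = sum(population_share[g] * support_in_sample[g] for g in sample_share)

print(f"raw: {raw:.1%}, weighted: {weighted:.1%}")
# Weighting fixes the age skew, but it can't fix a sample that leans one way
# politically *within* each demographic cell -- the danger described above.
```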

Hopefully that's been a nice lesson -- unfortunately, statistics is about a lot more than just polling. Many times, you'll end up reporting how "researchers have found" that hugging seals causes chronic obesity, or some other amusing nugget from the world of science that makes good copy. How do these mysterious "researchers" come to such conclusions? Well, one way is...

Randomized Experiments

When they work, randomized, controlled experiments are a great way to identify differences and causal effects. Want to know whether a drug actually helps patients with hypertension? Simple: give some subjects the actual drug, some subjects the placebo, and see what happens. After I confidently declared that there was no real difference between Busch and Bud Light, a few of my friends and I tried this last week, and our blind, randomized evaluation found that unless you're much better at tasting cheap beer than we are, the extra $5 for a 24-pack is completely useless. (Statistics in action!)
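If you are curious what the analysis of such a test looks like, here is a minimal sketch: under the null hypothesis that tasters are just guessing, correct identifications follow a fair-coin binomial. The counts are invented for illustration, not our actual tallies.

```python
# Minimal sketch of analyzing a blind taste test: if tasters are guessing,
# correct identifications follow a fair-coin binomial. Counts are invented.
from math import comb

trials, correct = 24, 13

# One-sided p-value: chance of doing at least this well by pure guessing.
p_value = sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials
print(f"p-value ~ {p_value:.2f}")  # ~0.42 -- nowhere near evidence of a real difference
```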

Alas, randomized experiments aren't the answer to everything. Their results might be internally consistent, but often they only tell you that a smattering of coerced psychology undergrads pressed a red button after being told to think about trucks. You need to know whether the participants are a representative cross-section of the population under study, whether the sample size is large enough to justify bold claims about its implications, and whether the experimental design introduces any biases into the research.

Sometimes experiments are great (medicine). Sometimes they're shaky (psychology). Sometimes they barely even apply (economics). Understand the difference.

Regression

This is researchers' main tool to draw statistical conclusions when they can't run a randomized experiment. The idea is to measure how a "dependent variable" is influenced by a number of "independent variables." The simplest and most commonly used assumption is that the effects are linear: that, say, an increase of 1 in X is associated with an increase of 2 in Y. A standard technique, called "ordinary least squares," allows you to place a number on each effect. Never mind how it works mathematically -- it's not too complicated, but software packages do all the work anyway, and any formulas are likely to distract you.
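For the curious, here is a bare-bones sketch of what fitting such a model looks like in practice, run on simulated data where the true slope really is 2.

```python
# Bare-bones ordinary least squares on simulated data: y = 1 + 2x + noise.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)     # true intercept 1, true slope 2

X = np.column_stack([np.ones_like(x), x])    # design matrix with an intercept column
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"intercept ~ {coef[0]:.2f}, slope ~ {coef[1]:.2f}")
```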

The key is to understand the limitations of regression analysis. First, it doesn't magically overcome the fact that correlation is not causation. If you regress income on education, you don't know whether education causes people to earn more (probably), whether better financial circumstances allow people to stay in school longer (probably, albeit to a much lesser degree), or whether both are tied up in a swirl of complicated demographic and social factors.

Any variable that's relevant but not included in a regression equation can distort the results. Does increased time spent playing video games actually result in lower grades, or is heavy gaming just an indicator of low academic motivation (something that's fuzzy and difficult to measure in a study)? You can try to add more variables to the model, "controlling" for important factors, but omitted variables will inevitably still present a problem, and sometimes the overzealous addition of controls can actually make the problem worse.
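A small simulation makes the omitted-variable problem vivid. In the made-up data below, an unobserved "motivation" variable drives both gaming hours and grades, so a regression that leaves it out makes gaming look far more harmful than it is.

```python
# Omitted-variable bias on made-up data: "motivation" drives both gaming and grades.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
motivation = rng.normal(size=n)                                  # unobserved in the naive study
gaming = -0.8 * motivation + rng.normal(size=n)                  # less motivated students game more
grades = 2.0 * motivation - 0.1 * gaming + rng.normal(size=n)    # true gaming effect: -0.1

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols(np.column_stack([np.ones(n), gaming]), grades)
full  = ols(np.column_stack([np.ones(n), gaming, motivation]), grades)
print(f"gaming coefficient, motivation omitted:  {naive[1]:.2f}")  # strongly negative (~ -1.1)
print(f"gaming coefficient, motivation included: {full[1]:.2f}")   # close to the true -0.1
```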

Tons of researchers run regressions without understanding what they really mean, or how problems like omitted variable bias or endogeneity (a fancy way of saying that causation runs in multiple directions) can distort the results. Ask yourself: does this study's approach really prove that X causes Y? If the researchers only claim that two variables "correspond" or "are associated," don't oversell their findings, and if they claim causation, apply a little critical thinking to see how credible that claim is. Sometimes these problems are unavoidable, and we have to rely on whatever flawed research we have, but there's a lot of nonsense spread by journalists' overeager pens.

Thursday, February 12, 2009

Positively stimulating

Matthew Yglesias is right. Spending can be perfectly good "stimulus," even if it doesn't produce anything else of value:
In other words, when the primary point of spending money on something is to get the thing, you need to worry a lot about efficiency. You don’t want “wasteful” procurement wherein you overpay for stuff, or spending on stuff that doesn’t work. For the purposes of fiscal stimulus, however, while it’s better to spend the money in an efficient way on useful items, it’s not essential to do so. Which doesn’t mean we should totally throw caution to the wind and pay people to dig holes. But it does mean that it makes perfect sense to relax our criteria for what counts as useful and what counts as efficient. The efficacy of stimulus as stimulus just has to do with how quickly the funds cycle into private hands and then out into the wider economy and has relatively little to do with “efficiency” in an ordinary sense.
Unfortunately, this doesn't contradict Kevin Murphy's argument. Murphy would agree, I think, that the efficiency of stimulus "as stimulus" isn't determined by its inherent value as spending. He's saying, however, that our decision to commit stimulus money -- not where to allocate it, but whether to spend it at all -- should be influenced by whether we're getting anything more than stimulus in return. There are many plausible models where a stimulus package isn't optimal if its only effect is stimulus, but where the combined benefits of stimulus and public investment together push it over the top. While Yglesias is definitely on the right track when he talks about "relaxing" our criteria for what counts as efficient or useful, he's wrong to think that this is some kind of blow to Murphy's framework, or something that Murphy doesn't consider.

One place where Murphy is genuinely off-base, however, is in his 'λ' parameter. This measures the value people place on their free time: the idea is that using lost income to estimate the impact of unemployment is an overestimate, because people can do other productive or enjoyable things in their newly free workweek. He argues that λ is "nonzero and likely to be substantial," which contributes to the conclusion that overall, stimulus probably doesn't pass a cost-benefit analysis.

Brad DeLong demurs, estimating λ at 1/5 and noting that "the cyclically unemployed are not having much fun." Yes, but 1/5 is still too high, and to accurately capture the relevant issues within Murphy's framework, we'd have to make λ negative. This is because the gap between actual and potential output understates the pain of a recession, where the losses are concentrated among the unemployed. Intuitively, this is simple: $20,000 is more useful in the hands of an unemployed person, who may desperately need the money, than spread out evenly through everyone in the economy. This is why, although naive economic analyses might conclude that the welfare effect of business cycles is minor, recessions so thoroughly dominate the political landscape.

(For the economists out there, note that the fundamental problem is that Murphy's stylized model implicitly assumes linear utility of income, which is obviously an inaccurate assumption when dealing with the massive income swings induced by unemployment.)
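A toy illustration of the point, under an assumed logarithmic utility function rather than anything in Murphy's paper:

```python
# Toy numbers under an assumed log-utility function (not Murphy's model): compare
# losing $20,000 all at once versus the same total loss spread over 100 people.
import math

income, loss, people = 40_000, 20_000, 100

concentrated = math.log(income - loss) - math.log(income)                  # one person loses $20,000
spread = people * (math.log(income - loss / people) - math.log(income))    # 100 people lose $200 each

print(f"utility loss, concentrated on one person: {abs(concentrated):.3f}")  # ~0.69
print(f"utility loss, spread across everyone:     {abs(spread):.3f}")        # ~0.50
# With linear utility the two losses would be identical; with any concave utility,
# the concentrated loss is larger -- exactly what the lost-output measure misses.
```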

Greg Oden: A Damn Good Basketball Player

Dear everyone claiming Greg Oden was a mistake,

Greg Oden is a damn good basketball player. Yes, as a rookie he's plagued by foul trouble, and he's struggled to stay in the game long enough to get in a rhythm. In the minutes he has played, however, he's placed himself among the most impressive rookies this season, and indeed the past decade. Consider which of the following rookie statlines, normalized by 36 minutes per game, comes from Dwight Howard, and which is from Greg Oden:
  1. 14.1 points, 11.2 rebounds, 1.8 blocks, 59.7% true shooting percentage, 17.9 PER
  2. 13.2 points, 11.1 rebounds, 1.8 blocks, 57.1% true shooting percentage, 17.2 PER
Funny how these look nearly identical, isn't it?* Another question: out of Kobe Bryant, Kevin Garnett, Dwight Howard, Dwyane Wade, Amare Stoudemire, Kevin Durant, and Greg Oden, who had the highest rookie-year player efficiency rating (PER)?**

Given my tone, hopefully you can guess the answer.

Sincerely,
Reality

* In case you're wondering, Oden is statline #1. Admittedly, Oden is two years older than Dwight Howard was as a rookie, but he played only half a season of basketball in the two years before his rookie campaign, making it awfully hard to claim he was more experienced. Debuting in the NBA after an entire year of inaction isn't exactly easy.

** Oden, you idiot.

Bad arguments against cap-and-trade

Now that I'm back to blogging (for a night, since I'm in the mood to pretend that all my other work doesn't exist), I suppose it's time to follow through on a few of the posts I've been dying to write over the last few months. In particular, I want to address Tom Laskawy's troubling and incoherent arguments against one of the most promising tools at our disposal to fight global warming: cap-and-trade. During his stint on Ezra Klein's blog -- with an otherwise impressive guest cast, I might add -- I found one post particularly bizarre. Laskawy's attempt at an argument against cap-and-trade turns out to be a much narrower argument against loophole-ridden European cap-and-trade, one that he inexplicably finds damning to the concept in general:
So it's worth taking a look at Europe, which has the only functioning cap-and-trade system currently in existence, to see what we can expect from cap-and-trade. I'm afraid it's not pretty. Central to the system, in fact the only way to reduce emissions right now, is the ability of carbon emitters (i.e. power companies) to buy "carbon credits" by paying companies in the developing world not to emit carbon. I'm sure you will be shocked, shocked to discover that these companies are gaming the system to the tune of billions of dollars in credits and very little in the way of cuts. As an AP investigative report documents, the system that's supposed to "validate" these projects, i.e. determine if a given project is truly supplanting something dirtier (e.g. building a hydroelectric dam instead of a coal-fired power plant), simply doesn't work. No one's arguing, by the way, that payments to the developing world for help with emissions cuts won't be a part of the climate solution. But it would be nice if the payments were linked to actual cuts in emissions, which is simply not the case now. So far, so bad in the world of carbon markets.
This doesn't make any sense. There are three basic ways to implement a system with emissions caps:
  1. Implement carbon emissions caps and do not allow them to be traded.
  2. Implement carbon emissions caps and allow them to be traded.
  3. Implement carbon emissions caps, allow them to be traded, and also allow them to be exchanged for "offsets" from other countries.
Laskawy's post presents some very good evidence that option (3) is currently too easy to manipulate, and shouldn't be part of our climate agenda. It offers no reason, however, to prefer (1) to (2). You don't want a cap-and-trade system to be trampled by shady offset deals? Simple: make a system that doesn't include offsets. This is hardly a reason to prevent polluters from trading emissions rights at all, which needlessly makes our carbon policy less efficient.

Gold bugs can't read

Judy Shelton takes to the pages of the Wall Street Journal to advocate a return to gold and silver currency. Her argument has all the usual hallmarks of dimwitted gold-boosting: the singleminded emphasis on inflation, the bogus claim that the government funds a large part of its budget by printing money, and wistful references to the glorious old "tradition" of hoarding precious metals. Free Exchange rightly calls it the "worst idea ever."

But there's something particularly special about Judy Shelton's article. It proves, beyond any lingering doubt, that she can't read, or at the very least is so impossibly dense that she can't understand the words on a page. She writes:
A study by two economists at the Federal Reserve Bank of Minneapolis, Arthur Rolnick and Warren Weber, concluded that gold and silver standards consistently outperform fiat standards. Analyzing data over many decades for a large sample of countries, they found that "every country in our sample experienced a higher rate of inflation in the period during which it was operating under a fiat standard than in the period during which it was operating under a commodity standard."
Now let's take a look at the abstract of the article she cites:
We examine the behavior of money, inflation, and output under fiat and commodity standards to better understand how changes in monetary policy affect economic activity. Using long-term historical data for 15 countries, we find that, under fiat standards, the growth rates of various monetary aggregates are more highly correlated with inflation and with each other than under commodity standards. Money growth, inflation, and output growth are also higher. In contrast, we do not find that money growth is more highly correlated with output growth under one standard than under the other. (Emphasis added.)
Hmm... so output growth is higher under fiat standards. And she's using this study to claim that paper money is bad for the economy? Nice.