Wednesday, April 08, 2009

When experiments are too controlled...

As I've discovered more and more over the past few weeks, students around here are extremely fond of Dan Ariely, the behavioral economist who wrote Predictably Irrational and one of our more famous faculty members. And there's nothing wrong with an interest in behavioral economics: some outstanding economists working on behavioral and experimental analysis have made enormous strides in our understanding of economic behavior.

But it's also important to note the weaknesses often associated with behavioral economics, and I can't think of a better example than an opinion article Dan Ariely contributed to the New York Times last November. His article, "What's the Value of a Big Bonus?", uses results from several field experiments to suggest that bonuses may not have the incentive effects businesses and economists expect:
To look at this question, three colleagues and I conducted an experiment. We presented 87 participants with an array of tasks that demanded attention, memory, concentration and creativity... 

About a third of the subjects were told they’d be given a small bonus, another third were promised a medium-level bonus, and the last third could earn a high bonus. We did this study in India, where the cost of living is relatively low so that we could pay people amounts that were substantial to them but still within our research budget...

What would you expect the results to be? When we posed this question to a group of business students, they said they expected performance to improve with the amount of the reward. But this was not what we found. The people offered medium bonuses performed no better, or worse, than those offered low bonuses. But what was most interesting was that the group offered the biggest bonus did worse than the other two groups across all the tasks.
He goes on to describe a similar experiment conducted at MIT, along with a conclusion (and one enormous qualifier):
We found that as long as the task involved only mechanical skill, bonuses worked as would be expected: the higher the pay, the better the performance. But when we included a task that required even rudimentary cognitive skill, the outcome was the same as in the India study: the offer of a higher bonus led to poorer performance.

If our tests mimic the real world, then higher bonuses may not only cost employers more but also discourage executives from working to the best of their ability.
"If our tests mimic the real world" indeed! Although Ariely doesn't dwell on this point, it's a fundamental weakness in his research. It is hard to imagine two more different situations: a lab experiment that lasts at most a few hours, and difficult financial work that lasts for months or years. Generalizing from the former to the latter isn't some small jump in reasoning: it's a massive leap of faith, with no logical or empirical basis to support it.

In fact, a little intuition suggests that Ariely's results are perfectly consistent with a world where ordinary bonuses incentivize productivity. Imagine you're in a lab, slated to participate in an experiment for the next few hours. You know that the experiment will last the same amount of time, and you'll be asked to complete the same set of tasks, no matter what you do. Even if you're at the "low" compensation level, the amount of money you'll be given is enough to justify putting in your best effort (or something close to it). After all, you're stuck in the lab anyway: why not give their stupid puzzle your best shot, when even as a "low" bonus recipient you can earn the equivalent of a low-skilled worker's daily wage?

It's likely, then, that the improvement in effort induced by larger bonuses in this experiment is pretty small. Meanwhile, other factors affect your performance, such as the stress of a cognitively demanding task. Even a relatively small detrimental effect from stress may overwhelm the small improvement in effort induced by the bonus, and produce results like the ones we see in Ariely's experiment.

This doesn't mean, of course, that we should expect to see the same pattern from all bonuses. When varying time commitments are involved, and your bonus depends on sustained dedication to work over a long period, its effect on your effort is likely to be much larger, and may swamp the negative effects from stress. I don't know that this will happen, but Ariely certainly hasn't shown that it won't, and his research does very little to advance our knowledge of incentives in the real world.
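To make this intuition concrete, here is a toy model in Python. It is a sketch under loudly stated assumptions, not anything estimated from Ariely's data: net performance is an effort gain with diminishing returns (`math.log1p`) minus a stress cost that grows linearly with the bonus. The bonus levels and the parameters `effort_response` and `stress_rate` are hypothetical numbers chosen only to show the mechanism; the "lab" and "long horizon" settings differ only in how strongly effort responds to money.

```python
import math

# Hypothetical bonus levels in arbitrary units (NOT Ariely's actual amounts).
BONUSES = {"low": 1.0, "medium": 10.0, "high": 100.0}


def performance(bonus: float, effort_response: float, stress_rate: float) -> float:
    """Toy net performance relative to no bonus: effort gain minus stress cost.

    Assumed functional forms (illustrative only): the effort gain has
    diminishing returns in the bonus (log1p), while the stress cost grows
    linearly with the stakes.
    """
    effort_gain = effort_response * math.log1p(bonus)
    stress_cost = stress_rate * bonus
    return effort_gain - stress_cost


# Lab: subjects are already near full effort, so extra money buys little effort.
LAB = {"effort_response": 0.01, "stress_rate": 0.002}
# Long horizon: the bonus rides on months of sustained work, so effort responds more.
LONG_HORIZON = {"effort_response": 0.2, "stress_rate": 0.002}


def scores(params: dict) -> dict:
    """Net performance at each hypothetical bonus level for one setting."""
    return {name: performance(b, **params) for name, b in BONUSES.items()}


if __name__ == "__main__":
    for label, params in (("lab", LAB), ("long horizon", LONG_HORIZON)):
        row = ", ".join(f"{k}={v:+.3f}" for k, v in scores(params).items())
        print(f"{label:>12}: {row}")
```

With these made-up parameters, the lab setting reproduces the qualitative pattern (the medium bonus does slightly worse than the low one, and the high bonus does worst), while in the long-horizon setting performance rises with the bonus. The model says nothing about which setting describes real bankers; it only shows that both patterns are consistent with bonuses that genuinely raise effort.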

The more general philosophical issue here is the tradeoff between internal and external validity. If you're concerned about internal validity, Ariely's work is great. Small sample size notwithstanding, I have very little doubt that if I set up an identical experiment measuring the effects of bonuses on laboratory tasks in India, my results would be similar to Ariely's, and that if prodding lab subjects to perform contrived tasks ever becomes a critical policy goal, this knowledge will prove predictive and invaluable. In this limited sense, I have far more confidence in randomized economic experiments than I do in, say, the correctness of a particular regression specification.

Unfortunately, we are also concerned about external validity (whether our results extend to a more realistic setting), and here we are forced to make massive leaps of inference. Near the end of his piece, Ariely argues that bankers were "too quick to discount" his results, but he never makes clear why a sane banker would base compensation policy on a few lab experiments. However unreliable financial executives' experience may be, it is much more relevant to the question at hand.

9 comments:

JPIrving said...

Indeed. I found your blog from a link on Econlog and find myself agreeing with your critique completely. A far more useful test would be one that uses data from actual firms. Certainly the bosses at real institutions opt for bonuses, in part, because they recall putting in the extra hours for the big payoff when they were subordinates.

OregonJon said...

The real challenge is that bonuses are almost always paid annually while actual results only play out over the long run. Short-term rewards for long-term gains or losses provide an incentive to game the system. Sound familiar?

TruePath said...

Worse, it seems unlikely that the stress you feel in a short-term competition would be felt similarly over the course of a year-long project. People simply don't process immediate and distant rewards the same way, as anyone trying to diet will attest.

I suggest a better experiment would be to randomly assign telemarketers to groups which offer bonuses of varying size for the best performance.

---

More broadly, I'm a big fan of the idea of behavioral economics but pretty concerned about how generalizable simple lab experiments are to real-life situations.

I mean, people are social animals and will frequently sacrifice money to comply with social expectations or look good (e.g., leaving tips in distant cities). The essential problem with most experimental economics is that being in an experiment brings along its own set of social expectations.

When I was in college, my friends and I were as motivated by a desire to 'win' the econ experiment (money = points) as by pure cash. Moreover, since it was a game, behaviors that might be shameful in the real world (defection) were seen as simply smart play.

To illustrate the point, imagine you play a game with your friends where the rules allow 'theft'. Even if you have real money riding on the game, you simply don't treat in-game 'theft' the same as real theft.

geoffham said...

It's also relevant that the people tested were told in advance how much their "bonus" was going to be. This doesn't reflect the real world at all: bonuses reflect the work put in over the course of the year; they aren't assigned in advance to tweak outcomes.

A worker may be goaded into working hard all year long and still be given a low bonus at the end - but that's not a mistake that they're likely to make more than once.

Bob said...

Matt,

You are onto something first brought up by Egon Brunswik in the field of psychology, from the 1930s through the 1950s. Brunswik made two fundamental points. First, he said that any experiment must specify, in advance, the conditions to which its results will apply. He called this the "representative design" of experiments. Second, Brunswik said that the same logic that dictates that we randomly select our subjects for experiments dictates that we randomly sample the tasks we ask the subjects to perform.

Ariely's work, from what I know of it and from what you've said here, fails on both counts. Therefore, as you said, any hope of generalization beyond the lab is just that, hope.

One economist who considers himself a Brunswikian is John List at the University of Chicago. His field experiments attempt to represent the environmental conditions to which he expects his results to generalize.

babar ganesh said...

very nice blog, impressive.

not related to your analysis, but here's a thought.

if compensation is higher than it "should be" for a class of workers -- meaning that work with comparable skills and work demands in another situation will pay less -- then there has to be some constraint keeping compensation high.

Tilt said...

Just got to your blog by way of a link from Marginal Revolution. Hope you decide to pick up blogging about your thoughts again soon. I really enjoyed reading your past entries.

Jirka Lahvicka, Czech Republic said...

Another Marginal Revolution visitor :-)

Good point about the applicability of Dan Ariely's research - however, his experiments are much easier than proper studies trying to replicate real conditions (these would probably take several years), so he can do more of them, challenge a lot of assumptions and generate a lot of further research ideas.

Brennan said...

Hi,

While I appreciate the sense of the comment, I think you are missing a large part of Ariely's research: in all cases where it can be measured, small-scale effects seem to mimic large-scale ones. We (humans, presumably) tend to answer small and large problems the same way. That is likely the core result of much of the behavioral economics research on irrationality. While you might expect people to behave differently on large problems, they don't!

Almost everyone knows this implicitly. Do you trust a person who constantly lies about little things to be truthful on major issues? Does signalling matter? (Hint: yes.)

So I would be careful about claiming that these tests don't carry over to real situations. You have ignored most of the context. It is certainly debatable, but this isn't that strong an objection.

As for bonuses, most are set by contract before the beginning of the year, at least as a percentage and target. This experiment duplicated that sense very well.