I started this as a comment on Georgia's fantastic effort, but it got a bit long-winded, so I've posted it as a diary.
Most of her evidence looks to me to be utterly damning of the electoral process. But being a stats nerd, I got hung up on the stats, and I don't think the exit poll arguments are up to the watertight standards of the rest of the paper. So, for what it's worth, here are a few specific quibbles, mostly with Freeman's very gung-ho probability values.
With statistics, in general, you want to make the most conservative assumptions you can get away with to make your argument. I've done this where I could. More below:
Let me say first of all, I still think that the exit polls are odd, and consistent with "irregularities" in the election itself. But statistics are slippery things, so here is some advocacy from the devil:
(p 29):
But we are not living in a healthy democracy. And the evidence below demonstrates that it is nearly impossible the exit polls were "dead wrong."
The reason this is so is that traditionally exit polls have been close to 2% accurate. Yet in the last three elections, 2000, 2002 and 2004 they haven't been.
Some counter-evidence here
(p 30):
Traditionally, exit polls are conducted until about 6 p.m. But four hours after the polls closed, the exit poll numbers were still changing. There is also the disturbing fact that, not only were the exit poll numbers still changing after midnight, they were actually changing in mathematically impossible ways.
Yes, these changes are mathematically impossible. We can therefore conclude that the later numbers reflected new information (not just new responses) that caused the responses to be re-weighted. This could have been information regarding turnout, or actual returns from precincts.
However, this need not be "disturbing" in itself. The purpose of the exit polls is to predict the eventual result. The more extra information becomes available, the more the actual response data needs to be recalibrated in the light of the new information. There is no point in the exit polls still predicting a Kerry win when it is blindingly obvious that the counts have gone for Bush.
The key question is not why the exit poll numbers changed (they ought to converge with the actual results) but why they had to change. In other words, the question of why they changed is coterminous with the question of why the exit poll responses, before the re-weighting, were discrepant from the actual count (I will sit on the fence so far as to call it the "count" not the "vote").
This may be calling "fire" in the wrong part of the theatre.
Yet the polls also showed Republicans carrying most of the tight Senate races. When the official votes were tallied, the early Senate numbers for the large part matched up the actual totals. Thus, the presidential exit polls proved wrong while the Senate polls proved right.
Really interesting point.
(p 31):
What are the chances that there would be so much red shift in so many battleground states? 662,000-to-one: "As much as we can say in social science that something is impossible, impossible that the discrepancies between predicted and actual vote counts in the three critical battleground states of the 2004 election could have been due to chance or random error."
Freeman's probability estimates are simply wrong. Not wrong because he miscalculated them, nor wrong because he persists in using a low "fudge factor" constant for the "design effect" where others recommend a higher one (and whichever is right, it is best to be conservative in these cases, i.e. go for the highest one unless you can support the lower one, which he doesn't), but because each of the unknowns in his computation has its own "probability" value, which he doesn't allow for. His conclusion is almost certainly right - that the early exit polls over-estimated Kerry's share of the vote - but his probabilities are too vulnerable to be quoted as such. My own estimate is that the probability of the early exit polls nationwide being that different from the actual count by chance is 1 in 200. Which is good enough for me. One or the other was definitely wrong.
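To make the design-effect point concrete, here is a rough sketch of how the z-score of a poll/count discrepancy shrinks as the design effect grows. The numbers are purely illustrative - they are not Freeman's actual inputs:

```python
import math

def discrepancy_z(poll_share, count_share, n, design_effect):
    """z-score of the gap between an exit poll's candidate share and
    the official count, for a sample of n respondents.  The design
    effect inflates the simple-random-sampling variance to account
    for the cluster sampling that exit polls actually use."""
    p = poll_share
    se = math.sqrt(p * (1 - p) / n) * math.sqrt(design_effect)
    return (poll_share - count_share) / se

# Illustrative numbers: a 2-point overstatement on a sample of
# 2,000 respondents, under a low and a more conservative design effect.
z_low  = discrepancy_z(0.52, 0.50, 2000, 1.3)   # low "fudge factor"
z_high = discrepancy_z(0.52, 0.50, 2000, 1.8)   # conservative choice
```

Under a normal approximation the two-sided p-value is roughly 2(1 - Phi(z)), so the more conservative design effect can easily move a "significant" discrepancy into unremarkable territory, which is why the choice of constant matters so much.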
(p 32):
[Freeman] finds the odds against the discrepancies in Ohio, Florida, and Pennsylvania occurring together are computed at 662,000-to-one, or a virtual statistical impossibility that they could have been due to chance or random error. His study has yet to be substantially undermined by any opposing arguments.
Testing the a priori null hypothesis that exit poll error in the three battleground states would not be significantly different from elsewhere in the nation, my estimate is that the probability of exit polls in Ohio, Florida, and Pennsylvania being that far from the national mean error by chance is about 1 in 3000 on a parametric test, and 1 in 600 on a non-parametric test. Again, good enough for me, but there seems no point in over-stating the case, which Freeman undoubtedly does.
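One way to frame that null hypothesis, sketched here with made-up shift values rather than the real 2004 numbers, is a permutation test: how often would any three states chosen at random show a mean red shift as large as the one observed in OH, FL, and PA?

```python
import random

random.seed(1)

# Hypothetical state-level red shifts (poll minus count, in points)
# for 50 states -- simulated here, NOT the real 2004 numbers.
shifts = [random.gauss(1.0, 1.0) for _ in range(50)]
# Pretend OH, FL, PA are the first three entries, with large shifts.
shifts[0:3] = [2.2, 2.5, 2.8]

observed = sum(shifts[0:3]) / 3

# Permutation test: how often does a randomly chosen trio of states
# match or beat the observed mean shift?
trials = 20000
hits = 0
for _ in range(trials):
    trio = random.sample(shifts, 3)
    if sum(trio) / 3 >= observed:
        hits += 1
p_value = hits / trials
```

The virtue of this approach is that it makes no distributional assumptions at all; its p-values are typically less dramatic than a parametric calculation, which is the conservative direction to err in.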
[Freeman's] study has yet to be substantially undermined by any opposing arguments.
See numerous pages here for statistical methodology issues. Again, I'm not saying Freeman's conclusions are wrong, but his probability computations are sitting ducks, and more conservative computations still make his point. To quote from the admirably skeptical Mystery Pollster:
To be clear: Everyone -- including the exit pollsters -- agrees they "overstated" Kerry's vote. There is some argument about the precise degree of certainty of that overstatement, but if all agree that the difference is statistically significant the degree of certainty has little consequence. The size of the error matters, and the reasons for it matter, but whether our level of confidence about the error's existence is 99.9% or something greater does not.
(p 33):
Were Bush voters, for some reason, more likely to brush off exit pollsters? There has not been a scintilla of evidence to support this argument.
Except that it happens. It happened in the UK when John Major beat Neil Kinnock - pollsters now routinely allow for "shy Tories". I have a theory that it happens when people are frightened and vote with their guts rather than their heads, but then report who they would have liked to vote for, rather than who they actually voted for. And they were certainly frightened this year - deliberately. Sorry, this isn't a statistical point, just a view.
Furthermore, why would Bush supporters in say, in Illinois, where the exit polls matched, be more likely to talk to exit pollsters than Bush supporters in Ohio?
Because of the normal distribution of error. The discrepancy can be interpreted as showing that nationwide, there was a sampling bias in favour of people who said they'd voted for Kerry. No one state was outside the margin of error - but the error was significantly biased to Kerry. To use an analogy I've used elsewhere: if obstetricians generally overestimated a baby's due date, more babies would be born "late" than "early". But you wouldn't turn round and say "well, how come this baby was born on the right day and that baby wasn't?" In other words, a bias can still be within a margin of error. You might only know the obstetrician had got it wrong by examining 1000 pregnancies (or 50 states).
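The obstetrician analogy can be put in code. In this sketch (all numbers invented) every state's poll error sits comfortably inside its margin of error, yet a simple sign test across the 50 states still flags the common bias:

```python
import math
import random

random.seed(7)

N_STATES = 50
MOE = 4.0    # assumed per-state margin of error, in points
BIAS = 0.8   # small common pro-Kerry sampling bias, in points

# Each state's poll error = common bias + its own sampling noise.
errors = [BIAS + random.gauss(0, 0.8) for _ in range(N_STATES)]

within_moe = all(abs(e) < MOE for e in errors)   # no single outlier
positives = sum(e > 0 for e in errors)

# One-sided sign test: probability of at least this many of 50 states
# leaning the same way, if errors were really symmetric around zero.
p_sign = sum(math.comb(N_STATES, k)
             for k in range(positives, N_STATES + 1)) / 2 ** N_STATES
```

No single state looks wrong on its own; it is only the consistency of the direction across all 50 that gives the bias away, exactly as with the obstetrician's due dates.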
Another interesting tidbit about the exit polls: "In the 12 critical states (CO, FL, MI, MN, NE, NV, NH, NM, OH, PA, WI, & IA) the average discrepancy was a 2.5% red shift (= total movement of 5.0%), nearly twice that in the safe states. This in spite of the fact that the average sample size in the critical states was nearly twice that in the non-critical states and should have produced significantly more accurate results."
I just checked this. On a parametric test, the exit polls were significantly more pro-Kerry in these 12 states than in the rest of the nation, but only at a probability of 1 in 20. The difference is not significant on a more conservative non-parametric test. Moreover, if New Hampshire is excluded, neither analysis is significant - and we know that a full recount in NH matched the original count, not the exits; ergo, the exits were wrong in NH. So we shouldn't let NH leverage this analysis.
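The leverage point is easy to demonstrate. In this sketch, again with invented shift values rather than the real ones, dropping a single outlier state from the "critical" group noticeably weakens the significance of a distribution-free permutation test:

```python
import random

random.seed(3)

# Hypothetical red shifts (points) -- NOT the real state data.
# The last "critical" entry plays the role of an NH-like outlier.
critical = [1.5, 1.7, 1.4, 1.6, 1.8, 1.3,
            1.6, 1.5, 1.9, 1.2, 1.6, 4.5]
safe = [random.gauss(1.3, 0.6) for _ in range(38)]

def perm_pvalue(a, b, trials=20000):
    """One-sided permutation test: how often would a random relabelling
    of the pooled states produce a mean difference at least as large as
    the one observed between groups a and b?"""
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = a + b
    hits = 0
    for _ in range(trials):
        random.shuffle(pooled)
        ga, gb = pooled[:len(a)], pooled[len(a):]
        if sum(ga) / len(ga) - sum(gb) / len(gb) >= observed:
            hits += 1
    return hits / trials

p_with = perm_pvalue(critical, safe)
p_without = perm_pvalue(critical[:-1], safe)   # drop the outlier
```

A single extreme state can drag an otherwise marginal group difference across the significance line, which is exactly why NH should not be allowed to carry the 12-state analysis.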
Until we know how the exit polls were weighted it will be difficult to know why the discrepancies occurred, and whether the error was likely to be on the poll side or the count side. Until then, to summarize:
- There was a definite discrepancy between the early exit polls and the final count. No question.
- There was a tendency for the key states to have bigger Kerry-wise errors in the exit polls (or Bush-wise errors in the counts) - but the statistical evidence is weaker.
- No one state had exit polls outside the margin of error.
Having said all that, the exit poll evidence remains consistent with Georgia's case.