Yes, you read that right.  I have just calculated that there is a one in 9,432,472,254 chance that the exit polls could have been wrong by chance.

And that is not working from screen-shots, that is working from the data presented in the Edison Mitofsky report.

Not only that, but it happened in 1996 too!  A one in 268 probability of occurring by chance!

And in 1992! A one in 5007 probability of being due to chance!

And in 1988! A one in 49,827 probability of being due to chance!

Er, that's funny, what happened in 2000?  Oh, here it is -  yes, polls were wrong again! Only 1 in 3 probability this time, though.

But get this - they all made the same error!  Yes, that's right, they all overestimated the Democratic vote!  Even in 2000.

So what can we conclude from this?

What this tells us is that in every single one of the last five elections, the exit polls have consistently over-estimated the Democratic vote, significantly so (regarding 1 in 20 as "significant", an arbitrary criterion adopted by social scientists) in every year except 2000.

So the next question to ask is: was the over-estimate significantly greater in 2004?

Another preliminary way of asking the question is to say: does the degree of Democratic over-estimate vary significantly from year to year?  We can test this using a repeated-measures ANOVA (analysis of variance), and the answer is yes: there are significant differences between years in the amount of Democratic over-estimate in the polls (probability of this being a chance finding? One in a billion, since you ask).

So we then ask: was this year significantly out of line?  We perform a "planned comparison" and compare 2004 with all previous years.  And yes, this year was worse.  Significantly worse.  There is a one in 30,000 probability that this year would have been different just by chance.  However, if we look at the "least significant difference", i.e. compare this year with the worst of the four previous years, we find that this year was not significantly worse than 1992.  1992 was, however, significantly worse than the remaining three years (2000, 1996 and 1988).  Probability value for 1992 being worse than the other three years simply by chance? 1 in 19 million.
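For anyone who wants to see the mechanics, here is a minimal sketch of the two steps, assuming a table with one row per state and one column per election.  The numbers are made up for illustration and are not the E/M figures, the function names are my own, and the planned comparison is simplified to a one-sample t on each state's "latest year minus average of earlier years" difference.  It gives the F and t statistics only, not the probability values.

```python
import math
import statistics

def rm_anova_f(table):
    """One-way repeated-measures ANOVA.

    `table` has one row per state and one column per election year.
    Returns (F, df_years, df_error).
    """
    n, k = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n * k)
    year_means = [sum(row[j] for row in table) / n for j in range(k)]
    state_means = [sum(row) / k for row in table]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_years = n * sum((m - grand) ** 2 for m in year_means)
    ss_states = k * sum((m - grand) ** 2 for m in state_means)
    # What is left after removing year and state effects is the error term.
    ss_error = ss_total - ss_years - ss_states
    df_years, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_years / df_years) / (ss_error / df_error)
    return f, df_years, df_error

def last_vs_rest_t(table):
    """Simplified planned comparison: last column vs mean of the earlier ones."""
    diffs = [row[-1] - statistics.mean(row[:-1]) for row in table]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Hypothetical signed WPEs: rows are states, columns are five successive
# elections with 2004 last. Illustrative values only.
wpe = [
    [-2.0, -5.5, -1.5, -1.0, -6.5],
    [-1.0, -4.0, -2.5, -2.0, -7.0],
    [-3.0, -6.0, -2.0, -0.5, -5.0],
    [-2.5, -4.5, -3.0, -2.5, -8.0],
]
f_stat, df1, df2 = rm_anova_f(wpe)
t_2004 = last_vs_rest_t(wpe)
print(f"F({df1}, {df2}) = {f_stat:.2f}; 2004 vs earlier years: t = {t_2004:.2f}")
```

A large F says the years differ from one another; a large negative t says the last year is more negative than the earlier ones.  Neither says why.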

I hope the question you are now asking yourselves (those of you who are not acquainted with the weird and whacky quirks of parametric statistics) is:

How come the Democratic overestimate in 2004 was significant at a probability of one in nine billion, but was not significantly worse than the Democratic overestimate in 1992, which was significant at a probability of only one in five thousand?

Well, the partial answer is that probability values are a very poor proxy for effect size.  The probability value tells you how confident you can be that the difference was not due to chance.  It does not tell you the size of the difference.  The mean "within precinct error" (WPE - the difference between the exit poll estimate of the vote and the actual vote for each precinct) in 2004 was -6 (the negative value tells you it was a Democratic over-estimate), whereas in 1992 it was -5.11.  Not a big difference.  However, it was more significant in 2004.

To backtrack a bit: there are two sorts of error we are concerned with.  One is sampling error.  Imagine you have a bag containing equal numbers of red balls and blue balls, and you keep selecting ten at a time, at random, and chucking them back in.  Let's also say we have a scoring system whereby a red ball gives you a point, but a blue ball knocks a point off.  On average you will get five of each colour, and your score will be zero.  Sometimes you will get more of one than the other, but your average score, over several goes, will be 0.  The other is sampling bias.  Say you have a touch of ESP, and you can sense the red colour through your fingertips (or someone has snuck extra red balls into the bag).  Sometimes you will still get 4 red ones and 6 blue ones (unless your ESP is really brilliant).  Sometimes you will get 7 red ones and 3 blue ones.  But the average over your picks will be greater than zero, because you will tend to get more red balls than blue ones.
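If you would rather see the bag than imagine it, the game is easy to simulate.  This is a rough sketch, assuming each ball drawn is independently red with probability `p_red` (a fair approximation for a large bag); the function name and the 0.55 figure for the "ESP" case are my own illustrative choices.

```python
import random

def average_score(p_red, draws=10, goes=10_000, seed=0):
    """Mean score per go: +1 for each red ball, -1 for each blue ball."""
    rng = random.Random(seed)
    total = 0
    for _ in range(goes):
        reds = sum(rng.random() < p_red for _ in range(draws))
        total += reds - (draws - reds)
    return total / goes

fair = average_score(p_red=0.5)     # sampling error only
biased = average_score(p_red=0.55)  # a touch of "ESP", or extra red balls

print(f"fair bag:   average score {fair:+.2f}")
print(f"biased bag: average score {biased:+.2f}")
```

Individual goes bounce around in both cases, but over many goes the fair bag averages close to zero while the biased bag settles at a positive score.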

Right.  Back to the Mitofsky Edison WPEs.  EM have provided us with the average WPE (signed) for every state, for the last five elections.  Some are positive (Bush overestimate) and some are negative (Kerry overestimate).  However, the average of the averages (across all states) in every year is less than 0.  Remember, if there is only sampling error, and no sampling bias, each state's average should be close to 0, and the average of the state averages should be even closer to 0.  However, what we find is that in all the years except 2000 the average of the states' WPEs is significantly less than 0.  How significant it is depends partly on the size of the signed error (the bias) but also on the size of the unsigned error (the sampling error).  The more accurate the sampling, the more significant the bias.  You could even argue that the significance of this year's bias may be a tribute to the accuracy of the sampling, but I'm not going to let EM get away with that.
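To make that concrete, here is a small sketch of a one-sample t statistic (the mean divided by its standard error) applied to two made-up sets of state WPEs.  Both sets are hypothetical and both have the same average bias; only the scatter differs.  I am using the t statistic as a stand-in for "significance" here, which is an assumption about the sort of test involved rather than a description of EM's own method.

```python
import math
import statistics

def one_sample_t(values):
    """t statistic for testing whether the mean of `values` differs from 0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return mean / (sd / math.sqrt(n))

# Hypothetical state-level signed WPEs, NOT the E/M figures.
tight = [-2.5, -1.5, -2.0, -2.2, -1.8, -2.0]  # mean -2.0, little scatter
noisy = [-9.0, 5.0, -2.0, -8.0, 4.0, -2.0]    # mean -2.0, lots of scatter

t_tight = one_sample_t(tight)
t_noisy = one_sample_t(noisy)
print(f"tight: t = {t_tight:.2f}   noisy: t = {t_noisy:.2f}")
```

Same bias, very different t values: the tighter the sampling, the more "significant" the same bias looks.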

So to summarise so far: there was a Democratic overestimate in each of the last five elections.  There was a significant Democratic overestimate in four out of the last five elections.  In 2004 the bias was significantly larger than you would expect, given the variability of the bias over the last five elections, though not significantly larger than in 1992.  And it was massively significant.

So why was there a bias?  This is the really important question, not the probability value of the bias.  There are two main contenders:

1. Bush voters were shy, and managed to avoid the inexperienced pollsters, leading to sampling bias.

2. There was no sampling bias at all, and what was wrong was the count.  In other words, Democratic voters were not over-sampled at all; it was their votes that were undercounted.  They thought they'd voted and they hadn't.

It strikes me there is historical evidence for both, though here I am no expert, not being an American.  I think I am right in saying that spoilage rates tend to be higher in poorer and more Democratic areas.  If people think they've voted for a Democrat, but their vote is thrown out, this may well be reflected in apparent "over-sampling" of Democrats at the exit polls.  On the other hand, it doesn't explain the fluctuation from year to year, and especially the lack of apparent bias in 2000, where we know there were anomalies, most famously the Democrats who thought they'd voted for Gore on the butterfly ballot, but had actually voted for Buchanan.  So there is an argument for a built-in Democratic sampling bias, historically.

As the mean signed WPE for most states is not zero, I decided to see whether there was a relationship between state "colour" (as measured by the margin between Democratic and Republican candidate) and the direction of the WPE in each year.  To do this I simply subtracted the percentage of votes cast in each state for the Republican candidate from the percentage cast for the Democrat.  Blue states therefore have a positive margin, and Red states have a negative margin.  I did this for each of the years for which EM have given us the WPE.  I did it in a number of ways, and I won't bore you with the details at this stage.  Suffice it to say that when the years are pooled, there is a very strong significant negative correlation between the signed WPE and the state colour (1 in 2711 probability of occurring by chance).  This means that the bluer the state, the greater the Democratic "over-estimate" in the poll.  Moreover, the effect is strongest in the two years in which the sampling bias was greatest (2004 and 1992).  Using Spearman's rho (a non-parametric statistic that is less subject to leverage by outlying data points) for 2004, the correlation was -0.386, and for 1992 it was -0.410.  The probability (if you still care) of the correlation occurring by chance in 2004 was 1 in 177; in 1992, the probability was 1 in 261.
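For the curious, the calculation looks roughly like this.  It is only a sketch: the six "states" are invented for illustration (they are not the E/M data), the helper names are mine, and Spearman's rho is computed the textbook way, as a Pearson correlation on the ranks with tied values sharing an average rank.

```python
import math

def ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Ordinary product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical states: (Democratic %, Republican %, mean signed WPE).
states = [
    (60.0, 39.0, -9.0),
    (55.0, 44.0, -4.0),
    (50.0, 49.0, -6.0),
    (45.0, 54.0, -3.0),
    (40.0, 59.0, -1.0),
    (35.0, 64.0, -2.0),
]
margins = [dem - rep for dem, rep, _ in states]  # positive = Blue state
wpes = [wpe for _, _, wpe in states]
rho = spearman_rho(margins, wpes)
print(f"Spearman's rho = {rho:.3f}")
```

A negative rho here means the same thing as in the text: the bluer the state, the more negative the WPE.  Because only the ranks are used, one wildly outlying state cannot drag the correlation around the way it can with the ordinary Pearson coefficient.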

Moreover, according to EM, in these two years turnout was high (55% in both years, around 50% in the other years) and the percentage of voters "paying a lot of attention to the campaign" was also high (over 66% in both years, as compared with under 50% in the other years).  Recall that Perot was on the ballot in 1992.

What this appears to be saying is that in years in which the election has a high profile, the signed WPE has been significantly more negative (Democratic over-estimate), and that this bias has been significantly greater the more Democratic the state.  One interpretation of this is that Bush voters are more inclined to avoid being polled in these years, and that they are even more inclined to do so in Blue states than in Red states, which would make some psychological sense.  Another interpretation is that where the election is seen as critical, someone stuffs the ballots with Red votes.

Unfortunately, the statistics cannot tell us which is which, although the fact that 2000 comes up clean to my mind argues against the fluctuating fraud theory.  So my hunch - and this is simply hunch, not stats - is that the shy Bush voter theory has legs.

This does NOT mean it is the only factor affecting the WPE (although remember the WPE is not significantly worse this year than in 1992).  Moreover, these analyses do not rule out a bit of electronic ballot stuffing getting under the statistical radar.  There is plenty of unexplained WPE variance still to account for (in fact, reducing the noise due to this "colour" effect actually increases the significance of the Kerry over-estimate in 2004 relative to previous years).

But to get any closer, we have to look at individual precinct level data.  Unfortunately we do not have the uncorrected weights for the precinct level data that EM have released, and we only have the results of EM's analyses (with no proper methods section) on the WPEs, which are presumably based on the uncorrected weights.  However, the precinct level data may still yield some gold.

I'm working on it - I hope lots of others are too.

And as a final comment - I got into my first mini-flame war on an exit poll diary earlier today.  I am not claiming special expertise, and if anyone cares to fault my stats, I am only too willing to be corrected.  But I thought it was time some of this was put in perspective, particularly the significance of significance values (which are not very significant....).  I am not a statistician or a pollster - my bachelor's degrees were in music and architecture, and I am now a "mature" PhD student working in the field of cognitive neuroscience, for which I use a lot of multivariate statistics.  I also coach stats to undergrads, especially to dyslexic undergrads.  That's all my credentials.  Oh, and I would really like to see fraud proven and Bush impeached.

Boy, would I.

Update: got my stuffed votes the wrong colour: amended. Hypothesis is stuffing of Red votes. In the UK red votes are the Labour votes, blue votes the Tories. It trips me up every so often. Apologies.

#### Originally posted to Febble on Sat Apr 02, 2005 at 02:40 PM PST.


#### Comment Preferences

• ##### Thank you(none)
That's a nice addition to this work.
I hope that more data are forthcoming so that we can continue to look at this exit polling stuff. Still, it's not just discrepant exit polling that stinks here, that's just one thing that has gone wrong.
• ##### I think it is a safe bet that(none)
the original data will not be released  before 2007 at the earliest.  Probably never. Very difficult to do anything serious if the original data are kept secret.

It would be interesting to know if it is Edison/Mitofsky that is holding back the data, or their employers (the US media consortium). I am  fairly certain such a condition/release authority would be spelled out clearly in the job contract the US media had with E/M .

• ##### It's not the data they're withholding(none)
it's the weights.  We've got the treasure chest, just not the key.

Just one little column we need.....

• ##### The real question is whether E/M holds the key(none)
at all to the full data[weighting] set, or if the key to the data is in their employers' hands.

That is another question I bet we will never get the answer to either.

Another big secret.

• ##### Flame or tip here(4.00)
but present cogent refutations elsewhere.
• ##### Dang(none)
forgot to pimp Jerome's diary European Constitution - France votes. Diary II announcement.

Light relief from Exit Polls.

• ##### Tip and reco.(none)
I'm more confused, but only because I'm stats-phobic, and my brain glosses over after a very short time.  Bottom line: hinky election shit coming to light, with mathematical proofs backing it up. Right?  That I understand.  I don't require the proof, I already believe it.  But now I can go forth and convince others.  Thank you for your help.

"Whatever they want the answer is no. Now is not the time to fold, now is the time to up the ante." -- Charles Pierce

[ Parent ]

• ##### Shame(none)
Republicans are ashamed of being such, on account it is the party of racism and class bigotry and warmongering ....but I just can't help it!

That which does not troll-rate me makes me stronger. :)

• ##### thanks so much for all your efforts(4.00)
at least one take-away ought to be that there is a significant Dem bias in exit polling that has not yet been fully explained (the 'chronic' unexplained exit poll discrepancy) and that 1992 was not a good year for pollsters. Those points have, indeed, been made before (in a Febble diary, no less!) but they are important. Exit polls have been screwed up before. Now you've added the relevant stats.

It's intriguing to think that Bush voters in Dem states are more intimidated by polling than Kerry voters in Red states.

As you have said so many times, stats don't answer why there's a discrepancy, but they do help one to know where to look.

Viva on-line peer review.

"Politics is the art of looking for trouble, finding it everywhere, diagnosing it incorrectly and applying the wrong remedies." - Groucho Marx

• ##### 2 flaws with this Hypothesis(none)

There are a couple of points I would like to address.

1.  If I've understood you correctly, you are suggesting that Democrats are not influenced as much by the overall partisanship of the state.  However, Republicans, according to your hypothetical leanings, are LESS LIKELY TO PARTICIPATE in the exit poll THE MORE DEMOCRATIC THE STATE IS.

I believe this hypothesis is refuted by the stats given in the 2004 E/M numbers.

On page 37 of the report it shows the

• completion rate
• refusal rate
• miss rate

against the partisanship of the precinct.

Notice that there is NO SIGNIFICANT DIFFERENCE across the table.  All of these rates are the same, whether it be a Democratic state or a Republican one.

If this above hypothesis were true, then we would expect to see a HIGHER refusal rate in the democratic states.  This is so because, as the hypothesis goes, Republicans would be less likely to refuse in a territory that they are comfortable in.  The Dems, according to the hypothesis, are consistent no matter where they are.

This chart is one of the biggest obstacles to the "Shy Republican" Hypothesis in my opinion.

2.  You seem to have ignored the Senate race data.

The exit poll discrepancy for the Senate race is significantly less than for the Presidential race.  The "Shy Republican" Hypothesis contradicts this data.  Ticket-splitting has not been a trend in the U.S., historically speaking.  People in the U.S. overwhelmingly vote for the same party, regardless if it is a vote for a Senator or the President.

Perhaps you can include this data in your next analysis.

I hope these two arguments are constructive to this debate.  I hope more people like yourself continue to investigate the numbers further.  I really appreciate your work!!!

• ##### Thanks(none)
Sorry I missed this post.

If we allow the possibility that people lie to pollsters (and there is evidence that they do) it explains why the bias is not evident in response rates, and why it is inconsistent with the Senate races.

I don't think lying is the full story, or even the main story, but it is an unmeasurable factor that is perfectly plausible and may add enough noise to the data to account for the apparent inconsistencies.  But I do take these points.

Believe me, I would like to be convinced.

See my Exit Poll diary here

[ Parent ]

• ##### I would also like to see(none)
the historical data with respect to the Refusal rate against Partisanship.

The fact that there are historical similarities in the WPE / Partisanship correlation is interesting to note.

However, maybe some historical differences can be seen with the  Refusal rate against Partisanship numbers.

If there are, in fact, differences in these particular numbers then that could be VERY telling.

• ##### Which means ABSOLUTELY ZIP(4.00)
Yes, 1 in 9 billion that it could happen by chance.

BUT NO ONE IS SAYING THAT IT WAS BY CHANCE

This doesn't prove fraud in any way because there are still possible explanations for why they were off besides fraud.

Presidental March Madness - Dem Style / 1st Congressional District of Tennessee

• ##### Yes, there are other possibilities but(none)
since we don't even have any working theories to test against this means a lot more than zip.
• ##### I've officially given up on this!(none)
I've been trying to make this point more times than I care to count. It seems such a sad case of disconnect - all this precious time and energy could have been put to much better use had these stats enthusiasts spent even five minutes asking themselves, What are the pollsters saying? Are they even claiming that the discrepancy between exit polls and vote tabulation could be a fluke? No, of course not. From day one, the magnitude of the effect seems to have left none of the pros in doubt that something went seriously wrong on 11/02. The question is merely, what went wrong? Was there a flaw in the way the data were collected? Or in the weighting? Or did something, ahem, "go wrong" with the voting or the vote count?

I'm guessing that the reason why people keep going out on these quixotic stats quests is because they can and because we all have a somewhat magical attitude towards numbers.

If you cannot convince them, confuse them. Harry S. Truman

[ Parent ]

• ##### I hasten to add, though,(none)
that Febble DOES address all the real questions, and DOES say that the likelihood of the discrepancy being due to chance isn't really among them.

If you cannot convince them, confuse them. Harry S. Truman

[ Parent ]

• ##### absolutely no one says(none)
the discrepancy is due to chance. No one.

The point is simply that of your possible explanations there's no way stats tell you which one is the reason.

Fraud is a possibility. So is something about the exit poll. But when posters write that they 'know' it's fraud and therefore this 'proves' what they already know, or they 'know' the exit polls are wrong and this 'proves' it, well... neither is especially supported by the data. On this site however, the 'I know it's fraud' posts far outnumber the 'I know it's the exit polls' posts.

"Politics is the art of looking for trouble, finding it everywhere, diagnosing it incorrectly and applying the wrong remedies." - Groucho Marx

[ Parent ]

• ##### my point exactly (n/t)(none)

If you cannot convince them, confuse them. Harry S. Truman

[ Parent ]

• ##### Good neutral analysis.(4.00)
I like the "dry" tone of the analysis, that's excellent.

I dislike the tone of some of the other diaries (and comments), that basically scream "PROOF OF FRAUD!".

Your diary, instead, sticks to the facts and shows that there are unexplained effects. It will be interesting to see how they can be explained.

• ##### But what about the papal(none)
election?
• ##### Conservatives Are a Pretty Safe Bet n/t(none)

We are called to speak for the weak, for the voiceless, for victims of our nation and for those it calls enemy....--ML King, "Beyond Vietnam"

[ Parent ]

• ##### Third Theory(4.00)
Third theory: Mitofsky picked bad precincts to sample. He says he didn't, but doesn't release any precinct-level data. So we have to take his word for it that his sample precincts were well selected and correctly weighted and adjusted.

Until this data is released, the exit poll analysis will remain speculative at best. Don't count on it coming out though, because if it was dead-on there would have been no reason not to release it. The fact that Mitofsky has not made their polling process transparent suggests that there was something wrong with it, and they're trying to hide that -- putting forth the "shy bush voter" theory instead -- so they can continue to get paid to do work.

I was born in 1979, and I expect to receive social security when I retire. Why? There is no crisis.

• ##### Well, he fairly convincingly(none)
says he didn't pick bad precincts.  I don't think he has anything to gain by lying as to where the error came from.  Bad precinct sampling and bad voter sampling are both bad.

The crux of the EM report is that EM thinks the major part of the problem (the bias) was in voter sampling.

But it would be so good to get at that data.  It is so maddening to have the data without the uncorrected weights.  I keep re-running the data set stupidly hoping to see it predict a Kerry win, and of course it just dutifully predicts the actual count. (That little anecdote might tell you how much I want the answer to be fraud....)

• ##### It won't matter(3.00)
No matter what the raw data says, it won't matter.  Some kossacks will continue to believe fraud decided the election and ignore any evidence to the contrary.  They already rely on an impossibly large conspiracy (or incredibly naive ideas about how elections work) to support their beliefs.

Likewise, Schiavo protesters will continue to believe that she could have recovered just fine, even after the autopsy results confirm the medical diagnosis.  They'll just add the medical examiner to the list of those involved in the conspiracy.

People will continue to believe in a coverup at Roswell, that the face on Mars is an alien monument, that NASA faked the Apollo moon landings and that green markers will make all their CDs sound so much better.

They're True Believers, and they cannot be reasoned with.

• ##### Only Some Won't Change Position(none)
I have a burden-of-proof perspective from my sailboat racing days. To me the position that the official results reflect the Constitutional meaning of the intent of those who voted is the position that carries the burden of proof.

The winning side is crooked, they're known to have tampered in the past, the incentives were exceptionally high this year, and we had just installed a large amount of unauditable voting and counting machinery. And then after election day a host of required counting procedures were violated all across Ohio at minimum.

What sane person would not put the burden of proof on that side?

Obviously we haven't run the caliber of party and campaigns to have deserved a big win. That's the fault of the party and its supporting factions, and that's the only place people like me can direct our efforts.

If & when the raw data becomes available, you'll see there are plenty of election skeptics who do accept the facts.

We are called to speak for the weak, for the voiceless, for victims of our nation and for those it calls enemy....--ML King, "Beyond Vietnam"

[ Parent ]

• ##### Evidence to the contrary?(none)
"Some kossacks will continue to believe fraud decided the election and ignore any evidence to the contrary."

Since the election was determined by Ohio, and since Blackwell has been blocking a real investigation there --  I have to ask:

"What evidence to the contrary?"

All this discussion concerns is what is the reason or cause of the unusual discrepancies between exit polls and official counts.  Also, with this analysis, there's the question of just how unusual are the discrepancies of 2004.

Myself, I guess you'll see me as in a tin-hat, but I remain SKEPTICAL yes! as to the Ohio results.

I'll even concede you a Bush victory in the national popular vote, if that would make me look more rational in your view.  But the issue, to me, is whether the electoral college vote was rigged in a decisive swing state  --  screw all the red state and blue state theorizing.  Same thing as Y2K  --  except in 2004 it was Ohio instead of Florida.

I think that is what "some" kossacks "will continue to believe" until we see "any evidence to the contrary."

Luke 17:33 - Whosoever shall seek to save his life shall lose it; and whosoever shall lose his life shall preserve it.

[ Parent ]

• ##### Focus(none)
But the issue, to me, is whether the electoral college vote was rigged in a decisive swing state  --  screw all the red state and blue state theorizing.  Same thing as Y2K  --  except in 2004 it was Ohio instead of Florida.

That's a relatively rational way of looking at it, but this nationwide exit-poll analysis stuff with the sensationalized headline about probabilities doesn't advance that story, and in fact hurts the whole cause by being so irresponsible.

Personally, I'm skeptical that Blackwell's malfeasance really swung it in Ohio... a 160,000 vote margin is a lot harder to eke out through fraud than 537, and the documented electronic anomalies look a lot more like fuckups than hacking. Now, I think Blackwell (and Katherine Harris) should be in jail, but my sense of how things went on the ground there was that we got beat even without the added factor of fraud.

I was born in 1979, and I expect to receive social security when I retire. Why? There is no crisis.

[ Parent ]

• ##### If it's my headline(none)
you are objecting to - apologies.  It was snark.  I was so fed up with headlines shouting probability figures.  That's why I posted my snark-free version (and a new analysis, in which the discrepancies were even more "significant").

My whole point - and sorry you missed it - was that probability figures are a terrible proxy for effect size.  Yes, the bias in 2004 was large and massively significant (choose your p value), but the bias in 1992 was also large and massively significant.  According to my new (and I believe more valid) analyses, the discrepancy, although significantly greater than in 1992, was only significantly greater at p<.05.  But then I don't think p values are the point!

In retrospect I agree the headline was irresponsible because I somehow failed to communicate that it was ironic.  The real irony is that in my new and improved analysis, although the bias in 2004 is significantly greater than in 1992, the 1992 bias was more significantly greater than zero than the 2004 bias was.  In other words, the variance in 1992 was less (significantly less, as it happens....)

Maybe the moral is don't write diaries when I'm angry! (But I wouldn't be the first).

Read my new piece if you haven't already.  I think we agree.

See my new Exit Poll diary here

[ Parent ]

• ##### I can't speak for Outlandish Josh --(none)
-- of course, but, speaking for myself, I complimented you (Febble) for your "billions" title at the time and I would do so again.  I was L.O.L. over it!

It was (and is) obvious to me that your title was chosen in the context of the great uproar over the earlier "one-in-a-million" thing.

Wow!  Kossacks get so exercised about such snarky stuff!

BTW:  I don't think that this DKos conversation, especially with the "billions" in the title, is going to go national network or hit CNN anytime soon  --  although it may help to inform some of the things that perhaps could be heading that way.

Luke 17:33 - Whosoever shall seek to save his life shall lose it; and whosoever shall lose his life shall preserve it.

[ Parent ]

• ##### Ah(none)
FWIW I wasn't really referencing your headline, rather the headline of the report that caused all this buzz.

I was born in 1979, and I expect to receive social security when I retire. Why? There is no crisis.

[ Parent ]

• ##### Why listen to any of this...(none)
...we are just "Fraudsters" remember?

"Those who cast the votes decide nothing. Those who count the votes decide everything" - Joseph Stalin

• ##### Get votes stolen, try election reform, repeat(4.00)
Rinse, lather, repeat.

Deja Vu all over again

GOP cheaters hijacked the post 2000 efforts to fix voting count problems and are doing so again (see Daily Kos :: Coordinated attack on your voting rights happening NOW!) And again right under the nose of powerful democrats.

If Kerry & powerful Dems had any balls they would be fighting this, since they didn't fight on Nov 3rd.

But noooo, Kerry is working on his bid for 2008, while timid democrats had just had their second "election reform" initiative hijacked by the GOP w/ barely a peep from our representatives.

So we will wait until 2006 to get robbed again and then come up with "voting reform" for the 3rd time... And the 4th, 5th, 6th...

Bohica: A quote for anything this administration does is "If any question why we died, tell them, because our fathers lied." - Rudyard Kipling, 1918

• ##### This is a great diary(none)
Now get back to work on your PhD! (We need more like you in the field).
• ##### Two problems....(4.00)
I have two problems with this analysis...

The first is that the results assume that any difference between the exit poll results and the counted ballots is to be regarded as an "error" in the exit polls.   We know this is not the case --- analyses of the Florida "spoiled" ballots show that, in terms of voter intent, Gore won Florida handily.  In other words, the "error" was found in the vote totals, because they did not reflect the intent of Florida voters.  Unless one factors in "voter error", one cannot get a good idea of how accurate the exit polls are.

The second is that it is highly unlikely that election fraud is national in scope.  It is unlikely that "pro-Bush" fraud occurred in states like New Jersey in 2004 when the elections apparatus was in the hands of the Democratic Party.  It is much more likely that fraud occurred in Ohio, where the election bureaucracy was controlled by highly partisan Republicans.  In other words, any analysis aimed at determining the likelihood of fraud has to be aimed at determining the possibility of fraud in specific states --- comparisons need to be made of "battleground states" where state politics are dominated by one party against non-battleground states.

• ##### Problems with your problems(4.00)
In the first case, Febble did not make the assumption you assume she made.

In the second case, your preferred ancillary conditions throw the case for fraud into more extreme doubt than ever, since the incidence of discrepancies was statistically similar (i.e., consistent with the same distribution) in red and blue-administered states, and in battleground and non-battleground states.

Compounding these problems, elections are almost universally administered at county (or similar) level, with many major islands of blue administration in red states.

It's also the case that a well-tempered conspiracy would probably NOT contrive to record 4,000-vote margins in precincts with only a few hundred votes ... or erroneously rotate ballot names so as to give big majorities to minor parties in isolated precincts ... or wave any number of other red flags in the face of election watchers (as has been remarkably advanced as evidence of conspiracy by those determined to find it).

All we have accomplished to date is to make marginal Democratic voters less likely to vote next cycle, and to make neutral observers less likely to listen to conspiracy talk next time regardless of the state of the evidence.

• ##### Problems with problems with problems(3.33)
1.  "Febble did not make the assumption you [bushsux] assume she made."  Did Febble discuss any state in such detail?  I don't think so.  Appears to me that bushsux is making the same point that I have made, above, that leaving out the national issues, there remain two outstanding problematic situations  --  Ohio in 2004 and Florida in 2000.

2.  bushsux said -- "It is much more likely that fraud occurred in Ohio, where the election bureaucracy was controlled by highly partisan Republicans.  In other words, any analysis aimed at determining the likelihood of fraud has to be aimed at determining the possibility of fraud in specific states --- comparisons need to be made of "battleground states" where state politics are dominated by one party against non-battleground states."

RonK says --   "the incidence of discrepancies was statistically similar (i.e., consistent with the same distribution) in red and blue-administered states, and in battleground and non-battleground states."   But bushsux is interested in a comparison of "specific states" (mainly Ohio) with comparable states that have better counting procedures.  So your claim as to overall averages of red-administered versus blue-administered states and "safe" versus "swing" states isn't to the point at all.  So not really helpful as a response to "bushsux".

3.  RonK also says,    "Compounding these problems, elections are almost universally administered at county (or similar) level, with many major islands of blue administration in red states."

Now that is what I call an excellent example of "begging the question."

4.  RonK apparently believes that the corporate vote frauders are not only honest but also highly efficient.  Thus, RonK says,  "It's also the case that a well-tempered conspiracy would probably NOT contrive to record 4,000-vote margins in precincts with only a few hundred votes ... or erroneously rotate ballot names so as to give big majorities to minor parties in isolated precincts ... or wave any number of other red flags in the face of election watchers (as has been remarkably advanced as evidence of conspiracy by those determined to find it)."

Suppose that these corporate contractors are as faulty as the corporate contractors in Iraq  --  who have waved many red flags and produced many misleading and outrageously incorrect numbers in their reports.  Suppose that those behind the conspiracy are not as bright as they perhaps think they are.  Suppose that they are also arrogant and drunk with power.  If this line of RonK's reasoning is carried all the way, it could, if believed, defeat ANY discovery of fraud, however well-founded.

5.  RonK also opines  --  "All we have accomplished to date is to make marginal Democratic voters less likely to vote next cycle, and to make neutral observers less likely to listen to conspiracy talk next time regardless of the state of the evidence."

That last bit strikes me as ridiculous.  These discussions here at DKos will, in and of themselves,  have negligible influence upon marginal Democratic voters or their voting motivation in 2006 and 2008.  That's a silly statement of RonK's, especially considering that there may well be persuasive social/economic factors that will determine that motivation.

As for "neutral observers"  --  if there are any such, they won't be influenced by this debate one way or the other  (just because they ARE "neutral")!

IMO, the RonK comment boils down to the samo samo put-down of investigations of election rigging as "tinhat" conspiracy theory.  Not persuasive.

Luke 17:33 - Whosoever shall seek to save his life shall lose it; and whosoever shall lose his life shall preserve it.

I have not assumed that the error is in the exit polls. I have had a shot at examining evidence for and against error in both the polls and the count.  I don't think there is a definitive answer at this level of analysis.  But any analysis has to take into account the facts that the bias in 2004 was not significantly greater than the bias in 1992, and that the bias in 2000 (where people also smell rats) was insignificant.

Also agree election fraud is unlikely to be national.  However the exits, on my analysis, were not significantly worse in swing states, but in strongly Blue states, which is odd, if targeted fraud in swing states is your hypothesis.

I still think it could have happened, but we need to look at precinct level data.

• ##### While I think it is likely that fraud occurred(none)
The well has run dry here.

Exit polls point you in the right direction.

But nothing will happen unless we can prove the machines/votes were altered.

Look at the machines, not the polls, IMHO.

• ##### Not this time Kelvin... but hopefully next time!(none)
Wouldn't it be useful to know how things might have gone wrong this time - so as to prevent the same possibility in the future?
• ##### Post Election polling(none)
I noticed that Zogby has been asking how people voted for President in the monthly polls, but I haven't seen the results of said polls.  Has anyone seen what the post-election polling says about how people voted in the election?  It seems like we should be able to get an accurate count, or biased towards Bush, if we polled people now.

The only international crime is losing a war

• ##### Did you intend to say this?(none)
"Another interpretation is that where the election is seen as critical, someone stuffs the ballots with Blue votes."
• ##### No, sorry(none)
I still can't get used to the idea that Red is the conservative colour in the US!

Especially with an election coming here.

I grew up hearing The Red Flag sung at Labour Party conferences, and Thatcher was, of course, a True Blue.

I'll fix it anyway.

• ##### With all due respect(4.00)
this diary is a perfect example of missing the forest for the trees.

The forest: We have unverifiable voting machines in much of this country with more being installed every day. They are owned by republicans and their votes are counted by republicans. This is grounds for game cancelation in any game except the one that really counts, elections which determine who runs the country, who makes the laws, who picks the judges, who declares the wars.

If one football team had the goal posts on wheels and the ability to move them around, officials wouldn't be doing huge statistical studies to figure out if in fact the team that could move the goal posts had moved them. They'd confiscate the control switch, take the wheels off the goal posts and then and only then start the game.

Don't wait for statistical studies to prove or disprove there was fraud committed.  Focus on making the voting equipment fair and the elections transparent. What will happen then is that liberal, progressive ideas will win in the marketplace of ideas because they're better for people and fairer over all.

Darkness washed over the Dude...darker than a black steer's tookus on a moonlight prairie night...there was no bottom

• ##### Absolutely(none)
Whether the election was stolen or not, we know beyond a shadow of a doubt that the voting systems are not secure. Given the discrepancies in the results, and the unsecure systems, why is the Democratic leadership not making voting security a number one priority?

Even without a GOP conspiracy, I will guarantee that there are hackers out there trying to break into these systems. Why? Because they're there, and that's what hackers do, they break in just to show that they can do it. They break into banks and DoD computers just for the hell of it, and the tougher and more important the systems, the more attractive it becomes as a target. It defies credulity, with the publicity about the weaknesses of electronic voting systems, that none of them were attacked by any hackers successfully in '04.

Pipe dreams are not an exit strategy.

• ##### I recommended(none)
But maybe I made a mistake. Didn't everybody agree a long time ago that the outcome was very significantly different from chance? Is it new info that 2004 was similar to 1992?

nt
• ##### Yes, everyone agreed(none)
It is a different analysis though.

I should have put snark tags in.

The point of the diary is to say that the significance of the exit poll error is not news, or even necessarily "significant".  But interesting, if you look at the data carefully, and compare to previous years.

• ##### Hmmm, this feels phoney---too much(none)
exaggeration on the numbers, on the scope, and I truly hope this isn't an attempt to dismiss the work of real mathematicians and statisticians who have written scholarly papers after looking at all the available data.

Separation of Church and State AND Corporation

• ##### No, it's not phoney(4.00)
and it isn't an attempt to dismiss anyone's work.

If you read it carefully, you will see that.  The stats are legit.  But be careful how you interpret.  I have not reached any conclusion not reached by anyone else, except that I do not think the stats present an open-and-shut case of fraud.

And, I would argue that my status as a mathematician is at least comparable to some of those who have written the scholarly papers.  I actually contributed to some of those.

• ##### The Most Interesting Part(none)
Is that the exit polls have always overestimated the Democratic candidate's votes.  The quantification of that is interesting, especially at high error rates.  However, as stated above, until we can ensure all votes are counted and all voters can vote, it's all moot.

Ohh, and I especially like the line about "ESP for blue balls".  Heh!

Embrace diversity. Not everyone is intelligent.

• ##### A few questions:(none)
1. Where did you get your data for previous elections? To some degree it contradicts what we have been told about exit polls-that they are generally quite accurate. I want to see your data.

2. How does it compare to spoiled ballot rates for the years you looked at? If it follows what we saw in 2000, Democratic candidates had higher spoilage rates than GOP; you would therefore expect exit polls to consistently overestimate Democratic votes. Spoilage doesn't seem to explain the huge discrepancy in '04, though.

3. If a sample precinct does not predict the vote adequately for the polling model the model would be changed to adjust for the error the previous year. What caused the model to go all to hell in '04? This is especially surprising given the concerns raised by '00.

4. Although you claim that the discrepancy in '00 was the smallest in the elections you looked at, the discrepancy in Florida was quite large, where Gore was originally predicted by the exit polls to win solidly. How did you come up with a relatively close number for exit poll accuracy given the magnitude of the Florida error? Are you saying that the exit poll discrepancy that caused so much despair in 2000 was in reality the best we've had dating back to 1994? (The Florida error is explainable in terms of the massive ballot spoilage in Democratic precincts. But that doesn't mean that the exit poll discrepancy there was not large.)

5. The gross numbers you posted showing exit poll discrepancies dating back more than a decade don't really say much about how widespread it was. I'm going to make a guess here, that the much larger discrepancy in '04 was the result of a very widespread discrepancy favoring Bush, while the previous elections (if your data is accurate) showed discrepancies that were much more limited, in other words, the net number of exit poll discrepancies favoring the GOP candidate was much larger in '04. Do you have data on this?

6. How is it that others calculating the probability of the '04 exit poll discrepancy happening by chance have come up with a number of about 1,000,000/1? Still outrageously impossible, but if you apply the same order of magnitude difference to the other elections you looked at, you have to wonder whether the probabilities you claim would exist at all if calculated using the other methodology.

My guess is that the shy Bush voter syndrome does not have legs. There is actually no evidence outside of the exit poll discrepancy to support it, and what can be inferred from the data with regard to exit poll response rates tends to refute it. If there is a discrepancy in exit polls previous to 2000 my guess is that it also has to do with voting systems that were (are) designed to have a disparate impact on Democratic voters. For 2000 this was examined very thoroughly in Florida and error prone voting systems were found to be much more likely to be used in counties with large minority populations. And I don't think anybody here would believe that suppression of African-American voters started in 2000. Furthermore, when I look at the jump in exit poll discrepancy in 2004, which happens to correlate with the widespread introduction of electronic voting systems and the utter stranglehold on the arms of state and federal government that would be charged with investigating fraud in the affected states, well, it makes me think they found a new way to cheat.

It's also worth noting that simply doing an analysis of the exit poll discrepancy ignores other statistical data that is also indicative of voting irregularities, such as odd vote-splitting patterns and positive correlation of exit poll discrepancies with battleground state status. If someone is going to propose a theory (such as the shy Bush voter theory) to explain the results on Election Day 2004, they can't just pick out the exit poll observations; a theory needs to incorporate all observations, not just those that are convenient. And then you're left with:

Voting fraud.

Pipe dreams are not an exit strategy.

• ##### Election data :(none)
here.

Answer to 6: One point I was trying to make is that probability values are very unstable, and in no way serve as a proxy for effect size.  The p value you get is absurdly dependent on the analysis you do.  In the good old days you simply chose a level of significance you were happy with in advance (1 in 20, 1 in 100) and, if the probability of a chance finding was less than that, you called it significant.

My snark seems to have gone over some heads, but the numbers are real, as are my findings.  2004 was really out of line, but really not significantly more out of line than 1992, out of all.  I have not addressed the issue of individual states in this analysis.

"One point I was trying to make is that probability values are very unstable, and in no way serve as a proxy for effect size."

Probability is an indicator of effect size. If you start flipping a coin, and it keeps coming up heads, it's the probability that causes you to start to suspect that there's something wrong with the coin, and the probability increases with each flip.
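To put rough numbers on the coin example, here is a minimal sketch, assuming a fair coin and independent flips, of how quickly the chance of an unbroken run of heads shrinks. The function name is only for illustration.

```python
def chance_of_all_heads(flips: int) -> float:
    """Probability that a fair coin lands heads on every one of `flips` independent flips."""
    if flips < 0:
        raise ValueError("flips must be non-negative")
    return 0.5 ** flips


# Each extra head halves the chance that a fair coin produced the run.
for flips in (5, 10, 20):
    p = chance_of_all_heads(flips)
    print(f"{flips} heads in a row: about 1 in {round(1 / p):,}")
```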

"The p value get is absurdly dependent on the analysis you do. . . "

Or perhaps dependent on the analysis absurdity.

"In the good old days you simply chose a level of signficance you were happy with in advance (in in 20, 1 in 100) and the probability of a chance finding was less than that, you called it significance."

If you apply an all-or-nothing test of significance, the exit poll discrepancy is significant, meaning not likely due to chance. It is necessary to calculate the improbability of the result as attributable to chance to help people understand how significant the exit poll discrepancy is. To take the coin analogy a step further, if you have a bunch of people flipping coins, and they're all tending toward heads, it is reasonable to start to question the general design of the coins, and conversely increasingly difficult to attribute the result to merely a coincidentally flawed set of coins.

"My snark seems to have gone over some heads, but the numbers are real, as are my findings."

You still have not answered even #6- why are your numbers out of line with others' analyses? If your analysis is designed as a snark, you'll have to forgive me if I dismiss it as absurd in the absence of a description of your methodology or an explanation for your discrepancy.

"but really not significantly more out of line than 1992."

I'm not sure why you would say a 9,432,472,254/1 probability is no more significant than a 5007/1, when by definition it is, unless you are simply saying that neither is likely to be the result of chance. The difference is meaningful, though, because it indicates a much larger number of samplings varied in the same direction.

Pipe dreams are not an exit strategy.

• ##### Agreed(none)
I agree with TrainWreck on this point. Yes, it is possible to place too much emphasis on probability values, and certainly if they are not far apart then there's little point in worrying about the difference. My personal rule of thumb is to think in terms of the log of the probability; if there's a significant difference there, I take note.

The significance of extremely low probabilities is derived from the hypothesis that they do indeed reflect some causal factor. Let's just say -- for purposes of argument only! -- that we are seriously considering the hypothesis of fraud in the vote counting process. If the 1992 measurement yields a probability of 1 in 1,000 and the 2004 measurement yields a probability of 1 in 1,000,000, then it is perfectly reasonable to say that this suggests that the degree of fraud in 2004 was greater than the degree of fraud in 1992. This isn't provable, because statistical analysis can't prove fraud in such a situation, but the explanation plausibly explains the data.

• ##### Probability is an indicator of effect size(none)
but it does not have a linear relationship with effect size.  A small change in effect size can result in a large change in probability values. This is why I say it is not a good proxy for effect size.

Perhaps the most dramatic example of this was in Baiman's analyses - a mere doubling of the "fudge factor" used to compensate for cluster sampling from 30% to a more plausible 60% reduced the probability number 30 fold, though it was still very significant.  Increasing it to 80% reduced it a further 4 fold.

Probability tells us how confident you can be of an effect. But as a number, it is very unstable - it can fluctuate wildly with small changes to data points, and the more improbable it is that your result occurred by chance, the more sensitive the p value will be to minor changes in methodology.
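A minimal sketch of that instability, using a normal approximation (an assumption made here for simplicity, not the t test or the cluster-sampling correction discussed in this thread). It asks how much the p value shrinks when the test statistic grows by the same 20%, once near the conventional threshold and once far out in the tail.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided tail probability for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_ratio(z: float, increase: float = 0.20) -> float:
    """How many times smaller p becomes when the statistic grows by `increase`."""
    return two_sided_p(z) / two_sided_p(z * (1 + increase))


# The same proportional change in the statistic moves p far more in the tail.
for z in (2.0, 5.0):
    print(f"z = {z}: p = {two_sided_p(z):.3g}, a 20% larger z divides p by {p_ratio(z):.1f}")
```

Near z = 2 the p value moves by a small factor; near z = 5 the same proportional change moves it by a factor in the hundreds.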

Why are my numbers out of line with others' analyses?  Because I asked a different question, which was whether the bias was significantly different from zero.  Others have asked whether a bias of the magnitude observed could have occurred by chance. I also used a simple t test.  There are arguments against using a t test, but there are also arguments against the methodologies used by others.

However, my methodology was not absurd.  It is defensible.  The point I perhaps failed to make is that methodologies can give wildly fluctuating p values, particularly when the probabilities of things occurring by chance are extremely small, even if the methodologies are arguably valid.  It's just the nature of the numbers.  I am utterly confident, as are all those involved with the exit polls, that the bias observed was not due to sampling error.  It was bias.  However conservative your methodology, you cannot make it go away.  There was bias.  Whether the bias was in the polls or the count remains moot.

As to your last point, this is exactly the point I was trying to illustrate, about the nature of p values.  Ignoring for just now whether it is legitimate to use t tests for WPE data (it is not hugely illegitimate) the fact is that it is possible (as I just demonstrated) for two effects to differ enormously in their degree of significance and yet not be significantly different from each other.  It is counter-intuitive, I know, but true.  Which is precisely why I wanted to discourage people from getting too excited about p values.

According to my t test, the mean WPE in 2004 was significantly different from 0.  It was not significantly different from the mean WPE in 1992, which itself was significantly different from zero.  If you compared the probability values for the two years you would assume they were wildly different, but they are not.  People have seen estimates of "1 in a million" for the 2004 effect and assumed that the effect is many times greater than in previous years.  But it wasn't many times greater than in 1992.

I have demonstrated that using an analysis that gives an even more remote probability for the effect in 2004, and using the same analysis for 1992, the illusion is created that the two years are massively different.  But a two-sample t test shows they are not.  I will now conduct rather more sophisticated analyses which I hope will correct for the non-linear nature of WPE data.  I don't expect my conclusions to be very different, although I am sure there will be massive changes to the p values.
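Here is a minimal sketch of that counter-intuitive point. The summary figures are invented for illustration (they are not the EM data), and it uses a normal approximation in place of the t tests described above. Each "year" is tested against zero, then the two years are tested against each other.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided tail probability for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_vs_zero(mean: float, sd: float, n: int) -> float:
    """Test statistic for 'is this mean different from zero?'."""
    return mean / (sd / math.sqrt(n))


def z_between(mean_a, sd_a, n_a, mean_b, sd_b, n_b) -> float:
    """Test statistic for 'are these two means different from each other?'."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / se


# Hypothetical (mean, sd, n) for two years; made-up numbers, not real WPEs.
year_a = (4.0, 7.0, 49)
year_b = (6.5, 7.0, 49)

p_a = two_sided_p(z_vs_zero(*year_a))
p_b = two_sided_p(z_vs_zero(*year_b))
p_diff = two_sided_p(z_between(*year_a, *year_b))

print(f"year A vs zero: p = {p_a:.2g}")
print(f"year B vs zero: p = {p_b:.2g}")
print(f"year A vs year B: p = {p_diff:.2g}")
```

With these invented figures the two p values against zero differ by several orders of magnitude, yet the direct comparison of the two years does not reach the 1 in 20 level.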

1. Where did you get your data for previous elections? To some degree it contradicts what we have been told about exit polls-that they are generally quite accurate. I want to see your data.

I gave the link for the election result data for previous years above.  The WPE data is all from the EM report.  They give WPE values for the last 5 presidential elections.  If you download the EM report you can do the calcs yourself.  If you want them in spreadsheet form, if you email me, I will send you the spreadsheet.  You can even check them against the EM in case I have made any transcription errors.

2. How does it compare to spoiled ballot rates for the years you looked at? If it follows what we saw in 2000, Democratic candidates had higher spoilage rates than GOP; you would therefore expect exit polls to consistently overestimate Democratic votes. Spoilage doesn't seem to explain the huge discrepancy in '04, though.

Don't know the answer to this.  I think it is an excellent question, and I know that some of the US counts votes team are looking into it.

3. If a sample precinct does not predict the vote adequately for the polling model the model would be changed to adjust for the error the previous year. What caused the model to go all to hell in '04? This is especially surprising given the concerns raised by '00.

My suspicion at present is that whatever went wrong in 1992 also went wrong this year.  For instance, it is possible that pollster avoidance (or pollster seeking) behaviour is greater in years in which the election has a high profile, and that this is more marked in Blue states.  This appears to be borne out by my correlations.  However, it may not account for all the variance.  This is what I am currently working on.

4. Although you claim that the discrepancy in '00 was the smallest in the elections you looked at, the discrepancy in Florida was quite large, where Gore was originally predicted by the exit polls to win solidly. How did you come up with a relatively close number for exit poll accuracy given the magnitude of the Florida error? Are you saying that the exit poll discrepancy that caused so much despair in 2000 was in reality the best we've had dating back to 1994? (The Florida error is explainable in terms of the massive ballot spoilage in Democratic precincts. But that doesn't mean that the exit poll discrepancy there was not large.)

I have not discussed (as you observe) individual states in this diary.  It is a good question. I (and many others) are looking into this.

5. The gross numbers you posted showing exit poll discrepancies dating back more than a decade don't really say much about how widespread it was. I'm going to make a guess here, that the much larger discrepancy in '04 was the result of a very widespread discrepancy favoring Bush, while the previous elections (if your data is accurate) showed discrepancies that were much more limited, in other words, the net number of exit poll discrepancies favoring the GOP candidate was much larger in '04. Do you have data on this?

I don't quite understand this question.  Could you rephrase?  The data is all from the EM report.

6. How is it that others calculating the probability of the '04 exit poll discrepancy happening by chance have come up with a number of about 1,000,000/1? Still outrageously impossible, but if you apply the same order of magnitude difference to the other elections you looked at, you have to wonder whether the probabilities you claim would exist at all if calculated using the other methodology.

Good question. Because of the non-linear effect of p values, the bigger the effect, the more outrageous the p values look.  I will certainly re-calculate using the other methodology.  However, the important comparison is between the two years.  I am pretty convinced that this is not significant by any methodology, but I will now apply highly conservative principles and if I find a significant difference I will let you know.  I am working on another diary.

• ##### A question about probabilities(none)
First a caveat: My one semester undergraduate stats class is a distant memory at this point, so I accept that I can be completely off base here.

You mention that minor changes can alter the probability of an outcome being random chance by a factor of three or four. To, say, an astronomer, a factor of three or four is insignificant. They think in terms of orders of magnitude, i.e., factors of ten. So a factor of 4 is within one order of magnitude. Two celestial bodies within an order of magnitude are considered to be the same distance away, for their purposes. Only orders of magnitude are significant as increments.

The difference between the random-chance probabilities for 1992 and 2004 is six orders of magnitude. This is a very significant difference even to people used to shrugging off millions of miles as measurement errors. Given that, it seems to be a very conservative way of measuring the significance of the magnitude of a probability.
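For what it's worth, the arithmetic behind "six orders of magnitude" can be checked in a few lines. This is just a sketch using the two odds quoted in the diary; the function name is made up for the example.

```python
import math

# Odds quoted in the diary: "1 in 5007" (1992) and "1 in 9,432,472,254" (2004).
odds_1992 = 5_007
odds_2004 = 9_432_472_254


def orders_of_magnitude_apart(odds_a: int, odds_b: int) -> float:
    """Difference between two 'one in N' odds, measured in powers of ten."""
    return abs(math.log10(odds_b) - math.log10(odds_a))


gap = orders_of_magnitude_apart(odds_1992, odds_2004)
print(f"The two probabilities are {gap:.1f} orders of magnitude apart")
```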

Am I on to something? If not, let me know and I'll go make myself a tinfoil hat. I'm just trying to get a grip on the issue here.

Since we no longer have taboos, we have Abus.

• ##### Please don't take these probability(none)
values too seriously.  They are real values but I calculated them to make the point that just because two things differ from a third thing (zero) with very different probabilities of that difference occurring by chance doesn't mean that they differ from each other with an acceptable degree of certainty.

It is why for years statisticians started with an a priori alpha value (the probability value below which they would accept that a difference was not due to chance), and simply stated whether a difference was significant or not at this level of alpha.  Most journals still require only this.  What they also require is an F or t value (the value of the test statistic and the Degrees of Freedom).  From these, readers can look up the probability value if they are interested.  But the fact remains that comparing probability values is a very poor way of estimating whether two things differ.  We have good ways of doing this in statistics and it is not this. We also have ways of calculating effect sizes that make more sense.

But the analyses in this diary are very crude.  They were done to make a point.  I am working on a much more defensible set of numbers (and next time I will provide t values, F values and r values with degrees of freedom, whether the result was significant at what level of alpha, and I will leave the "interested reader" to look up the p values for him/herself.)

• ##### One question(none)
Good job.  Good diary.  Just one question out of curiosity:

I have demonstrated that using an analysis that gives an even more remote probability for the effect in 2004, and using the same analysis for 1992, the illusion is created that the two years are massively different.  But a two-sample t test shows they are not.

Did you use a paired(state-by-state) t-test?

• ##### I used both(none)
and neither were significant.

The unpaired one is the important one, but the paired one wasn't valid either.

But the WPE does not bear a linear relationship with the sampling bias, and I did not compensate for this (nor do EM in their comparisons, though they discuss it in a vague sort of way) so this is very preliminary.  The point is (as I said in a previous diary) that for a given over-sampling rate, the WPE will be greater where the parties are closer.  Where one party dominates, the apparent bias will be less. The interesting thing is that the correlations I seem to be getting, with greater bias the bluer the state, are in spite of a built-in tendency for a sampling bias to be less apparent in the WPE where one party dominates.
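A minimal sketch of why that happens, under a simple model that is an assumption of this example rather than anything in the EM report: a two-party race in which Democratic voters are `alpha` times as likely as Republican voters to end up in the poll. The sign is chosen here so that a positive error means the poll over-states the Democrat.

```python
def margin_error(dem_share: float, alpha: float) -> float:
    """Polled margin minus counted margin (Dem minus Rep) in a two-party race.

    `alpha` is a hypothetical ratio of Democratic to Republican response rates.
    """
    rep_share = 1.0 - dem_share
    polled_dem = alpha * dem_share / (alpha * dem_share + rep_share)
    polled_margin = polled_dem - (1.0 - polled_dem)
    counted_margin = dem_share - rep_share
    return polled_margin - counted_margin


# Same over-sampling rate everywhere, different partisanship.
for dem_share in (0.2, 0.5, 0.8):
    err = margin_error(dem_share, alpha=1.2)
    print(f"Dem share {dem_share:.0%}: margin error {err * 100:.1f} points")
```

With the same `alpha` in every precinct, the error in the margin comes out largest in the evenly split one and smaller in the lopsided ones.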

However, everything is complicated by the presence of Perot on the ballot in 1992. It really was a three horse race.

I will post another diary when I've done some more appropriate calcs!

But the main point of this diary is that the bias in 2004 was absolutely undeniable, however you calculate it, even on a simple sign test.  However, all years except 2000 are also significantly biased (and on a sign test, even 2000 was biased).  I cannot, by any means, get a significant difference between 1992 and 2004.  I'm trying though, as I'd like to find evidence that 2004 really was worse.
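For anyone unfamiliar with it, a sign test ignores the size of each state's error and only counts how many fall on each side of zero. A minimal sketch, with a made-up count rather than the actual state tallies:

```python
from math import comb


def sign_test_p(n_positive: int, n_total: int) -> float:
    """Two-sided sign test p value, treating each sign as a fair coin toss."""
    upper = sum(comb(n_total, k) for k in range(n_positive, n_total + 1))
    lower = sum(comb(n_total, k) for k in range(0, n_positive + 1))
    return min(1.0, 2 * min(upper, lower) / 2 ** n_total)


# Hypothetical example: 40 of 50 states erring in the same direction.
print(f"p = {sign_test_p(40, 50):.3g}")
```

Because it throws away the magnitudes, the sign test is about as conservative as these comparisons get.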

• ##### rephrase(none)
5 The gross numbers you posted showing exit poll discrepancies dating back more than a decade don't really say much about how widespread it was. I'm going to make a guess here, that the much larger discrepancy in '04 was the result of a very widespread discrepancy favoring Bush, while the previous elections (if your data is accurate) showed discrepancies that were much more limited, in other words, the net number of exit poll discrepancies favoring the GOP candidate was much larger in '04. Do you have data on this?

rephrased:

I guess there would be two things that could affect the probability. One is that a relatively limited number of exit poll samples had very large discrepancies compared to the actual results; the other is that a relatively large number of exit poll samples had relatively smaller discrepancies. Do you have any info on which of these is the case that drove up the improbability value?

I couldn't find the exit poll predictions for previous elections on the web site that you posted.

Thanks.

Pipe dreams are not an exit strategy.

[ Parent ]

• ##### Exit poll WPE values(none)
are in the EM report here, but it's a pdf so you have to cut and paste.

Several things affected the probabilities I produced, including outliers.  I am not claiming these probabilities are any more likely (considerably less likely) than some that have been published!  I just wanted to illustrate the misleading nature of probabilities as a proxy for effect size.  You are rightly calling me on this, and I will be interested in your calcs!  However, I have compared 1992 with 2004 in a number of more valid ways, and I cannot make them significantly different.  Maybe you can. However, I cannot make any year except 2000 insignificant.  When I say "make", I am not trying to cheat, but none of the analyses I have seen so far have been flawless  (including my own) and I am trying to figure out, given the paucity of data, the best way of approaching what we have.

One thing that I need to do (and I am doing as I speak) is to transform the WPEs into something that gives a comparable estimate of bias whatever the proportion of the votes for one party. Then we can do proper stats.

If I find anything interesting I'll post it here first, and you can check it.

• ##### I've figured out the transform:(none)
The WPE as I understand it is simply the difference in percentage points between the predicted margin (i.e. in those polled) and the actual margin (i.e. votes counted) in each precinct.  If I am wrong, I'd be happy to be corrected, but the EM report is extremely unclear.  However, the fact that they "explain" the reduced WPE in partisan precincts as being an artefact of the reduced scope for error would seem to confirm this.

But the one bit of hard data we have from EM is the WPE figures so those are what we have to work with (as opposed to the screenshots).

I think that we can deduce the real bias from the WPE figures using the formula: bias = log(R/D  x (1 + margin - WPE)/(1 - margin + WPE)) where R is the percentage of Republican votes counted and D is the percentage of Democrat votes cast and margin is the percentage of Democrat votes cast minus the percentage of Republican votes cast (with margin and WPE entered as proportions, so 10 points is 0.10, for the "1 +" and "1 -" terms to make sense).  If I am right (and you might like to check my algebra!) that will give us a measure of bias that will be mathematically independent of partisanship - which means we can now realistically investigate the relationship between partisanship and bias, and it also gives us numbers we can do valid parametric stats with, as it will be "cleaned up" from noise due to the WPE figures being contaminated by the partisanship of the states that happened to have greater or less bias.  If you are still with me.
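Here is a minimal sketch of that transform for anyone checking the algebra. It assumes natural log, inputs given in percentage points and converted to proportions inside the function, and two-party shares that sum to 100 (with a third candidate on the ballot the zero point is only approximate). The function name and the 1.12 over-sampling ratio are hypothetical.

```python
from math import log


def bias_from_wpe(dem_pct, rep_pct, wpe_pct):
    """bias = log(R/D x (1 + margin - WPE) / (1 - margin + WPE)).

    All three inputs are in percentage points; WPE is taken as
    actual margin minus polled margin (an assumption about sign).
    """
    d = dem_pct / 100.0
    r = rep_pct / 100.0
    wpe = wpe_pct / 100.0
    margin = d - r
    return log((r / d) * (1.0 + margin - wpe) / (1.0 - margin + wpe))


# Two hypothetical states with the same over-sampling ratio (1.12)
# but very different partisanship.
for dem_pct in (50.0, 80.0):
    d = dem_pct / 100.0
    polled_d = 1.12 * d / (1.12 * d + (1.0 - d))
    wpe_pct = 100.0 * ((2.0 * d - 1.0) - (2.0 * polled_d - 1.0))
    bias = bias_from_wpe(dem_pct, 100.0 - dem_pct, wpe_pct)
    # The WPE differs between the rows; the bias is log(1.12) in both.
    print(dem_pct, round(wpe_pct, 2), round(bias, 4))
```

In the two-party case the expression reduces to the log of the ratio between the polled Democrat-to-Republican odds and the counted odds, which is why the same over-sampling ratio gives the same bias whatever the partisanship.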

Anyway, I have now done this for the two critical years, 2004 and 1992 (I will also do it for the remaining years).  Very interesting.  In each year the bias is highly significant on a one-sample t test (test value=0, 0=no bias).  Bias greater than 0 (on my formula) means over-sampling of the Democrat vote; less than 0 means over-sampling of the Republican vote.

I am still ignoring one potential confound which is that I am not allowing for different sample sizes in different states, because we do not know the sample sizes.  So that is a potential source of noise.  However, although smaller samples should produce more sampling error, they should not produce more bias.  Correct me if I am wrong.

Anyway, on these calcs, in 2004 Democrats were significantly oversampled [t(48)=8.148, p<.001], as they also were in 1992 [t(46)=8.645, p<.001].  The difference in degrees of freedom is because there is some missing data in 1992.  I have also excluded Oregon and DC which are atypical and clearly outliers.

Interestingly however, on both a paired sample t test [t(46)=2.104, p<.05] and an independent sample t test, 2004 was more biased (p<.05).  The variance was also greater in 2004, although the t value remains significant after adjusting the degrees of freedom appropriately.
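For anyone who wants to reproduce this kind of test, a minimal sketch of the one-sample and paired t statistics follows. The bias values are made up for illustration (they are not the state figures), and the function only returns t and the degrees of freedom; the p value would still need a t table or a stats package.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, test_value=0.0):
    """Return (t, degrees of freedom) for H0: mean == test_value."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    return (mean(values) - test_value) / standard_error, n - 1


# Hypothetical per-state bias values for the same six states in two years.
bias_2004 = [0.21, 0.15, 0.30, 0.08, 0.19, 0.25]
bias_1992 = [0.14, 0.12, 0.22, 0.10, 0.11, 0.20]

# One-sample test against 0 (no bias) for a single year.
print(one_sample_t(bias_2004))

# Paired test: the same statistic applied to the per-state differences.
differences = [a - b for a, b in zip(bias_2004, bias_1992)]
print(one_sample_t(differences))
```

The paired version only uses states present in both years, which would explain why missing data in one year reduces the degrees of freedom.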

So I will eat a metaphorical hat now, and say, yes, this year the bias was indeed significantly worse, although the effect size of the difference is relatively small compared to the bias itself.  As I keep saying - the trick is noise reduction!

More interestingly, having cleaned up the data in this way, I now find that there are still correlations between state partisanship and bias, but - and this is the interesting bit - they are now in opposite directions.

In 2004, the greater Democratic over-sampling is found in the bluer states, while in 1992, greater Democratic over-sampling is found in the redder states.  Both these correlations are significant at p<.01, so I think they are real. The scatter plots look convincing, and they survive a non-parametric test (Spearman's rho).
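As a rough illustration of the non-parametric check, here is a minimal sketch of Spearman's rho computed as the Pearson correlation of the ranks, with tied values given their average rank. The partisanship and bias numbers are hypothetical, and the sketch returns the coefficient only, not a p value.

```python
from math import sqrt


def ranks(values):
    """1-based ranks, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    return pearson(ranks(x), ranks(y))


# Hypothetical: Democratic share of the vote and bias for five states.
partisanship = [38.0, 44.0, 49.0, 53.0, 61.0]
bias = [0.05, 0.12, 0.10, 0.18, 0.27]
print(spearman_rho(partisanship, bias))
```

Because it works on ranks, a single extreme state cannot drive the coefficient the way it can with an ordinary correlation.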

One big difference in 1992 was the fact of Perot on the ballot, and it is perhaps not surprising that the exit polls were less accurate in such an atypical year.  We also do not know (because EM have not told us) whether in 1992 the bias was in precinct sampling or in voter sampling. It would be surprising if they did not get their precincts wrong in 1992 as they would have had no precedent on which to predict the partisanship of the precincts.

So to my mind this is all in the direction of making 2004 look more unique, rather than less.

However, I am  not seeing any hint (so far) that the bias was greatest in swing states, which is what the fraud hypothesis would tend to predict.  Ohio, Pennsylvania and Florida are all close to the regression line when partisanship is used to predict bias.  There is also no hint of a quadratic function, though I will continue to look - a significant quadratic fit would indicate greater bias in the middle, i.e. in the swing states.

So if fraud, rather than over-eager Democrats/shy Republicans in Blue states is the answer, we have to explain why this should have been most apparent in states least likely to swing the election. It's not impossible, but it implies that it was a) widespread, and b) cleverly designed to be disguised as under-enthusiastic Democrats.  In other words, systematic hack.  However, the best evidence of fraudulent activity would appear to come from the shenanigans over the Ohio recount, where it looked like fairly crude interference at county level with the tabulation.  If this happened (and I still suspect it might have done) it is not showing up in exit poll discrepancy.  But then it needn't have done.  At some stage I (or someone) should "remove" enough votes from Ohio to give it to Kerry and see if it shows up in the exit poll stats.  I don't think it would.

Sorry for long post, but here's my interim conclusion:

The exit polls do not rule out fraud in Ohio, but they do not indicate fraud in Ohio, or in PA or FL.

The exit polls are consistent both with widespread fraud in blue states (i.e. major hacking) and with over-sampling of Democrats in blue states.

This last hypothesis, much as I hate to admit it, is consistent with EM finding that bias increased as a function of a number of characteristics likely to reduce the randomness of the sampling.  So if there was any inbuilt over-sampling pressure (over-eager/shy voters) you would expect this to show up more in precincts where the sampling was more difficult to randomise.

I still think we need to find either a smoking gun in Ohio or a smoking hacker with access to a large number of blue states (and why not hack the red ones?)

• ##### One Other Possible Theory(none)
I've thought about this a bit since my last post.  What would a deliberately skewed exit poll accomplish besides drawing attention to the methodology?  The only thing I can come up with is that it CAN influence voter turnout.  This is what happened when Clinton was announced the winner of his second election before the polls closed on the West Coast.

Assuming that intentional voter fraud throughout the country is difficult to achieve, the next best strategy, if one was to want the Dems to lose, is to simply tilt the exit polls to show the Dems ahead of their actual counts.  This may or may not actually depress Dem turnout, but it couldn't hurt!

I am fairly certain it is easier to effect an exit poll than a vote count.

My two cents.

Embrace diversity. Not everyone is intelligent.

• ##### Crap...Affect, not Effect (n/t)(none)

Embrace diversity. Not everyone is intelligent.

[ Parent ]

• ##### Huh?(none)
I did a double-take when I read this paragraph:

What this appears to be saying is that in years in which the election has a high profile, the signed WPE has been significantly more negative (Democratic over-estimate), and that this bias has been significantly greater the more Democratic the state.   One interpretation of this is that Bush voters more inclined to avoid being polled in these years, and that they are even more inclined to do so in Blue states than Red states, which would make some psychological sense.  Another interpretation is that where the election is seen as critical, someone stuffs the ballots with Blue votes.

Your conclusion seems backwards to me. First you say that when the election has a high profile, there's a bigger over-estimate of Democratic votes by the exit polls. That means that the vote count (not the exit poll) for Democrats is even smaller than the exit poll results in this case. Yet you conclude by suggesting that there was ballot-stuffing in favor of Democrats. Isn't that conclusion contradictory to the initial statement?

Thanks for your hard work in this analysis.

In the middle of a UK election, I said blue instead of red.  I still find it hard to get my head round the idea that conservative votes are red.

"The people's flag is ruddy red...."

• ##### No wonder Republicans are stealing this issue...(none)
I agree with the previous comment that this is yet another in a long series of misguided attempts to ascertain who cheated and where rather than making the case for securing the system from future tampering at all levels.

Don't like exit polling?  Eliminate it.  Why, beyond network news ratings, do we even need such data?  If the principle behind such polling is to influence voter turnout (as has been suggested), then by all means, ban it.  If the original intent was to provide a check on potential fraud at the ballot box by establishing a predictive baseline to weigh against actual results, then do it and disclose the results after the election.  No more election eve polling.

Increase vote verification and provide hard copies of all votes tabulated.  Simply put, make sure any system put into place offers some measure to not only provide a physical trail for tracking results, but which also endeavors to ensure that voters are voting correctly while still "in the booth."  What that would require would depend on the system in use (optical, touch screen, punch card, etc.), though obvious steps might include redesigning ballots to more clearly separate candidates, initiatives, and so on.

Some questions:

How do you go from a statistical aberration in the tens of thousands to one in the billions?  If that doesn't mean something is horrendously broken, nothing does.

What evidence is there of the "shy Bush voter?"  Does this premise operate under the assumption that most Bush supporters are not of the vocal, hard right variety?  Does it presume that Bush supporters are inherently ashamed to some degree of their vote?  Should "shy" be replaced with "cautious," meaning that Bush supporters are likely to be more suspicious of pollsters and media representatives due to suspicion of "liberal bias?"  Personally, my initial reaction to this idea was to laugh.

There seems to be a prevalent misunderstanding that any voter fraud in this election occurred during the actual voting process in each county and across the entire nation.  In fact, fraud would only be needed for certain select counties in key areas.  What's the point in committing fraud on any scale beyond what's necessary?  Also, the majority of any fraud would be committed before or right after the election, not during it.  Voter disenfranchisement is a well-known tactic for pre-election tampering.  As for after, it's not the vote you want, it's the vote count.  All optical scan and touch screen results went to central counting stations.  Control those, and you control the results.  That's quite a step down from the need to bamboozle dozens of counties in multiple states, not all with the ever helpful Ken Blackwell.

Someone argued that it would be foolish for the Republicans to commit such obvious and gross errors as the blatant vote shifting to 3rd party candidates or substantial "over votes" for Bush in some areas.  Then why the massive statistical problems?  The idea may be to muddy the waters with widely varying interpretations of the available data, or to distract from, and thereby dilute, the fundamental problem of voting transparency by creating an enormous field of red herrings.  Perhaps the gambit was a simple reliance on the American penchant for, and subsequent dismissal of, conspiracy theories and those who back them.

In multiple instances, independent parties have been blocked from conducting legitimate recounts or from securing the materials required for any form of validation.  At times this has meant being barred from the buildings conducting the recounts, watching as hard drives from critical systems were removed and replaced before results could be verified, and even being denied documents legally required for any investigation under state and federal laws (and even finding some of those documents in the trash).  Besides being, I would think, instances of blatant obstruction of justice, aren't these incidents grounds for criminal investigation under current law concerning fraud?  I don't think you need to produce the proverbial smoking gun, though given the hostile media climate towards liberals, I suppose you really do.

Finally, how can any statistics be taken seriously when much of the data has been filtered through the very people with a stake in the outcome?

• ##### I've got to run out but thanks for your comments(none)
One point: "shy Bush voter" is one way of putting it.  "over-eager Kerry voter" would be another.

I've seen that happen - some people looking more "pickable" than others, even when you think you're picking at random.

I'm not a pollster, but I've done random sampling.  It's harder than you think.

• ##### I can understand some bias, but(none)
Then I have to wonder why the "shy Bush voter" wasn't countered by the "over eager Gore voter" in 2000.

I am not suggesting that the analysis of the poll discrepancies isn't correct (my math is way too rusty at this point).  Rather, I am concerned with the rather obvious gaping hole that such gargantuan numbers suggest exists in the data provided.  Some have pointed out the missing data detailing how the numbers were weighted, while others have suggested problems with under/over votes as a contributing factor.  In the end, it simply comes down to yet another example of the broad Republican strategy to restrict information to the point where nothing can be established to the point of absolute certainty, at which point they simply declare themselves the winner, take the trophy, and ship the rest of us off to Iraq.

• ##### No, it's not: You pick every nth person. n/t(none)

"...And bunnies would dance in the streets, and we would find life on Mars." -Peter Singer, Brookings Institution

[ Parent ]

• ##### But they don't(none)
See Mystery Pollster.

It's harder than you'd think.

See my new Exit Poll diary here

[ Parent ]

• ##### No, it's not.(none)

We set an interviewing rate based on how many voters we expect at your polling place. If your interviewing rate is 3, you will interview every 3rd voter that passes you. If it is 5, you will interview every 5th voter that passes you, etc. We set an interviewing rate to make sure you end up with the correct number of completed interviews over the course of the day, and to ensure that every voter has an equal chance of being interviewed.

If the targeted voter declines to participate or if you miss the voter and do not get a chance to ask him or her to participate, you should mark them as a "Refusal" or "Miss" on your Refusals and Misses Tally Sheet and start counting voters again (for a more thorough explanation of refusals and misses, refer to page 9). For example, if your interviewing rate is 3 and the 3rd "person refuses to participate", you do not interview the 4th person. Instead, start counting again to three with the next person.

Emphasis mine, for the slow-witted.

If the exit pollsters are so incompetent that they can't follow these rules, then we should dismiss all the exit poll data out of hand, shouldn't we?

What's the point of analyzing data that has no consistent measurement criterion in the first place, eh?

"...And bunnies would dance in the streets, and we would find life on Mars." -Peter Singer, Brookings Institution

[ Parent ]

• ##### Not a lot of point(none)
The rules are clear.  What is not clear is whether the rules were followed.  The fact that the bias increased with factors that were likely to increase the chances of the rules not being properly followed suggests that not following the rules was an important causal factor.  These include: experience of interviewer; training of interviewer; distance of interviewer from the polls.
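To show what the counting rule quoted above does and does not protect against, here is a minimal sketch. It models only the "every nth voter, restart the count after a refusal or miss" rule plus a per-voter acceptance function; it does not model an interviewer breaking the rule. The acceptance rates (0.56 for Democrats, 0.50 for Republicans), the seed and the precinct size are hypothetical.

```python
import random


def interval_sample(voters, rate, accepts):
    """Approach every `rate`-th voter; restart the count after each approach.

    Returns (completed interviews, number of refusals/misses).
    """
    completed, refusals = [], 0
    count = 0
    for voter in voters:
        count += 1
        if count == rate:
            count = 0  # restart whether or not the voter takes part
            if accepts(voter):
                completed.append(voter)
            else:
                refusals += 1  # no substitute is interviewed
    return completed, refusals


rng = random.Random(0)
# Hypothetical precinct: 3000 voters, split evenly, in random order.
voters = ["D"] * 1500 + ["R"] * 1500
rng.shuffle(voters)

# Hypothetical differential willingness to be interviewed.
accept_rate = {"D": 0.56, "R": 0.50}
sample, refused = interval_sample(voters, 3, lambda v: rng.random() < accept_rate[v])
dem_share = sample.count("D") / len(sample)
print(len(sample), refused, round(dem_share, 3))
```

Even with the rule followed to the letter, the sample leans towards whichever group is more willing to take part, so following the rule guards against interviewer choice but not against differential refusal.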

If the exit pollsters are so incompetent that they can't follow these rules, then we should dismiss all the exit poll data out of hand, shouldn't we?

What's the point of analyzing data that has no consistent measurement criterion in the first place, eh?

Quite.

And don't shoot the messenger.  I'd love the answer to be fraud, but I'm not going to see evidence where it isn't there.

See my new Exit Poll diary here

[ Parent ]

• ##### Not to nitpick...(none)
But didn't the US Count Votes report discount E/M's "naive young interviewers" factor as a contributor?  Or did I read that wrong?

"...And bunnies would dance in the streets, and we would find life on Mars." -Peter Singer, Brookings Institution

[ Parent ]

• ##### It is one of the parts of the(none)
US Counts Votes report that I disagree with.  I don't think their reasoning is quite logical on this.  I suspect inexperienced interviewers were a substantial factor, just not the whole story.  See this link if you haven't seen it before.

EM, in their report, list factors that were associated with greater error (in Kerry's direction).  US Counts Votes argue that even where one of these factors is optimal, the bias is still high.  However, as EM do not provide a breakdown of what else is going on in each precinct, we cannot tell how many precincts had multiple problems.  The technique we need to use to answer the question is regression analysis - EM may have done this, but they do not report it, nor do they give us the data to do it ourselves.

But by the same token we do not have the evidence to dismiss it either.  It seems perfectly plausible to me that a number of factors contributed to an over-polling of Kerry voters, i.e. non-random sampling.  The other possible factor, impossible to quantify, is that Bush voters lied.  See the diary in my sig for more.

See my new Exit Poll diary here

[ Parent ]

• ##### Febble -- recommended(none)
I was ready to recommend it on the title alone, before even opening it.  Just L.O.L. for the one in nine billion idea.

But then I am delighted to find an amazingly good explanation of the technicalities involved in this issue.  Understandably, many Kossacks reading your diary are seeking for the last word  --  whereas you have gone as far as you can without reaching that last word and even join the many who shout for access to precinct data and weights.

One thing I am sure of is that most Kossacks are tired of all the snarking. (I don't really know what that word means, but I know they're tired of it.)

Thanks for your work.  Looking forward to seeing more of it.

Luke 17:33 - Whosoever shall seek to save his life shall lose it; and whosoever shall lose his life shall preserve it.