Why the Claims of Election Fraud aren’t based on Statistical Evidence

I’ve been following all the lawsuits trying to overturn the election results in states that Biden won. I’m far from a legal expert so I don’t pay much attention to the legalese but I do like to examine the “evidence” of “irregularities” in the election. I’m highly disappointed in what qualifies as an expert or evidence in most of these lawsuits. It is shocking to see people who claim to be experts make statements that occasionally could be refuted by an undergraduate who took a stats class and a political science class.

If you are going to suggest that our democracy was violated and that there was a massive conspiracy to rig the election you need to have solid evidence. That is a life-altering claim to the health of our democracy. It’s a serious allegation that needs serious evidence. You need to provide an analysis created by someone that has genuine expertise in both political science and statistics. To do a high-quality analysis you probably have to be on the level of a statistics Ph.D. student in the last part of their studies. You need methods commonly used in the literature for similar problems. And you need to approach the analysis knowing that statistical tests aren’t the best equipped to detect fraud and rule out more innocent explanations of the irregularities.

But as I’ve examined the “evidence” of voter fraud I’ve been highly disappointed in the rigor of the arguments. Voter fraud is rare, and the level of voter fraud necessary to change the presidential outcome has to be thousands of times higher than normal levels. At best, the conclusion of these analyses is that mail-in votes counted later did not come from the same distribution as the in-person votes or that the vote changed from 2016. But everyone should have known that because the method of voting was highly partisan and mail-in votes take longer to process. And shifts from previous results are common in every election year. There isn’t evidence of massive fraud, but people including President Trump are acting like there is. These are highly damaging allegations that are not based on facts.

Now I will break down what’s wrong with the analysis in six lawsuits. These lawsuits were filed by Republicans. I’ve addressed all of this on Twitter, but I wanted to summarize and show that there are many cases with many mistakes.

The Texas Expert who would have failed Stats 101

The recent Texas lawsuit (expert starts on pg. 22) says Michigan, Wisconsin, Georgia, and Pennsylvania had unfair elections and the electors chosen shouldn’t count. The expert has a Ph.D. in economics. Some economists are great at political science and statistics, but this expert doesn’t seem to be good at either. He claimed that the odds of Biden outperforming Clinton were 1 in a quadrillion. But he used a test that doesn’t apply because it assumes that you have a sample, ideally from less than 10% of the population, that the votes are counted in random order, and that all the ballots have the same probability of being for Biden. Additionally, he used the wrong formula. And this test was something I taught undergraduate students in the most basic stats class at Texas A&M. I had final exam questions testing the ability to perform this procedure, and this work would deserve a failing grade. I am struggling to express how horrible this analysis is. The analysis doesn’t reach a clear conclusion other than that these results are unexpected, and the expert was used in the lawsuit to show that it is practically impossible for Trump to have lost. But the analysis couldn’t support that conclusion even if the test were appropriate and done correctly. This test can only detect differences in populations; it can’t explain why the difference occurred or whether the difference was fraud.
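To make that last point concrete, here is a minimal sketch of a pooled two-proportion z-test, the kind of intro-stats procedure described above. The counts are made up for illustration (they are not real election data) and the function name is hypothetical. The two groups simply have different underlying preferences, with no fraud involved.

```python
import math

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """Pooled z statistic for the null hypothesis that both groups share one proportion."""
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_err

# Hypothetical counts: group A votes 65% for a candidate, group B votes 45%.
z = two_proportion_z(650_000, 1_000_000, 450_000, 1_000_000)
print(f"z = {z:.1f}")
```

With samples this large, an ordinary partisan gap produces a z statistic in the hundreds, so an astronomically small probability follows automatically. The test only says the two groups differ; it cannot say why.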

The Michigan analysis that Used Minnesota Data

In another lawsuit, an expert claimed that there were more votes than people in certain precincts. This would obviously be a red flag and, if true, would likely show fraud. But these calculations combined voting results from Michigan with population data from Minnesota. So the analysis was completely wrong.

The Arizona Lawsuit that said audits by Precinct would be better than Audits by voting center

Another lawsuit in AZ claimed that the audit needed to be done by precinct instead of by voting center to be accurate. But grouping the sample one way or the other doesn’t have a large effect, and sorting the data into precincts from voting centers would be highly time-consuming. There wasn’t a need to do an audit by precinct from a statistical standpoint.

The Pennsylvania Lawsuit that wanted to assume illegal votes had the same distribution as legal ones

A lawsuit in PA wanted to take a sample of the mail-in ballot envelopes, examine them to see if they were fraudulent, and then assume Biden won the same percentage of illegal votes as legal ones. This is a bad idea because how can you know what the distribution of illegal votes is? Additionally, any analysis of the ballots would likely label some legal ballots as illegal. It’s hard to identify illegal votes unless you have a double vote, someone stealing a ballot, or lying about their eligibility. A signature mismatch isn’t always going to be fraud. You must consider that your analysis would have sampling error if not every ballot was examined. So you need more complex analysis and a highly designed sample. So a fishing expedition to find enough invalid ballots to change the winner is a bad idea.
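A rough sketch of why mislabeled legal ballots matter so much when fraud is rare. Every input here is hypothetical (the fraud rate, the detection rate, and the false-flag rate are illustrative guesses, not estimates from any lawsuit), and the function name is made up for this example.

```python
def flagged_breakdown(n_ballots, fraud_rate, detection_rate, false_flag_rate):
    """Return (fraudulent flagged, legal flagged, share of flagged ballots that are legal)."""
    n_fraud = n_ballots * fraud_rate
    n_legal = n_ballots - n_fraud
    fraud_flagged = n_fraud * detection_rate
    legal_flagged = n_legal * false_flag_rate
    share_legal = legal_flagged / (fraud_flagged + legal_flagged)
    return fraud_flagged, legal_flagged, share_legal

# All four inputs are hypothetical, chosen only to illustrate the base-rate effect.
fraud_flagged, legal_flagged, share_legal = flagged_breakdown(
    n_ballots=100_000, fraud_rate=0.0001, detection_rate=0.9, false_flag_rate=0.01
)
print(f"{share_legal:.1%} of flagged ballots are legal")
```

Under these assumptions almost every flagged ballot is a legal one, so throwing out flagged ballots in proportion to the candidates’ totals would mostly remove valid votes.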

The Nevada Lawsuit that said if ballots are rejected by the machine at high rates the machine must be missing illegal votes

A lawsuit in Nevada mentioned that the signature verification machine rejected a lot of ballots. Those ballots were reviewed by humans and accepted most of the time. The lawsuit claimed that since the machine rejected a lot of ballots it must have accepted many fraudulent ballots. But the machine rejecting a legitimate ballot is a different error from accepting a fraudulent one. You can’t immediately conclude that the machine accepted fraudulent ballots. False negatives and false positives usually trade off against each other: tightening a threshold to reduce one tends to increase the other. So if this machine was rejecting a lot of ballots, those it accepted were likely signature matches.
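The trade-off can be sketched with a toy signature-scoring machine. The score distributions and thresholds below are assumptions made up for illustration; nothing here describes how the actual Nevada machine works.

```python
from statistics import NormalDist

# Hypothetical match-score distributions (higher score = closer signature match).
genuine = NormalDist(mu=0.70, sigma=0.10)
forged = NormalDist(mu=0.40, sigma=0.10)

def error_rates(threshold):
    """Ballots scoring below the threshold are rejected by the machine."""
    false_reject = genuine.cdf(threshold)      # legitimate ballots rejected
    false_accept = 1 - forged.cdf(threshold)   # fraudulent ballots accepted
    return false_reject, false_accept

for threshold in (0.50, 0.60, 0.70):
    fr, fa = error_rates(threshold)
    print(f"threshold={threshold:.2f}  false reject={fr:.3f}  false accept={fa:.3f}")
```

As the threshold rises, the machine rejects more legitimate ballots and accepts fewer forged ones. A high rejection rate is therefore a sign of a strict machine, not a leaky one.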

The Georgia Lawsuit that said you had more voters than eligible people

A lawsuit in GA claimed there were more registered voters than eligible voters and that there were many people who voted only for Biden. But the analysis that found “extra” voters used census data without accounting for the relatively large margin of error on county-level estimates of eligible voters. Basically, in a state like GA that has automatic registration, and therefore high registration, the census data is often imprecise enough to show over 100% registration. The census data is also collected over multiple years and is known to struggle in areas with rapidly growing populations. Also, turnout in these counties was much lower than the eligible-voter estimates. There might be some extra people on the rolls, but they aren’t voting, and the problem is likely exaggerated in this analysis. Also, someone else did really bad math that assumed Biden overperforming the Democratic senate candidate meant people voted only for President and no other candidates. But in reality this was just split-ticket voting, which is common, especially among Republicans in the Trump era.
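A small sketch of how the margin of error alone can push a registration rate past 100%. The county figures are hypothetical, not census numbers, and the function name is made up for this example.

```python
def registration_rate_range(registered, eligible_estimate, margin_of_error):
    """Registration rate at the high end, point estimate, and low end of the eligible-voter estimate."""
    low = registered / (eligible_estimate + margin_of_error)
    point = registered / eligible_estimate
    high = registered / (eligible_estimate - margin_of_error)
    return low, point, high

# Hypothetical county: the numbers are illustrative, not census figures.
low, point, high = registration_rate_range(10_200, 10_000, 600)
print(f"{low:.1%} to {high:.1%} (point estimate {point:.1%})")
```

The point estimate is above 100%, but the plausible range includes values below it, so an “impossible” registration rate is well within the noise of the eligible-voter estimate.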

So in summary, lots of the statistical claims that there is evidence of voter fraud are statistically unsound. I fully believe that the results are correct, and that no massive fraud occurred.

Pre-Election Analysis

I’m having trouble embedding my map. Here is the link.

Announcement: Now you can view my Twitter feed directly on the site! So if you don’t have Twitter you can now keep up with me as I live tweet tomorrow.

I’m going to summarize what I think about the model and break down the key states.

This model can’t really be that influenced by my opinion. I decided to code everything in terms of the Democratic candidate because the Democratic candidate won two out of the three elections I tested this model on. My predictions for Trump are just 1 – Biden’s prediction.

This is a non-interactive map for reference.

The median outcome (50th percentile) is 358 electors for Joe Biden.

Trump won 306 electors in 2016. To be re-elected, President Trump can’t have a net loss of more than 36 electors.

Overall, I trust the model’s output. The model is objective. There are very few parameters I chose subjectively and those mainly affect the uncertainty. The model starts out with assuming that the results will match 2016. Once the polling data comes in it focuses on the polling data over the past results. The model does “punish” large deviations from the past election. It’s going to be skeptical of large changes from 2016 to 2020. This is by design because polls tend to have more variation than election results.
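The post doesn’t spell out the model’s equations, so the following is not the actual model. It is a generic normal-normal update, shown only to illustrate how a prior centered on the last election can pull a polling average back toward the previous result. All numbers and names are hypothetical.

```python
def shrink_toward_prior(prior_mean, prior_sd, poll_mean, poll_se):
    """Normal-normal update: precision-weighted average of prior and polling data."""
    prior_precision = 1 / prior_sd**2
    poll_precision = 1 / poll_se**2
    post_mean = (prior_precision * prior_mean + poll_precision * poll_mean) / (
        prior_precision + poll_precision
    )
    post_sd = (prior_precision + poll_precision) ** -0.5
    return post_mean, post_sd

# Hypothetical state: 48% Democratic share in the last election, polls now at 52%.
post_mean, post_sd = shrink_toward_prior(0.48, 0.03, 0.52, 0.02)
print(f"posterior mean {post_mean:.3f}, sd {post_sd:.3f}")
```

The estimate lands between the past result and the polls, closer to whichever source is more precise. More polling data shrinks the polling standard error and moves the estimate further from the prior.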

One key thing that could go wrong is that if we have a scenario like 2016, where polling errors are skewed towards one candidate, the uncertainty estimates will be wrong. Honestly, it’s a 50-50 shot for Biden or Trump to be underestimated. And the level of error you would have to see from this model for Trump to win is about 1-2 points higher than in 2016. I would say it’s equally likely that Trump wins or Biden gets 400 electoral votes.

I’m a little skeptical of Iowa and Ohio flipping, and part of the reason they are tossups is because of the strong polling leads in Michigan and Wisconsin. It’s going to be a long night for watching the Midwest results.

I think Biden does have a significant advantage in the south over Clinton. Biden is up 1.5 points over Clinton’s prediction from the model. Arizona, Georgia, North Carolina, and Texas also have signs of a shift towards the Democrats. I think the model is being realistic about the change given the polling data. The predictions for Arizona, Georgia, and Texas were pulled towards Trump because of the 2016 results.

My model doesn’t predict the electors in Nebraska and Maine that are decided by congressional districts. My personal predictions are Nebraska-2 and Maine-2 will vote Democratic. Maine should be won by Biden, and Nebraska should be won by Trump.

One key thing to remember is the pandemic may slow counting down. We have more mail-in ballots to process, and in some states they can’t count the ballots until election day. This article gives a good explanation state by state about counting and when to expect results. It may not be possible to call every state on election night. It may not be necessary to call every state to determine the presidential winner. If it’s a landslide either way you may not need results everywhere because a candidate hit the magic 270 number. In particular, Pennsylvania and Nevada may not be called on election night. If we don’t know who won the 26 electors from Pennsylvania and Nevada, we may not know who won.

Do not react to early results. To call a race we need both mail-in and in-person votes that represent the entire state. It may take a while to have the information to call states. You might have temporary leads for Trump and Biden that don’t hold. It appears that the method of voting depends on partisanship. So the early votes may not have the same partisan breakdown as the mail-in votes or the election day votes. So just remember to breathe and wait. Honestly, if you are just a casual observer who hasn’t watched a live election night before, don’t start looking at states until an hour after the polls close.

Now the breakdown of the key states. I am predicting only the vote for Biden and Trump. I ignore third-party candidates because they have no chance and are hard to predict. I’m going to mention the average error of this model for 2008, 2012, and 2016 and compare it to the race’s projection. That’s what I mean when I say the model was within x points. The model could underestimate Trump, but it could also underestimate Biden. It is very possible to have above-average error, but if the lead is bigger than the average error, the candidate is more likely to win than lose. Also don’t take the decimal point in the model’s prediction that seriously. It’s just there for context. The model can only predict to within about one to two points of the actual result on average.

Arizona: I’m expecting a delay in results. I do think it is plausible for Arizona to flip given that the 2018 senate race was won by a Democrat. Arizona shifts a lot and the polling isn’t always accurate. On average, the model was within 3.7 points of the outcome in Arizona. Biden is predicted to get 52.2% of the vote. Arizona leans Democratic but is still uncertain.

Colorado: I don’t think Colorado is a battleground state for the presidency anymore. The population of Colorado has shifted in ways that make it more of a likely Democratic state than the tossup it was in the past. On average, the model was within 1.8 points of the outcome. Biden is predicted to get 56.5% of the vote.

Georgia: Turnout is high for early voting and the 2018 senate race was competitive. It is probably a pure tossup. On average, the model was within 0.6 points of the outcome. Biden is predicted to get 50.2% of the vote.

Florida: Things look worse for Biden than they did for Clinton. This model put Clinton at 50.6%, but Biden is at .52. Florida usually has highly accurate polls and this model was on average within a point of the election result in Florida for 2008, 2012, 2016. This is the state I’m watching the most. If Biden can win Florida, he can probably win the presidency.

Iowa: Iowa is likely a complete tossup. It’s definitely not a state likely to predict the overall winner especially since it only has six electors. On average, the model was within 2.7 points of the outcome. Biden is predicted to win 50% of the vote.

Michigan: My guess was that Trump winning Michigan in 2016 was a fluke. I expect the 2020 results to look more like 2008 or 2012 than 2016. Biden visited more and invested more than Clinton. Turnout was low in 2016, but it looks like 2020 turnout will be higher. On average, the model was within 2.6 points of the outcome. Biden is predicted to win

North Carolina: This state seems to be trending a little towards Biden. On average, the model was within 1.4 points of the outcome. Biden is predicted to have 51.6% of the vote. This is a tossup that leans Democratic.

New Hampshire: I expect New Hampshire to be less competitive than 2016. I think it’s highly likely Biden wins New Hampshire. On average the model is within 1.6 points of the outcome. Biden is predicted to have 54.7% of the vote.

Nevada: Nevada should be safe. I’m unsure if they will have all the mail in ballots counted. Wait for results from Clark County (home of Las Vegas and most of the voters) before you make any judgements. On average, the model is within 2 points of the outcome. Biden is predicted to have 54.3% of the vote.

Ohio: Ohio is a tossup. I don’t buy the theory that Ohio is going to be a bellwether state that predicts the outcome. Trump needs Ohio to win, but Biden could win without it. On average, the model is within 2 points of the outcome. Biden is predicted to have 50.2% of the vote.

Pennsylvania: Pennsylvania doesn’t count mail in ballots until at least election day. We may have no idea what the result is in Pennsylvania for hours or even days. Take the early returns with a huge grain of salt until the networks call it. I think that Trump winning in Pennsylvania in 2016 was a fluke. I expect the 2020 results to look more like 2008 or 2012 than 2016. On average, the model was within 1.5 points of the outcome. Biden is predicted to have 53.6% of the vote.

Texas: I debated about talking about Texas, but I decided to because I’m a Texan. You have to be very careful with the early results. You need to see results from the entire state. There were times when Beto O’Rourke was leading the senate race, but he didn’t win in the end. I’ll be talking a lot about Texas on Twitter because I have a good grasp of the political geography of the state. I’m very unsure about what the record turnout means, but if I had to guess, the non-2016 voters would lean at least slightly more Democratic. Texas is in play. We don’t have data on the partisanship of the early vote since you don’t register by party. On average, the model was within 1.1 points of the outcome. Biden is predicted to have 48.7% of the vote. I would consider Texas a plausible state to flip if Biden has a really good night. But I would not be surprised if Trump wins by 3 points, even though that is a significant loss compared to his 9 point victory in 2016.

Wisconsin: Like Michigan and Pennsylvania, I think Trump winning Wisconsin was a fluke. I expect the 2020 results to look more like 2008 or 2012 than 2016. I’ve read reporting (in the 538 article I shared) that we may find out Wisconsin results early Wednesday morning. The overall winner may or may not be called by then. Don’t freak out again about early results. On average, the model was within 2.2 points of the outcome. Biden is predicted to have 54.2% of the vote.
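The comparison running through the state breakdown can be sketched in a few lines. The shares and average errors are the ones quoted above; Florida and Michigan are left out because their figures are unclear or cut off in the text. Measuring the lead as the distance of the predicted share from 50% is an assumption, since the post doesn’t define it precisely.

```python
# (predicted Biden share %, average model error in points), as quoted in the breakdown.
states = {
    "AZ": (52.2, 3.7), "CO": (56.5, 1.8), "GA": (50.2, 0.6), "IA": (50.0, 2.7),
    "NC": (51.6, 1.4), "NH": (54.7, 1.6), "NV": (54.3, 2.0), "OH": (50.2, 2.0),
    "PA": (53.6, 1.5), "TX": (48.7, 1.1), "WI": (54.2, 2.2),
}

def lead_exceeds_error(share, avg_error):
    """Assumption: the lead is the distance of the predicted share from 50%."""
    return abs(share - 50.0) > avg_error

outside_error = sorted(s for s, (share, err) in states.items() if lead_exceeds_error(share, err))
print(outside_error)
```

Under that definition the lead exceeds the average error in CO, NC, NH, NV, PA, TX, and WI, with Texas being the one where the lead belongs to Trump. Arizona, Georgia, Iowa, and Ohio stay inside the model’s typical error.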

And it’s a wrap! Happy election eve y’all! Make sure you vote tomorrow if you haven’t already. My election night live tweeting fest will start at 7 PM tomorrow (and you can follow it on the blog!). I will be tweeting off and on until then. I’ll do a follow-up post once the results have settled and we know who won. I’ll do a more detailed post in December when the results are final.

Model Update 10-30

Today I am introducing a new map.

My attempts to embed the map have failed so I’m just going to link to it here.

Mean: Average predicted support for Biden in that state

Standard Deviation: A measure of the uncertainty of the model.

95% Credible Interval: There is a 95% probability according to my model that the true support for Biden will be in this interval.

Probability Biden wins: The estimated probability Biden wins from my model

Electors: Number of Electors in the Electoral College in this state
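For readers curious how summaries like these are typically computed from a Bayesian fit, here is a sketch. The posterior draws are simulated stand-ins (a real fit would supply its own samples), and the function name and numbers are hypothetical.

```python
import random
import statistics

# Stand-in posterior draws for one state; a real fit would supply these samples.
rng = random.Random(0)
draws = [rng.gauss(0.53, 0.02) for _ in range(10_000)]

def summarize(samples):
    """Mean, standard deviation, 95% credible interval, and P(share > 0.5)."""
    ordered = sorted(samples)
    n = len(ordered)
    lower = ordered[int(0.025 * n)]
    upper = ordered[int(0.975 * n) - 1]
    prob_win = sum(s > 0.5 for s in ordered) / n
    return statistics.mean(ordered), statistics.stdev(ordered), (lower, upper), prob_win

mean, sd, (lower, upper), prob_win = summarize(draws)
print(f"mean={mean:.3f} sd={sd:.3f} 95% CI=({lower:.3f}, {upper:.3f}) P(win)={prob_win:.2f}")
```

The credible interval is just the middle 95% of the draws, and the win probability is the fraction of draws where Biden’s share is above half.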

Now I’m going to bring back the old map because I like how it adds up the electors. The median number of electors for Biden is 357.

Edit: Map had some mistakes

Analysis:

The race continues to look the same. Texas and Arizona seem to be trending a little more towards Biden than a week ago. Texas surpassed its 2016 turnout, which probably is another sign Texas is competitive, although I think Trump is the slight favorite. I don’t see a single state that voted for Clinton that Trump has a decent chance of winning.

I did an experiment on what happens if the model is as wrong as it was in the past. If the model underestimates Biden, Biden clearly wins in a landslide. But even if the model underestimates Trump at 2016 levels, the model still gives an average of 303 electors for Biden.

I think that maybe there is a 10-20% chance today that Trump wins. On election day that number would be more like 10%. This probability isn’t coming from my model. My model says there is a 99% probability Biden wins, but that will only hold if my assumptions hold. So I think there is a 10% chance my assumptions fail and a 10% chance that a major event happens before election day. I’m concerned about some court cases that could invalidate lots of mail-in ballots or a major revelation about Joe Biden. I also wonder if the high early turnout is going to increase election day turnout because people may realize that the race is close. While Democrats seemed to do better in early voting, the knowledge of the high early turnout could make more Republicans want to turn out too.

Model Update 10/23

This is the fit of the new correlated model. This map was made on Thursday. I ran the model again today. You may notice lots of changes from the other model. This is expected because this model better “learns” from similar states and uses the last election’s results as a starting point. There are lots of different forms of this model. I struggled to choose a single model, and when this ends up in my dissertation, I’m going to be talking about multiple models.

Edit: Here is the Google Drive link for the daily model updates. This new model is labeled “correlated_fit” and then the date.

Scale: 0-0.05 Safe Red (darkest); 0.05-0.15 Likely Red (second darkest); 0.15-0.25 Lean Red (light red); 0.25-0.75 Tossup (brown); 0.75-0.85 Lean Blue (lightest blue); 0.85-0.95 Likely Blue (second darkest); >0.95 Safe Blue (darkest)
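The scale can be written as a small lookup. This sketch assumes the number being binned is the probability that Biden wins the state, and the handling of values that fall exactly on a boundary is also an assumption, since the scale doesn’t say which side they belong to.

```python
def rating(prob_biden):
    """Map a win probability to the map's category; boundary handling is an assumption."""
    if not 0.0 <= prob_biden <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    cutoffs = [
        (0.05, "Safe Red"), (0.15, "Likely Red"), (0.25, "Lean Red"),
        (0.75, "Tossup"), (0.85, "Lean Blue"), (0.95, "Likely Blue"),
    ]
    for upper, label in cutoffs:
        if prob_biden <= upper:
            return label
    return "Safe Blue"

print(rating(0.99), rating(0.50), rating(0.10))
```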

Average electoral votes: 359

95% credible interval for electoral college: 290-416

Analysis:

I think there might be a slight underestimation of the uncertainty in the electoral college outcome. I’m reading about a 99% probability Biden wins if the election was held today. I think that’s probably high but Biden should still win provided there isn’t some new crazy event. When applied to 2016 data this model read about a 60% chance for Clinton. I am not putting a lot of faith in the electoral college probability because I can’t reliably vet it using past data. It’s really hard to model the correlation between states.
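A toy simulation shows why the correlation between states matters so much for the electoral college probability. This is not the correlated model from this post: the ten equally sized states, the 52% lead, the 3-point error, and the correlation of 0.8 are all made-up values, and the function name is hypothetical.

```python
import math
import random
import statistics

def simulate_electors(rho, n_sims=5000, seed=1):
    """Electors won across simulations when state errors share correlation rho."""
    rng = random.Random(seed)
    n_states, electors_each = 10, 10      # hypothetical, equally sized states
    mean_share, error_sd = 0.52, 0.03     # hypothetical lead and polling error
    totals = []
    for _ in range(n_sims):
        shared = rng.gauss(0, error_sd)   # error common to every state
        won = 0
        for _ in range(n_states):
            own = rng.gauss(0, error_sd)  # state-specific error
            error = math.sqrt(rho) * shared + math.sqrt(1 - rho) * own
            if mean_share + error > 0.5:
                won += electors_each
        totals.append(won)
    return totals

independent = simulate_electors(rho=0.0)
correlated = simulate_electors(rho=0.8)
print(statistics.pstdev(independent), statistics.pstdev(correlated))
```

Each state has the same individual uncertainty in both runs, but when the errors move together the spread of the electoral total is much wider. Getting the correlation wrong therefore changes the overall win probability even if every state-level estimate is fine.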

This model is a polling aggregation model and not a forecast. So this fit is like if the election was held today. Since early voting is common and the election is so close this model is now predictive.

There are some things I’m a little skeptical of. I compared this model to the Economist’s model because they have some similarities. I think the estimate for Iowa is too high for Biden, although I would not rule out a Biden win in Iowa. I am wondering if the model is being overconfident in Michigan, Pennsylvania, and Wisconsin.

Model Update 10/15

I am writing this week’s update a day early.

So I’ve developed a functional second model that allows for correlation between states. I am working on testing it. Next week I might base the post on the second model if I show that the second model is more accurate than the current model.

This week’s map:

Scale: 0-0.05 Safe Red (darkest); 0.05-0.15 Likely Red (second darkest); 0.15-0.25 Lean Red (light red); 0.25-0.75 Tossup; 0.75-0.85 Lean Blue (lightest blue); 0.85-0.95 Likely Blue (second darkest); >0.95 Safe Blue (darkest)

Biden/Trump is likely to lose about 1 in 4 of their lean states and 1 in 10 of their likely states. The expected number of electors is Biden 335, Trump 203. This adjusts for the uncertainty in winning a lean or likely state. Except for Texas, the likely and lean red states are labeled that way because of insufficient polling data. CA, WA, and OR are likely blue because of insufficient polling data.
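The expected-electors adjustment can be sketched as a weighted sum. The 1-in-4 and 1-in-10 loss rates come from the text; treating safe states as certain and tossups as 50-50 is an assumption, and the category totals below are hypothetical rather than taken from the actual map.

```python
# Chance Biden carries a state in each category. The 1-in-4 and 1-in-10 loss rates
# come from the text; safe = certain and tossup = 50-50 are assumptions.
WIN_PROB = {
    "Safe Blue": 1.0, "Likely Blue": 0.9, "Lean Blue": 0.75, "Tossup": 0.5,
    "Lean Red": 0.25, "Likely Red": 0.1, "Safe Red": 0.0,
}

def expected_electors(electors_by_rating):
    """Expected Biden electors given total electors in each rating category."""
    return sum(WIN_PROB[rating] * electors for rating, electors in electors_by_rating.items())

# Hypothetical category totals summing to 538; not the actual map.
example = {
    "Safe Blue": 180, "Likely Blue": 80, "Lean Blue": 40, "Tossup": 60,
    "Lean Red": 20, "Likely Red": 60, "Safe Red": 98,
}
biden = expected_electors(example)
print(round(biden), 538 - round(biden))
```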

Analysis:

Not much change in the model. The model is very stable. The number of polls is increasing, which is nice. I think Biden remains the likely winner.

Model Update 10/9

This week’s map:

Scale: 0-0.05 Safe Red (darkest); 0.05-0.15 Likely Red (second darkest); 0.15-0.25 Lean Red (light red); 0.25-0.75 Tossup; 0.75-0.85 Lean Blue (lightest blue); 0.85-0.95 Likely Blue (second darkest); >0.95 Safe Blue (darkest)

Biden/Trump is likely to lose about 1 in 4 of their lean states and 1 in 10 of their likely states. The expected number of electors is Biden 339, Trump 199. This adjusts for the uncertainty in winning a lean or likely state. Except for Texas, the likely and lean red states are labeled that way because of insufficient polling data.

Analysis:

Some of the change for this week is due to me fine-tuning some model parameters and adding weights. Biden continues to be ahead.

Now let’s compare Biden’s lead to how accurate this model was in the past. If Biden’s lead is greater than the historical error, this indicates a high probability Biden wins that state. The direction of the error could underestimate or overestimate Biden. Historically this model is equally likely to underestimate or overestimate a candidate. It’s equally likely Biden and Trump are underestimated. This comparison uses the model’s fit 28 days before the election in 2008-2016. We roughly assume that if Biden’s lead is bigger than the error he wins, and the other states are split among the candidates.

Some interesting comparisons in the key states of AZ, CO, GA, FL, IA, NC, NH, NV, OH, PA, VA, WI:

If the model’s performance is at 2008 levels, Biden’s lead is larger than the error in: IA, PA

In this scenario, Biden is likely to win but it is close. He should still pick up about half of the other key states since the error can go both ways.

If the model’s performance is at 2012 levels, Biden’s lead is larger than the error in: AZ, CO, FL, IA, NC, NH, NV, OH, PA, VA, WI

In this scenario, Biden wins every state he is classified as likely to win. This leads to a blowout of approximately 374 electoral college votes.

If the model’s performance is at 2016 levels, Biden’s lead is larger than the error in: AZ, CO, FL, GA, NV, IA, VA

In this scenario, Biden is likely to win.

If the model’s performance is at the 2008-2016 average, Biden’s lead is larger than the error in: AZ, CO, FL, GA, IA, NC, PA, VA

In this scenario, Biden is likely to win.

Basically the lead we are seeing for Biden surpasses the historical error of this model. Now it is possible for the model to perform worse this year. But this along with the uncertainty estimates from within the model paints a picture that Biden is the likely winner. Trump has a chance, but it is smaller than Biden’s.
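The rough rule used in these scenarios (Biden wins the states where his lead beats the error, and the rest are split) can be sketched for the key states only. The elector counts are the 2020 allocations; splitting the remaining states exactly in half is a simplifying assumption, and electors from states outside the key list are not counted here.

```python
# Electoral votes for the key states under the 2020 allocation.
ELECTORS = {
    "AZ": 11, "CO": 9, "GA": 16, "FL": 29, "IA": 6, "NC": 15,
    "NH": 4, "NV": 6, "OH": 18, "PA": 20, "VA": 13, "WI": 10,
}

def key_state_tally(secure_states):
    """Electors from states where the lead beats the error, plus half of the rest."""
    secure = sum(ELECTORS[s] for s in secure_states)
    remaining = sum(ELECTORS.values()) - secure
    return secure + remaining / 2

# The 2016-level scenario from the list of key states.
print(key_state_tally(["AZ", "CO", "FL", "GA", "NV", "IA", "VA"]))
```

In the 2016-level scenario this gives Biden 90 electors from the secure key states plus half of the remaining 67, before adding the states that were never in doubt.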