If Beto O’Rourke Wins the Senate Race Tonight, Here’s (Probably) Why

One of the criticisms of election predictors in 2016 was that the risks and potential for error were not well explained.  I don’t think Beto will win the Senate race, but his odds right now are somewhere between rolling a die and getting a 6 and rolling a die and getting either a 5 or 6 (roughly a 17% to 33% chance), which basically means that weird things can happen, but they probably won’t.  So I am writing this post so that no one can say I misrepresented a Cruz win in Texas as a sure thing or was not clear about the possibility of a model or polling error, not because I am trying to hedge my bets.

But there are several factors at play that could cause the polls and my model to be wrong in Texas and other Senate races.  I trust my model and the polls, but I know from experience that there are a few cases where the polls and my model have trouble predicting a winner, and I wanted to share those scenarios.  I don’t want to give the impression that my model is perfect and always right, because it isn’t.  No statistical model is always right, and something as incredibly complicated as an election means that nothing is ever certain.  But I do know my model is off by about 2.5 points on average in presidential elections and calls the winner correctly in over 90% of the races.  I have never predicted Senate races, but I have no reason to believe this will change significantly.  Some people may wonder why I even bother to predict something that will eventually happen anyway, knowing that I am going to be wrong sometimes.  But I do this because it’s fun, and it makes election night more exciting to have some skin in the game.

Scenario 1: Systemic Polling Error (Beto wins by 2 or more points)

Under this scenario, the polls failed to capture the enthusiasm of young and minority voters and incorrectly estimated who would turn out.  There are a lot of telephone polls, and they are probably more apt to miss Beto’s base than internet polls.  One great example of this is a New York Times/Siena College poll.  They struggled to reach young and minority voters, so those categories were reweighted.  However, when you don’t get an accurate sample, you introduce error.  It’s not the pollsters’ fault, since they have to sample randomly and they can’t make you answer.  I’ve talked about the importance of poll participation before.  Sometimes polls are wrong because they aren’t conducted properly, but the vast majority of the time it’s because the right group of people didn’t answer.  Under this scenario, Beto would win by at least 2 points, because that’s the minimum error you would need to see for the polls to be considered abnormally wrong.
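As a toy illustration of the reweighting step described above (the groups, shares, and support numbers are invented, not from any real poll):

```python
# Sketch of reweighting a skewed poll sample (illustrative numbers only).
# Each group's responses get weight population_share / sample_share, which
# corrects the topline but can't fix who you never reached, and it inflates
# the variance when a group is badly underrepresented.

population_share = {"18-29": 0.20, "30-64": 0.60, "65+": 0.20}
sample_share     = {"18-29": 0.05, "30-64": 0.55, "65+": 0.40}  # young voters scarce
support_in_group = {"18-29": 0.65, "30-64": 0.48, "65+": 0.40}  # share backing candidate A

# Raw topline: dominated by the groups that actually answered the phone.
unweighted = sum(sample_share[g] * support_in_group[g] for g in sample_share)

# Reweighted topline: each group counted at its population share.
weighted = sum(population_share[g] * support_in_group[g] for g in population_share)
```

Here the unweighted topline is about 45.7% and the reweighted one about 49.8%: the correction helps, but it only works if the few young respondents you did reach resemble the ones you missed.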

Scenario 2:  Republicans Stay Home

In this scenario, Republicans don’t turn out like they did in past elections.  The rough indicator of this is the exit poll, but it may not be detailed enough to conclude this happened.  Another proxy is relative turnout in the strongly Republican counties versus the more urban and liberal counties.  If turnout is unexpectedly weak among Republicans, this would also hurt the polls, which were probably built with Cruz having a turnout advantage.

Scenario 3: The Polls Aren’t “Wrong” and Beto Still Wins by Less Than a Point

This seems like a contradiction, but it’s normal for Senate polls to be wrong by about 5 points on average, and Cruz’s lead in the polls is slightly below five points.  So Beto could win by less than a point, and the polls would still perform like they usually do.  Competitive races are really hard to poll and predict because a lot of the time there will be a statistical tie.

 

2018 Prediction

This past Saturday, my grandmother died.  With a heavy heart, I have decided to continue to predict this election.  This project has been two years and many hours in the making, and I believe that my Grandma would have wanted me to continue.  But given that this is a very emotional time, I will rerun the model later in case I made a mistake.

Map with Tossups


Click the map to create your own at 270toWin.com

Map with Tossups Decided


Click the map to create your own at 270toWin.com

Overall, I predict that Republicans will hold the Senate.  The polls are very close, and there might be a few surprises.  A part of me is afraid that we will see the same under-capturing of Trump voters’ support that we saw in 2016.  I do think a lot of pollsters have put a lot of work into building better likely-voter models and weighting, and they should be better, but the possibility of a repeat of the 2016 error makes me a little nervous about the polls in the states Trump won that have Democratic incumbents.  A lot of these competitive states are hard to poll.

I also want to represent the uncertainty in my model based on my error in the presidential model, because that’s the best estimate I have of my accuracy.  I measure my success both in terms of how close my predicted outcome is to the actual outcome and in terms of which races I call correctly.  But since there are six toss-up states, I could be wrong about some winners and still do a very good job of predicting the outcome.  This election will come down to turnout and who is more enthusiastic about the election.

Here is the scale of uncertainty:

Safe:  Unlikely (but possible)  for the model to be wrong in predicting the winner (darkest color)

Probably Safe: It is more likely than not that the predicted winner will win. (Medium color)

Too close to call: within 2.5 points, or within one average error of the presidential model, meaning a near statistical tie at about 68% confidence. (light color)

I have no idea:  The error is within or almost within the credible interval in my model, which suggests the model is incapable of distinguishing a winner; the leader still gets the seat in the final count.  (beige color in the first map, light color in the second)
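As a rough sketch of how a predicted margin plus the 2.5-point historical average error can translate into these confidence bands (a simplification assuming normally distributed model error, not necessarily the exact calculation behind the model):

```python
import math

def win_probability(margin, avg_abs_error=2.5):
    # Chance the predicted winner holds on, assuming the model's error is
    # normally distributed. A mean absolute error of 2.5 points corresponds
    # to a standard deviation of 2.5 * sqrt(pi / 2) under that assumption.
    sigma = avg_abs_error * math.sqrt(math.pi / 2)
    return 0.5 * (1 + math.erf(margin / (sigma * math.sqrt(2))))
```

Under these assumptions, a margin of 0 is a coin flip, a 2.5-point margin is around 79%, and a 5.6-point margin (the Texas prediction below) comes out in the mid-90s, in the same ballpark as the roughly 95% quoted for Cruz.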

Competitive Race Highlights

Here are the 11 competitive states and the predicted margins for the pooled and iterative models.  The expected error, based on the presidential model data, is about 2.5 points.  This doesn’t mean that I will be off by 2.5 points in all of these races; I usually get some states that are spot on with very small error and then a few outlier states.  Numbers may not add to 100% due to rounding.  R and D represent the party, and I indicates an incumbent.

Missouri: Hawley (R) 50.6, McCaskill (D, I) 49.4, Margin: 1.2

Verdict:  I honestly have no idea.

The polls are really close.  FiveThirtyEight says that the fundamentals and the bias of the pollsters give McCaskill an advantage, and my model doesn’t include that.  Honestly, my goal here is not to predict the winner but just to hope that my prediction is close.

Nevada: Rosen (D) 51.2, Heller (R, I) 48.8, Margin: 2.4

Verdict:  Too close to call.

In 2012, Heller won by a point, and Clinton did carry Nevada in 2016.  I think Rosen has a slight advantage here, but turnout will determine the winner.  Democrats and independents are turning out in early voting, but you could see an Election Day surge among Republicans, and we only know which party early voters belong to, not their actual votes.

Florida: Nelson (D, I) 50.2, Scott (R) 49.9, Margin: 0.3

Verdict:  I have no idea who will win.

All I know about this race is that it is incredibly close, and Nelson might benefit from the excitement over the Democratic gubernatorial candidate, Gillum.

Arizona: Sinema (D) 51.5, McSally (R) 48.5, Margin: 3

Verdict:  Too close to call.

This is another one of these races where it comes down to turnout.

Texas: Cruz (R, I) 52.8, O’Rourke (D) 47.2, Margin: 5.6

Verdict: Probably safe for Cruz

In my home state of Texas, I predict a Cruz win with a margin of 5.6 points.  Based on my historical presidential error, this would mean Cruz has about a 95% chance of winning, but my gut suggests that the polls may not have captured the enthusiasm among first-time and young voters, so maybe it’s closer to a 66% chance for Cruz.

Tennessee: Blackburn (R) 51.1, Bredesen (D) 48.9, Margin: 2.2

Verdict: Too close to call with more than 68% certainty

The model thought this was more of a toss-up than I did,  but it wouldn’t be surprising for either candidate to win.  Turnout is probably key here.

North Dakota: Cramer (R) 54.4, Heitkamp (D, I) 45.6, Margin: 8.8

Verdict: Relatively safe for Cramer

The North Dakota polling is a little sparse, and Heitkamp could surprise us, but I doubt it.

Montana: Tester (D, I) 52.3, Rosendale (R) 47.7, Margin: 4.6

Verdict: Probably Safe

I would not be surprised if polling overly favors Democrats in the heavily red states, because Trump still trashes the polls and the media, so I wouldn’t completely reject the possibility of a repeat of the 2016 surprise in Michigan, Pennsylvania, and Wisconsin.  But I ultimately think Tester should win.

Indiana: Donnelly (D, I) 51, Braun (R) 49, Margin: 2

Verdict: Too Close to Call

The model thought this race was closer than I expected.  There has been a lot of last-minute polling in October showing Braun edging closer, and I wasn’t expecting this race to be as competitive as it is until this week.  If Braun wins, it would not be surprising to me.

West Virginia: Manchin (D, I) 54.4, Morrisey (R) 45.6, Margin: 8.8

Verdict:  Probably Safe for Manchin

This race was a lot less competitive than I expected, but I guess West Virginians like Manchin.  My model always struggled with West Virginia in presidential elections, so if Morrisey won it wouldn’t be that surprising.

Details

This election I have five different groups.  To be considered competitive, a race must have two polls where the margin is smaller than the margin of error.  The red group contains both Mississippi races, Utah, Wyoming, and Nebraska.  Wyoming has no polls, so I will use Utah’s polls.  The blue West group contains Washington, California, New Mexico, Wisconsin, Michigan, Hawaii, and both Minnesota races.  The blue East group contains Maine, Vermont, New York, New Jersey, Ohio, Virginia, Delaware, Maryland, Massachusetts, Connecticut, and Pennsylvania.

I split out the races with two or more polls where the leader was ahead by less than the margin of error.  I then grouped those states into red-leaning, blue-leaning, and toss-up states based on how I viewed the race.  The competitive red-leaners are Texas, Tennessee, and North Dakota.  The competitive blue-leaners are Montana, Indiana, and West Virginia.  The toss-ups are Missouri, Nevada, Arizona, and Florida.

And lastly, the special cases.  Hawaii and Wyoming have no polls, so my prediction is just the prior average.  California has two Democrats, so I just averaged the polls there.  In Maine and Vermont, I treat the independent senators as Democrats in my model, since they caucus with the Democrats and there isn’t a viable Democratic candidate in those states.

What My Model Does and Doesn’t Do and Why

I want to explain what my model does and doesn’t do.  This model came from undergraduate research I did at Texas Tech that was financially supported by the Undergraduate Research Scholars program.  I built the 2016 model in about two months during my second year; post-election, I spent time analyzing it, writing the draft of the first paper on the model, and starting a project on voter behavior that got abandoned in the fall of my third and final year.  I then decided to revamp the model by altering its structure and comparing different methods.

This whole project has grown as I have grown as a statistician.  But since it takes me a lot of time to build a model, it has always lagged behind my abilities.  I’ll admit that some of the assumptions are not ideal and that the current model is not the best way to do this.  It can be better.  But I have always carefully considered the effects of the unideal assumptions in my model.  I may not have communicated this well in 2016, but I did know that my model could be wrong.

I will technically be running about 12 models for research purposes, but my two main models consist of one that pools the polls together and one that iteratively updates based on new polls.  Both calculate what is essentially a fancy weighted average between the polls in other similar states and the polls from that state.  The iterative model converges much more quickly to the latest poll, and it is the one I tend to favor; the pooled model gets the mean and variance of the polls and does the weighted average once.
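A minimal sketch of the two approaches, using inverse-variance weighting with invented numbers (the real model is more involved; the prior, poll means, and variances here are purely illustrative):

```python
import statistics

def precision_weighted(mean_a, var_a, mean_b, var_b):
    # Inverse-variance ("precision") weighted average of two normal estimates.
    w_a, w_b = 1 / var_a, 1 / var_b
    return (w_a * mean_a + w_b * mean_b) / (w_a + w_b), 1 / (w_a + w_b)

# Invented numbers: a prior built from polls in similar states,
# then two state polls, each with its own sampling variance.
prior_mean, prior_var = 52.0, 9.0
polls = [(54.0, 4.0), (55.0, 6.25)]   # (poll mean, poll variance)

# Iterative model: fold in one poll at a time, so later polls dominate.
m_iter, v_iter = prior_mean, prior_var
for poll_mean, poll_var in polls:
    m_iter, v_iter = precision_weighted(m_iter, v_iter, poll_mean, poll_var)

# Pooled model: summarize the polls by their mean and variance first,
# then blend with the prior in a single weighted average.
poll_means = [pm for pm, _ in polls]
m_pool, v_pool = precision_weighted(
    prior_mean, prior_var,
    statistics.mean(poll_means), statistics.variance(poll_means),
)
```

With these toy numbers the two summaries land in the same neighborhood but not on the same value, and each extra poll shrinks the iterative model’s variance, which is why it chases the latest polling faster.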

My model does not adjust polls for bias or weight polls based on quality and when they were conducted.  These changes will be implemented in 2020, but I haven’t had the time to do it for this election.  I’ve never predicted Senate elections, but my track record on presidential elections is very similar to the major models’, and this model has gone through peer review.  I can’t say for sure that my model will work, but I’m hopeful that it will hold its own on Tuesday compared to other models.

Election Night Guide

I wanted to give some advice on following the election results on Tuesday.

There are two things to keep in mind:  poll closing times and how results come out.

Different states close their polls at different times.  The “standard” closing time is 7 pm local time, but some states contain two time zones or have extended polling hours.  So control of the House and Senate will likely not be decided until 1-2 hours after the competitive states close, which means about 8 pm PST or 11 pm EST.

For the Texas Senate race, I would watch the smallest 200 counties, which make up about 20% of the vote, mainly because we don’t know about turnout in these places and a lot of these counties should vote strongly for Cruz.  We are seeing strong turnout in the more urban and more liberal areas, but if turnout is good in the more conservative areas (and it is in the larger conservative counties), Cruz will probably win.  Obviously, not everyone in Austin will vote for O’Rourke, and not everyone in Lubbock will vote for Cruz, but we should see a similar partisan map on Tuesday as in past elections.  I do agree there are a lot of young and first-time voters voting in this election, which is a good sign for O’Rourke, but there are also a lot of conservative young people in Texas, so this is not necessarily a sign that O’Rourke will win.

 

The Polls Might Be Wrong on Tuesday, but Here’s Why That’s OK

I’m going to preface this by saying I am writing this on the Friday before the election.  I don’t know if the polls are going to be wrong on Tuesday, but I want to be proactive.  After 2016, I learned that there were people who didn’t understand the uncertainty in polling and election models.  I also watched the attacks on many of the leaders of my field for alleged partisan bias that caused them to underestimate Trump.  I can’t speak to other people’s political motivations, but the models and polls are built using sound statistical methodology.

The fact is that polls have uncertainty.  They can be wrong, and sometimes will be, for a few reasons.  Polls have huge nonresponse rates; for example, in the New York Times live polls you can see that usually only 2-4% of the people called answer.  And since those who don’t answer can be different from those who do, the polls can be biased by nonresponse.  Nonresponse could easily be fixed if more people answered political calls or completed the online surveys they are chosen to participate in.

Secondly, the structure of polls relies on assuming that individual people favor the candidates at similar rates and that one voter’s response is not affected by other voters’.  This assumption is made for convenience, because without it, it is practically impossible to estimate a margin of error.  So polls usually make this assumption, which means that the interpretation that 95% of all polls contain the real result within the margin of error is an overestimation of the certainty.

A heuristic I like to use is doubling the margin of error, because that roughly represents the true error of polls.  One thing you will notice in this election is that a lot of the polls are close.  This means that we cannot be sure who wins in quite a lot of races.  In the Senate, about four races (ND, NV, MO, FL) are too close for the polls to predict the winner with a high degree of certainty.
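For a concrete sense of scale, here is the textbook margin of error that the independence assumption buys you, and the doubling heuristic applied to it (the poll size and support level are hypothetical):

```python
import math

def margin_of_error(p, n, z=1.96):
    # Textbook 95% margin of error for a simple random sample proportion,
    # valid only under the independence assumption described above.
    return z * math.sqrt(p * (1 - p) / n)

# A hypothetical poll: 800 respondents, candidate at 48% support.
moe = margin_of_error(0.48, 800)   # roughly 3.5 points
doubled = 2 * moe                  # the heuristic's rough "true" error
```

So a reported ±3.5-point poll should, by this heuristic, be read as closer to ±7 points, which is why a 2-3 point lead in a single poll tells you very little about the winner.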

I expect that my model will have an average error of about 3-4 points.  Some of that error is going to come from bad estimates in noncompetitive races with limited polling, but in the competitive races I should (hopefully) be off by only 2-3 points.  This means it would not be surprising for me to incorrectly call 2 to 3 races, but on the other hand, I could be completely right, or miss four races, and not be surprised.

Election prediction is an inexact science, and while we try our best, elections have uncertainty, and we will be wrong sometimes.  But for me at least, I predict because I love the challenge of trying to make sense of a complicated event.  I will be wrong sometimes, but when I’m right, it’s a great feeling to have defied the uncertainty that makes this job so difficult.