Model Update 10/23

This is the fit of the new correlated model; the map was made on Thursday, and I ran the model again today. You may notice lots of changes from the other model. This is expected: this model better "learns" from similar states and uses the last election's results as a starting point. There are lots of different forms of this model. I struggled to choose a single one, and when this ends up in my dissertation, I'm going to discuss multiple models.

Edit: Here is the Google Drive link for the daily model updates. This new model is labeled "correlated_fit" followed by the date.

Scale:

  - 0–0.05: Safe Red (darkest red)
  - 0.05–0.15: Likely Red (second-darkest red)
  - 0.15–0.25: Lean Red (light red)
  - 0.25–0.75: Tossup (brown)
  - 0.75–0.85: Lean Blue (lightest blue)
  - 0.85–0.95: Likely Blue (second-darkest blue)
  - >0.95: Safe Blue (darkest blue)
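As a sketch, the scale above can be written as a simple lookup that maps a win probability to its map category. This helper is hypothetical (not part of my model code), and the behavior exactly at the cutoff values is my guess, since the listed ranges share endpoints:

```python
def rating(p_blue):
    """Map a Democratic win probability (0-1) to the map's color category."""
    if p_blue < 0.05:
        return "Safe Red"
    if p_blue < 0.15:
        return "Likely Red"
    if p_blue < 0.25:
        return "Lean Red"
    if p_blue <= 0.75:
        return "Tossup"
    if p_blue <= 0.85:
        return "Lean Blue"
    if p_blue <= 0.95:
        return "Likely Blue"
    return "Safe Blue"
```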

Average electoral votes: 359

95% credible interval for electoral college: 290-416
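For readers unfamiliar with these summaries: given posterior simulations of the electoral-vote total, the "average" is just the mean of the draws and the 95% credible interval is the 2.5th–97.5th percentile range. A minimal sketch using synthetic draws (the real numbers come from the fitted model, not this toy normal distribution):

```python
import random

random.seed(0)
# Synthetic stand-ins for posterior draws of the electoral-vote total,
# centered near the reported average of 359 (spread here is illustrative).
draws = sorted(random.gauss(359, 32) for _ in range(10_000))

mean_ev = sum(draws) / len(draws)        # reported as "average electoral votes"
lo = draws[int(0.025 * len(draws))]      # 2.5th percentile
hi = draws[int(0.975 * len(draws))]      # 97.5th percentile -> 95% credible interval
```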

Analysis:

I think there might be a slight underestimation of the uncertainty in the electoral college outcome. I'm reading about a 99% probability that Biden wins if the election were held today. I think that's probably high, but Biden should still win provided there isn't some new crazy event. When applied to 2016 data, this model read about a 60% chance for Clinton. I am not putting a lot of faith in the electoral college probability because I can't reliably vet it against past data. It's really hard to model the correlation between states.

This model is a polling aggregation model, not a forecast, so this fit reflects what would happen if the election were held today. But since early voting is common and the election is so close, the model is now effectively predictive.

There are some things I’m a little skeptical of. I compared this model to the Economist’s model because they have some similarities. I think the estimate for Iowa is too high for Biden, although I would not rule out a Biden win in Iowa. I am wondering if the model is being overconfident in Michigan, Pennsylvania, and Wisconsin.

My 2020 Model

First up, I want to be super clear: this is NOT A FORECAST or a prediction of what happens on election day. This is a polling aggregation model. Think of it as a fancy Real Clear Politics average, except that this model comes up with good estimates of uncertainty and is a little better at predicting the final outcome. This model only predicts the election well at the very end of the cycle, starting at about six weeks before the election.

This model may not be final. I am going to test a few new features on historical polling data; if they work, they will be added to the 2020 model.

I want to explain why I do what I do.

For starters, I was a little bit surprised when the Economist came out with their model. It did some things I wanted to do. I agree with most of how it is structured. But I didn’t want to basically copy them. I wanted something novel.

One interesting thing I have discovered in my research is that FiveThirtyEight's model is not that much better at predicting election outcomes (51% for Trump, 49% for Clinton) than basic polling averages that simply average the last few polls. My models from my undergraduate research were better than a polling average but not as good as FiveThirtyEight. I wanted to see how accurate a Bayesian election model could be while still running on a standard laptop in a couple of minutes.

Data Inclusion Criteria

I am using the Economist's data inclusion criteria. I assume that the results in one poll don't affect the results in another poll. One type of election poll is the tracking poll, in which the same people are interviewed multiple times. Because tracking polls depend on the previous poll's results, I exclude them.

The Nuance of Polling

I'm an election modeler, and my entire dissertation focuses on analyzing public opinion polling data in one form or another. I love polling. Often on this blog or on Twitter I'm cautious about a new poll or about what an election model can actually tell us. So I thought I should explain why polling is important even if it may not tell us who is going to be the next President.

I feel there is an imbalance in how polling is viewed. Some treat polls as completely certain and see any result outside the margin of error as impossible. Others dismiss polling because they can't understand how one thousand people can tell us what the entire country thinks, or because they believe 2016 showed polling was a failure. Neither of these views is accurate.

The truth is that polling remains our only rigorous, mathematically grounded tool for estimating public opinion. Elections can be forecast using economic and other data, but only because the true proportion voting for each candidate is eventually known. Polling can tell us what percentage of individuals approve of a certain policy, or unravel how an individual's policy preferences for preventing terrorism relate to their risk assessment of future terrorist attacks (as I've done in a recent project). It lets us understand how and when people's opinions do and don't change.

Polling isn't a magic problem solver. The results from a poll cannot be treated as 100% correct; polling has error. Sometimes that error puts us in positions where all we know is that a race is too close to call or that the country is evenly split in its support for a policy. We have to acknowledge that the margin of error won't solve all our problems and that polling is hard work. It's not easy to predict who is a likely voter, to decide between an expensive phone poll and a larger internet panel, or to determine why someone left a question blank.

Polling is important because it signals to our government what the people want, even though it sometimes doesn't give us a clear answer. It's possible for polling to "be wrong" just by random chance. But it is also possible for it to give us a clear answer. Often, it gives us something to point to as important for the government to act on, in a way that is far more representative than calls to a congressman, your friends' opinions, or social media comments. If followed by leaders, polling could be a pathway to a more direct democracy without forcing every citizen to give opinions on every issue.

This election, it's important to embrace the nuance in polling. Every poll is unique and needs to be interpreted holistically, considering when and how it was conducted. Different polls on the same issue or election will have different results, and that's expected and okay. Polling will usually be off by a handful or two of percentage points, but sometimes the message is clear because the support is so strong or so weak. Polling can give us answers when nothing else will, and for that, it will always be valuable.

How to Interpret Election Polls

This is the first of my approximately weekly posts I’m planning about the 2020 election.

As election day approaches, polls are going to become more prominent, and it's important that we interpret them carefully. I suggest you stick to polls from poll aggregators (like FiveThirtyEight or Real Clear Politics) or those tied to prominent news organizations. Polls brought up by polling experts (myself included) are typically going to be good sources. But you can encounter polls out in the wild that are complete garbage, and you should be skeptical of a poll from a website you have never heard of. I'm going to briefly cover three things you always have to consider when you analyze polls:

  1. Margin of error
  2. Polls are not predictions
  3. Outliers happen

Margin of Error

The margin of error is probably one of the most misunderstood polling concepts. It comes from a statistical formula that captures the natural randomness of estimating a proportion for an entire population from a small sample. The margin of error is meant to be added to and subtracted from a single candidate's support. In US elections, we normally care about the difference between the Democratic and Republican candidates (sometimes called the margin and written as Trump +x or Biden +y), and to examine that we must double the error. The reason for doubling is that in a poll with a margin of error of three points, Biden could be underestimated by three points and Trump overestimated by three points, which leads to a six-point gap.

The margin of error doesn't cover the rare scenario where a respondent lies or makes a mistake. The calculation also assumes that the individuals who respond to a poll are not much different from the population we are aiming to poll, and that we can reach every member of that population, which isn't exactly the case. As a result, the margin of error underestimates the true polling error. It's hard to quantify before an election by how much, but it is typically less than one percentage point, based on an analysis I did on polling error (details will come later). If the difference between two candidates is less than double the margin of error, the poll does not provide enough information about who is winning, and the race is too close to call from that poll alone.
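The arithmetic above can be sketched in a few lines. The standard 95% margin of error for a proportion is 1.96·sqrt(p(1−p)/n); for a 1,000-person poll at p = 0.5 that works out to about 3.1 points on one candidate's support, or about 6.2 points once doubled for the gap between two candidates:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error (as a proportion) for a sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000
moe_points = margin_of_error(n) * 100  # ~3.1 points on a single candidate's support
gap_moe = 2 * moe_points               # ~6.2 points on the Trump/Biden margin
```

Note this is the textbook sampling-error formula only; as discussed above, it does not capture nonresponse or coverage problems, which is why the real error tends to be a bit larger.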

Polls are not Predictions

Polls are not designed to predict elections. They are designed to estimate the percentage of voters who support each candidate and the percentage who are undecided at the time the poll is conducted, not necessarily on election day. The margin of error estimates the error between the poll and the true support for the candidates while the poll is in the field. People change their minds occasionally, and it's hard to predict which direction undecided voters will go. A good guideline from research (taken from this book and replicated in my own analysis) is that polls are not predictive of the election day result until Labor Day weekend. The predictiveness of polls improves over time. For example, my model will start collecting data on September 6th and will hopefully be fit by September 22nd, which is about 45 days before the election.

Outliers happen

Occasionally we will see a poll somewhere that is different from the other polls. Strange poll results do happen, and you can't say there has been a change in the race until multiple polls from different pollsters have similar findings. You should look at multiple polls when you check the state of the race. I like to look at two poll aggregators: FiveThirtyEight and RealClearPolitics; FiveThirtyEight is where I am planning to get the data for my model. The tricky part about comparing two polls is that you have to add both their margins of error together to compare single candidates, and double that number if you want to look at the difference between two candidates. Consider Poll A with Trump at 45 and Biden at 48 and a 3-point margin of error, and Poll B with Trump at 48 and Biden at 42 and a 4-point margin of error. The Trump-minus-Biden difference in Poll A is -3 with a margin of error of 3*2=6, and in Poll B it is +6 with a margin of error of 8. The margin of error for comparing these polls is 6+8=14, so if the difference between the polls is less than 14 points, the polls aren't showing statistically different results, and the difference we observe could be explained by random sampling error. A lot of the time, outliers aren't really outliers once you adjust the margin of error to fit your comparison.
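The worked example above can be checked with a short script that follows the same (deliberately conservative) rule: double each poll's margin of error to get the error on its candidate gap, add the two doubled margins, and compare that to the observed difference between the polls. The function name is my own, just for illustration:

```python
def polls_differ(margin_a, moe_a, margin_b, moe_b):
    """True if two polls' candidate margins differ by more than the combined error.

    margin_* is each poll's Trump-minus-Biden gap in points; moe_* is each
    poll's reported margin of error. Each MOE is doubled for the gap, and the
    doubled margins are summed (a conservative comparison rule).
    """
    combined_moe = 2 * moe_a + 2 * moe_b
    return abs(margin_a - margin_b) > combined_moe

# Poll A: Trump 45, Biden 48, MOE 3 -> margin -3, doubled MOE 6
# Poll B: Trump 48, Biden 42, MOE 4 -> margin +6, doubled MOE 8
# Observed gap between polls is 9; combined MOE is 14, so 9 < 14:
# the polls are not statistically different.
```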

Now you have some basic tools to critically analyze polls on your own and follow the races that matter to you. There is much more you can do with polling, and the analysis I do is far more complicated than this. I'll continue to write about polls and the election if you want to learn more.

Election Modelling isn’t Inherently Political

One of the sad trends I have noticed is the desire to attack political journalists, pollsters, or election modelers and dismiss their work because it fits their political views. An example is someone saying Nate Silver only predicted the Democrats would flip the House because he is a Democrat, not because there was actual evidence for it (and, as you know, it actually happened). There may be journalists, pollsters, and modelers who cannot separate their politics from their work. But I assume most of them, like me, try to put their politics aside and follow the facts. I think it's important to draw attention to this issue, and also to share that, as a conservative-leaning independent, I trust people on the other side of the spectrum to do a good job.

What has surprised me as a newcomer to polling analysis is how some people view the polls and models as tools used to promote an agenda or attack the president. I've struggled to convince some of my own friends and family that the polls could be trusted even though they didn't predict that Trump would win the Presidential election. In some people's minds, polls aren't worth dealing with, so you should just let the phone ring when "Survey" appears on the caller ID.

This disconnect between members of the public and the polling and election-modeling community is a problem. Combine it with a mediocre public understanding of probability and you get mistrust in the models because they are "flawed." I will acknowledge that all models are imperfect and can always be improved, but we shouldn't attack experts because their political opinions differ from ours.

We should try to improve public support for the polls and models, because if public trust is low, response rates may go down and errors may go up, creating a self-fulfilling prophecy. In particular, I think the polling community needs to reach out to conservatives to try to build trust. If there is a polling trust gap between conservatives and liberals, it could affect how the polls perform.

But I trust the polls and the models. I know they have flaws, but that is the nature of all statistical modeling. The power of political polling goes beyond election prediction: it helps us understand how the electorate feels about politicians and policies. The challenging nature of this field and its potential for statistical education of the public are why I do what I do.


For me, this has never been about my politics, and I trust the models of those whose political opinions and demographics are different from mine. In this era of tribalism and polarization, we need to acknowledge that the field of political polling analysis isn't inherently political.

Only You can Prevent Bad Political Polls

My research relies heavily on polls, so I understand why it is important to take them. If I see a poll and determine it's well written, I take it. But I think this attitude is rare because people don't know the importance of polls, so I want to explain why I think they matter. Pre-election polls are commonly used to predict elections, and favorability polls are often used to judge a politician's popularity. Polls are an important part of American politics.

I get that polls are annoying. They take time, and you are probably busy (like me). But taking one political poll a year can greatly help improve the accuracy of polls. You don't have to answer every poll, but increased participation improves accuracy. There are a lot of bad polls, and it's difficult to tell whether a phone poll is good based on the phone number; some "polls" are really marketing calls. I understand if you are hesitant to take phone polls, but internet polling provides a good alternative. I think the future of polling is quality internet polls. When you take a good internet poll, you know more about its quality than you do from a phone call. But internet polls from scientific polling agencies require a large base of people to create accurate samples: you can randomly call 1,000 phones, but you really can't send 1,000 random internet users a poll. To combat this problem, polling agencies maintain databases of users and send surveys to selected users to create a good sample. Joining a survey panel that includes political polls is a way to get your voice heard.

My view on participating in political polls is that you can't complain if you don't participate. Polls need a diverse sample to be accurate. If you feel your political stance is not heard in the polls, then you should take more polls, not fewer. We need all kinds of people to make good polls. Not everyone has internet access, but enough voters do to create a good sample. What you can do is join a poll panel. My two recommendations are https://today.yougov.com/ and https://www.i-say.com/. They also do non-political polls and market research, which are also important (I might do a post on this later). I recommend them because they are user friendly and statistically sound. I am not receiving anything for recommending these agencies; I just think they are good.

If you want polls to be more accurate, the best (and easiest) thing to do is participate in them. As a statistician, I value good data, but for data to be good, it needs a representative sample. Regardless of your politics, you should participate in political polls.