The Polling Process
Usually a candidate, PAC, or media outlet decides they want a poll on something, so they hire a pollster. Most of the time the buyer isn’t conducting the interviews or doing the statistical analysis directly. Using a polling agency helps cut down on bias, and pollsters often have expertise that the buyer of the poll lacks.
The foundation of most statistical inference is randomness, so pollsters tend to take random samples. This can be done by calling random phone numbers (the main method) or by sampling individuals from internet panels. Without randomness, a poll likely isn’t representative, and most statistical tools will not apply well.
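To make the idea of a random sample concrete, here is a minimal sketch in Python. It assumes the pollster has a complete list of people to draw from (a sampling frame), which real pollsters only approximate; the frame of 10,000 IDs and the sample size of 800 are made up for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members of the frame, each equally likely to be chosen."""
    rng = random.Random(seed)     # a seeded generator makes the draw repeatable
    return rng.sample(frame, n)   # sampling without replacement

# Hypothetical sampling frame: 10,000 voter IDs (not real data).
frame = list(range(10_000))
sample = simple_random_sample(frame, 800, seed=1)
print(len(sample), "people selected")
```

Because every person in the frame has the same chance of being picked, the sample should resemble the frame on average, which is what the usual statistical tools rely on.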
Why Only You Can Prevent Bad Political Polls
Since not everyone selected actually answers a call or checks their email, polls have nonresponse bias. Nonresponse bias occurs when the people who don’t respond have different opinions from those who did. This difference introduces error into the poll. You can help fight nonresponse by joining panels and answering survey calls.
When you look at the details of a poll broken down by group (the crosstabs), sometimes pollsters don’t provide data on certain groups because they didn’t get enough respondents for the estimates to be good. This does not mean that they got no responses, even if it says 0% or NA. Every reputable pollster I know of is trying to reach those groups, but they are struggling. Young people aged 18-34 are one of the hardest groups to poll, but they are a highly important demographic in the 2020 Democratic nomination process.
Thankfully, nonresponse can be addressed by a technique called reweighting. Thanks to data sources like the US Census, we can take the responses from our sample and adjust them to be representative of the population. This doesn’t completely fix nonresponse bias, but it helps. Nonresponse is preventable if more people participate in the polling process. If everyone did just one or two polls in their adult lives, we would have much better data.
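Here is a rough sketch of one simple form of reweighting, sometimes called post-stratification: estimate support within each group, then combine the groups using their share of the population instead of their share of the sample. The group labels, the responses, and the 30%/70% population shares are all invented for illustration; they are not Census figures, and real pollsters weight on several variables at once.

```python
def poststratify(responses, population_shares):
    """Reweight a yes/no poll so each group counts by its population share.

    responses: list of (group, supports) pairs, where supports is True/False.
    population_shares: dict mapping group -> share of the population (sums to 1).
    """
    counts, supporters = {}, {}
    for group, supports in responses:
        counts[group] = counts.get(group, 0) + 1
        supporters[group] = supporters.get(group, 0) + int(supports)

    estimate = 0.0
    for group, share in population_shares.items():
        if counts.get(group, 0) == 0:
            raise ValueError(f"no respondents in group {group!r}")
        estimate += share * supporters[group] / counts[group]
    return estimate

# Hypothetical sample: young people are 10% of respondents
# but (by assumption) 30% of the population.
responses = (
    [("18-34", True)] * 6 + [("18-34", False)] * 4
    + [("35+", True)] * 36 + [("35+", False)] * 54
)
raw = sum(supports for _, supports in responses) / len(responses)
weighted = poststratify(responses, {"18-34": 0.3, "35+": 0.7})
print(f"raw: {raw:.2f}, reweighted: {weighted:.2f}")  # raw: 0.42, reweighted: 0.46
```

Notice that the reweighted number leans on only 10 young respondents. That is why reweighting helps but cannot fully replace actually hearing from underrepresented groups.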
Margin of Error and Why Polling Doesn’t Always Get it Right
Then the pollster computes the margin of error. The margin of error is calculated from the data and a confidence level (usually 95%). The confidence level tells us that if we took a lot of polls, about that percentage of them would contain the real value within the poll result plus or minus the margin of error. But this 95% figure only holds over a large number of surveys, and typically there aren’t enough state-level polls for it to hold well in practice. If the difference between two options is smaller than the margin of error, you can call that a statistical tie. A statistical tie means it is reasonable that either candidate is actually in the lead. This means that in most cases it is not surprising if a candidate whom a few polls show winning goes on to lose on election day.
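For readers who want to see the arithmetic, here is a minimal sketch. It assumes the textbook formula for a proportion from a simple random sample, z * sqrt(p * (1 - p) / n); pollsters who reweight their data usually report a somewhat larger figure. The function names and the example numbers are mine, chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Margin of error for a proportion p from a simple random sample of size n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / n)

def is_statistical_tie(share_a, share_b, moe):
    """Rule of thumb from the text: a gap smaller than the margin of error is a tie."""
    return abs(share_a - share_b) < moe

# Hypothetical poll of 1,000 people with the candidates at 48% and 46%.
moe = margin_of_error(0.5, 1000)  # p = 0.5 gives the widest margin
print(f"margin of error: +/- {moe * 100:.1f} points")       # +/- 3.1 points
print("statistical tie:", is_statistical_tie(0.48, 0.46, moe))  # True
```

A two-point lead in a poll of 1,000 people sits well inside the roughly three-point margin, so either candidate could plausibly be ahead.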
We also know that in practice polls are off by about twice the calculated margin of error. This increased error comes from a combination of nonresponse bias, day-to-day changes in people’s opinions, and the occasional mistakes people make when they fill out a survey. None of those factors are fully accounted for in a basic margin of error calculation. You can’t avoid uncertainty in polling.
However, there are statistical models that combine polls to make better predictions than any single poll can. These models are also uncertain, but they tend to do well enough to predict state-level elections to within roughly two points on average. Polls in the last few weeks before an election are typically about 3.5 points off the result.
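To give a feel for why combining polls helps, here is a deliberately simple sketch: an average of several polls weighted by sample size. This is not how real forecasting models work (they also adjust for things like pollster quality and how old a poll is); it only illustrates the basic idea that more respondents means less sampling noise. The three polls are invented.

```python
from math import sqrt

def pool_polls(polls, z=1.96):
    """Combine polls of the same race into one sample-size-weighted estimate.

    polls: list of (share, sample_size) pairs for one candidate.
    Returns (pooled share, margin of error if the polls were one big random sample).
    """
    total_n = sum(n for _, n in polls)
    pooled = sum(share * n for share, n in polls) / total_n
    moe = z * sqrt(pooled * (1 - pooled) / total_n)
    return pooled, moe

# Hypothetical polls of the same race: (candidate's share, respondents).
polls = [(0.48, 800), (0.51, 1000), (0.47, 600)]
pooled, moe = pool_polls(polls)
print(f"pooled estimate: {pooled:.2f} +/- {moe * 100:.1f} points")  # 0.49 +/- 2.0 points
```

The pooled margin only reflects sampling error. It does not shrink the errors that all the polls share, such as nonresponse bias, which is part of why even good models stay uncertain.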