Only You can Prevent Bad Political Polls

My research relies heavily on polls, so I understand why participating in them matters. If I see a poll and determine it’s well written, I take it. But I think this position is rare because people don’t know how important polls are, so I want to explain why I think they matter. Pre-election polls are commonly used to predict elections, and favorability polls are often used to judge a politician’s popularity. Polls are an important part of American politics.

I get that polls are annoying. I know it takes time and you are probably busy (like me). But doing one political poll a year can greatly help improve the accuracy of polls. You don’t have to answer every poll, but increased participation improves accuracy. Now there are a lot of bad polls, and it’s difficult to tell if a phone poll is good based on the phone number. Some “polls” are really marketing calls. I understand if you are hesitant to do phone polls. But internet polling provides a good alternative. I think the future of polling is quality internet polls. When you take a good internet poll, you know more about the quality of the poll than you do with a phone poll. But internet polls from scientific polling agencies require a large base of people to create accurate samples. You can randomly call 1,000 phones, but you really can’t send a poll to 1,000 random internet users. To combat this problem, polling agencies maintain databases of users and send each survey to selected users to create a good sample. Joining a survey panel that includes political polls is a way to get your voice heard.

My view on participating in political polls is that you can’t complain if you don’t participate. Polls need a diverse sample to be accurate. If you feel your political stance is not heard in the polls, then you should do more polls instead of fewer. We need all kinds of people to do good polls. Not everyone has internet access, but enough voters do to create a good sample. What you can do is join a poll panel. My two recommendations are https://today.yougov.com/ or https://www.i-say.com/. They also do non-political polls and market research, which are also important (I might do a post later on this). I recommend them because they are user friendly and statistically sound. I am not receiving anything for recommending these agencies; I just think they are good.

If you want polls to be more accurate, the best (and easiest) thing to do is participate in polls.  As a statistician, I value good data.  But for data to be good it needs a representative sample.  Regardless of your politics, you should participate in political polls.

 

A look at Alternatives to the Current Electoral College Process

First, I want to be clear that there is no universally fair way to elect a president. All methods have pros and cons, and you can have your own opinion about which way is the best.

Current System

Right now, with the exception of Nebraska and Maine, each state awards all of its electors to whoever has the most support in that state. The winner usually has a majority of votes, but sometimes no single candidate has a majority. This method also helps smaller states, as they have a lower ratio of voters to electors than larger states.

Pros

This method makes it easy to determine the winner on election night.  You don’t necessarily need all the votes to come in if you have enough information to predict the winner.

Cons

Most states reliably vote for the same party, so most of the attention goes to the swing states that do not have a regular winner.

Popular Vote

The popular vote method is based on the winner of the national popular vote. Whoever gets the most votes wins. This method can be implemented if states that together hold a majority of the electors change their laws to award their electors to the national popular vote winner.

Pros

Every vote counts the same. Larger states would have more power than they do under the current system.

Cons

Smaller states lose some electoral power compared to the current system.

Congressional District System

This system awards 2 electors to the state winner and 1 elector to the winner of every congressional district.  This is the method Maine and Nebraska use.
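
To make the allocation rule concrete, here is a minimal sketch in Python. The vote counts, the candidate names "A" and "B", and the function name allocate_electors are all made up for illustration, and ties are ignored.

```python
from collections import Counter


def allocate_electors(statewide_votes, district_votes):
    """Allocate electors under the congressional district system.

    statewide_votes: dict mapping candidate -> total votes in the state
    district_votes:  list of dicts, one per congressional district
    """
    electors = Counter()

    # Two electors go to the statewide winner.
    state_winner = max(statewide_votes, key=statewide_votes.get)
    electors[state_winner] += 2

    # One elector goes to the winner of each congressional district.
    for votes in district_votes:
        district_winner = max(votes, key=votes.get)
        electors[district_winner] += 1

    return dict(electors)


# Hypothetical three-district state (numbers invented for this example).
districts = [
    {"A": 120, "B": 100},
    {"A": 90, "B": 130},
    {"A": 150, "B": 110},
]
statewide = {"A": 360, "B": 340}

print(allocate_electors(statewide, districts))  # {'A': 4, 'B': 1}
```

In this invented state, candidate A wins the state and two districts for 4 electors, while candidate B still picks up 1 elector for winning a district, which would not happen under winner-take-all.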

Disclaimer:  This is my personally preferred system.

Pros

It’s a compromise between the current system and the popular vote system.  The electoral college would probably mimic the congressional makeup.

Cons

Like the current system, it could elect a president that didn’t win the popular vote.

 

All of these systems have pros and cons.  There isn’t necessarily a “best” way to pick the president.

Here is a FiveThirtyEight article about different methods of deciding the electoral college.

 

If I were a Senator

Donald Trump is officially the 45th president of the United States. Next, the Senate will vote on whether to confirm his cabinet nominees. Republicans have a majority, but it would take only three Republican senators to prevent the appointment of a nominee. Technically a Democrat might vote for a nominee, but considering how many Democrats aren’t participating in the inauguration, probably no Democrats will vote yes on the more controversial nominees. The question is this: if you were a senator who doesn’t like Trump or a certain nominee, should you vote for them anyway to protect your position in the Senate?

This is a complicated decision given what we know about Trump’s low favorability rating. A YouGov/Economist poll asked questions about how voters view Trump and his cabinet picks. Some picks aren’t as controversial, like Mattis, who has the highest favorability among non-Trump voters in the poll. But the Secretary of State nominee Rex Tillerson and Attorney General nominee Jeff Sessions have the lowest favorability among non-Trump voters. You have to consider that the majority of voters didn’t vote for Trump in the election, and the majority of voters have neutral or negative opinions of him in most polls. This decision is difficult if you don’t like the nominees but, as a Republican senator, feel obligated to support your party.

Here is what I would do if I were a Republican senator. I don’t think most of the cabinet picks are qualified or good candidates for their positions. I know that independent voters, Democratic voters, and a portion of Republican voters don’t like some of the cabinet picks. Not voting for a nominee would hurt me: it would probably anger my colleagues and lower my favorability with my constituents. Not voting for my party’s nominee would probably make national news and may not be beneficial for me. However, if I vote against a nominee and it turns out they don’t get confirmed, it probably wouldn’t hurt me that much. If I run for reelection and Trump and his cabinet are unpopular, my dissent could help. If I don’t vote for a cabinet candidate but they still get confirmed, I would have risked my position for nothing. This scenario is complicated and an example of a prisoner’s dilemma game (more info here). The idea in this case is that voting against a candidate is only worth it if it blocks the confirmation of that candidate and it turns out that Trump is unpopular at the time of my reelection. But the payoff is higher if I vote for the nominee regardless of the actions of the 51 other Republican senators. I also know that the rejection of this nominee doesn’t mean that the next nominee is going to be better. Knowing these things, the only nominees that I would consider voting against are Sessions and Tillerson, because those are high-power positions and the nominees are unpopular enough to increase the chance that my vote would prevent their confirmation. So it wouldn’t surprise me if almost all the cabinet nominees get confirmed by the Senate.

Coincidences: A Lesson in Expected Value

As I followed the election, I noticed frequent mentions of counties (or cities) that have been known to “predict” the presidential election winner. The idea is that the winner of a certain county has matched the winner of the election for multiple elections. Let’s look at county A as an example. To simplify things, let’s assume the odds of predicting a winner in a presidential election are 50-50. This would mean that the probability of getting 8 elections right would be 1 in 256 (0.5 multiplied by itself 8 times). This means that it is unlikely that county A would predict all eight elections by chance. But what about the rest of the counties in America? There are over 3,000 counties in America (according to an Economist article found here: http://www.economist.com/blogs/economist-explains/2016/11/economist-explains), so we can expect that on average about 12 of these counties (3,000 divided by 256) would have “predicted” the winner of the presidential election for eight elections in a row.
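
The county arithmetic can be checked with a few lines of Python. This is only a sketch under the simplifying assumptions in the paragraph above (each election is a 50-50 guess, and a round 3,000 counties), plus one assumption of my own for the last line: that counties behave independently, which real counties do not.

```python
# Assumption from the post: each election is a 50-50 guess for a county.
p_eight_in_a_row = 0.5 ** 8            # same as 1 / 256

# "Over 3,000 counties", rounded down to keep the arithmetic simple.
n_counties = 3000

# Expected number of counties that match all eight elections by chance.
expected_matches = n_counties * p_eight_in_a_row

# Added assumption (not from the post): counties are independent.
# Chance that at least one county somewhere has a perfect record.
p_at_least_one = 1 - (1 - p_eight_in_a_row) ** n_counties

print(f"Chance for one specific county: 1 in {1 / p_eight_in_a_row:.0f}")
print(f"Expected number of 'predictor' counties: {expected_matches:.1f}")
print(f"Chance at least one county has a perfect record: {p_at_least_one:.4f}")
```

A perfect record is rare for any one county, but with thousands of counties it would be surprising if none of them had one.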

Rare events happen all the time. Rare is not impossible. Let’s say that there is a (hypothetical) free sweepstakes with a 1 in 100 chance of winning $100. It may not be likely that you specifically win, but if all your Facebook friends enter the contest, someone you know is probably going to win. If you have at least 99 Facebook friends, it is likely that you or someone you know will win the sweepstakes. You may think it’s a coincidence or luck, but it is really math. Expected value can’t tell you who is going to win, but it can tell you someone you know is likely to win. Now expected value is not a magic bullet. You may have 0 friends win or 2 friends win, but the most likely event is that someone will win. Unfortunately (legit) sweepstakes like this don’t exist, but it is a good example of how your perception of probability may not match reality. Another example is that it is probably going to rain on about 1 in 10 of the days where the probability of rain is 10%, but it is easy to pretend that it never rains when the probability of rain is 10%.
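
Here is a small Python sketch of the sweepstakes example. It assumes exactly 100 entrants (you plus 99 friends), a 1 in 100 chance for each, and that each entry wins or loses independently of the others; the helper name p_exactly is just for illustration.

```python
from math import comb

n_entrants = 100     # you plus 99 friends (assumed)
p_win = 0.01         # 1 in 100 chance for each entrant

# Expected number of winners among the people you know.
expected_winners = n_entrants * p_win

# Chance that nobody wins, and that at least one person wins.
p_nobody = (1 - p_win) ** n_entrants
p_at_least_one = 1 - p_nobody


def p_exactly(k):
    """Binomial probability that exactly k of the entrants win."""
    return comb(n_entrants, k) * p_win**k * (1 - p_win) ** (n_entrants - k)


print(f"Expected winners: {expected_winners:.1f}")
print(f"Chance at least one person wins: {p_at_least_one:.1%}")
for k in range(3):
    print(f"Chance of exactly {k} winner(s): {p_exactly(k):.1%}")
```

Under these assumptions the chance that someone you know wins is a bit under two in three, and exactly one winner is slightly more likely than zero winners, so "someone will win" is the single most likely outcome without being a sure thing.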

You may wonder why expected value matters. But it’s actually quite important when looking at everyday events. Sometimes it is easy to underestimate the chance that something odd or rare would happen. You may think it’s odd that it rains when the meteorologist says the chance of that happening is 10%. Or that it only takes 23 people to have a 50% chance of there being two people with the same birthday (details here). It is easy to forget that once in a lifetime events do happen once in a lifetime. How you think about probability is important. So before you yell at the TV meteorologist that said there was a 10% chance of rain but it rained, try to remember that unlikely does not equal impossible.
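
The birthday claim is easy to check. The sketch below makes the usual simplifying assumptions, which are mine and not from the post: 365 equally likely birthdays, no leap years, and no twins. The function name p_shared_birthday is made up for this example.

```python
def p_shared_birthday(n_people, days=365):
    """Chance that at least two of n_people share a birthday.

    Works by computing the chance that everyone's birthday is different
    and subtracting that from 1.
    """
    p_all_different = 1.0
    for i in range(n_people):
        # Person i must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1 - p_all_different


for n in (10, 22, 23, 50):
    print(f"{n} people: {p_shared_birthday(n):.1%}")
```

With 22 people the chance is still just under 50%, and the 23rd person pushes it just over.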

Why I am against the Recount

I don’t support Jill Stein’s call for a recount of the election results in Michigan, Pennsylvania, and Wisconsin.  There are multiple reasons why I think Jill Stein is going about this the wrong way.

1. Jill Stein has no chance of winning the three states in question.

Jill Stein will not be the next president of the United States. She got less than 2% in all of these states. If Hillary Clinton wanted to pursue a recount in Michigan, which Trump probably won by under 1% (right now Trump has a lead of just 11,612 votes), I could understand that decision, especially if Michigan was all Clinton needed to be president. I know that the probability of 11,612 votes being incorrectly counted is low, but if the presidency was decided by less than 0.0001 of the votes cast, I think the results should be verified. But I think the Clinton campaign probably considered a recount and decided it would not change the outcome, so it wasn’t worth the money, time, and controversy. But a candidate who got at most 1.1% in the states in question shouldn’t throw a fit. I don’t think it’s her place to call for a recount. I would have had the same opinion if Gary Johnson had tried a similar approach in light of a Clinton presidency.

2.  If the election was hacked a recount would probably not catch it.

Let’s say the electronic votes were tampered with. I don’t believe this happened at all, but let’s entertain the idea for a second. If the machines were hacked, the hackers would probably have changed the record of the vote, which is all that is analyzed in a simple recount. A recount is just recounting the votes. An audit might have caught it, if this hack had taken place. But audits are expensive, and a far more sensible explanation for a Trump win in Michigan, Pennsylvania, and Wisconsin is that turnout was down in urban areas where Obama got a lot of support. I am not denying that foreign interests tried to influence the election, for example through the creation of fake news by teenagers in Macedonia (here is an article about that: http://www.cbsnews.com/news/fake-news-macedonia-teen-shows-how-its-done/). The lower turnout compared to 2008 probably hurt Clinton. At the end of the day, it appears that more Republicans turned out to vote than Democrats. Nate Silver (a Democrat) wrote about the possible hacking claims here: http://fivethirtyeight.com/features/demographics-not-hacking-explain-the-election-results/.

3.  This process isn’t helping the division in this country.

The recount isn’t going to change anything. Trump won. Trump may not have been the candidate you would have picked. Personally, I would have preferred any other Republican candidate from the nomination process. This recount is just making things worse. Jill Stein should want our country to come together and accept the results. Her own VP pick doesn’t approve of this process. Instead of escalating the situation, Jill Stein should stop fighting.

The bottom line is that Donald Trump will be the next president of the United States. All this talk about a rigged election was not backed by any evidence. Trump shouldn’t have called the election rigged. Stein shouldn’t do the same thing. I would have supported a recount if we had a 2000 situation with the election decided by 537 votes, regardless of the winner. But given the situation, a recount is unnecessary and wasteful. I know that you may not trust statisticians right now because we were wrong about the election, but still listen to the multiple voices speaking out against the recount. It’s time for our country to unite and accept what happened on November 8, 2016.

Models May Fail but Statistics Matters Anyway

The 2016 presidential election brought attention to the limitations of statistics. Most models predicted a Clinton win, but Trump will most likely be the president (the results are currently unofficial and recounts are in progress, but most experts believe that Trump will be officially elected president). However, no model is 100% certain, and the goal of statistics is to find the most likely event. I have spent the last few weeks reflecting on the results and what this means for the field of political science statistics. Recently I read a book by David Salsburg called The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. It’s a history of sorts of how the field was developed and then applied to science. While an exact date for the beginning of statistics is hard to pinpoint, the first journals and departments were founded in the early twentieth century. Statistics is a young field and is constantly growing and evolving as more data and situations are studied. In the beginning some of the problems may have been trivial, but it is important to try to understand the world around us. Collecting data from an entire population is incredibly difficult and sometimes impossible, so methods of estimation were created. You may wonder why prediction is necessary or helpful. After all, eventually the election happens and the president is chosen, so why do we care about knowing this in advance? Why does prediction matter? Statistical models and research are not just about what is being studied but about creating better ways to understand the world around us. We can begin to better understand things like the opinions of the people, the development of diseases, and the economy. Statistics can create better government, better medicine, better education, and a better world.
If we can understand how polls measure the voting habits of the American people, then we may be able to get a better picture of citizens’ views on multiple issues and candidates. If we can help understand how diseases like cancer behave, then we can create better, more individualized medicine. If we can understand how individual students learn and what they know, then we can create a better educational system. Statistics isn’t perfect. Statisticians can disagree and still both have valid models and reasoning. The data may be imperfect and incomplete. The model may be wrong. The experiment may seem trivial and unimportant. But there is so much potential for the field of statistics to change our world. Just because prominent statisticians like Nate Silver may not have seen a Trump presidency as the most likely event doesn’t mean that the field should be discounted.

Statistics 101

I figured a great start would be to explain what statistics is by defining the basic terms in non-mathematical language.

Statistics
What: Statistics is the study of data.
Why: To understand the world around us and try to make better decisions.

Outlier
What: An outlier is a data point that is far away from the rest of the data.
Why: Outliers affect the mean and standard deviation.

Population
What: The population is the entire group of people or objects you are studying.
Why: It is important to understand your population so that you are collecting the right data.

Sample
What:  The sample is a group of individuals taken from the population.
Why:  It would be almost impossible to collect data on the entire population in most cases.  So statisticians use samples to help make decisions.

Measures of Central Tendency
What: Measures of central tendency are ways to find the middle of the data set.
Why: Statistics is about finding the most likely event and a way to do that is to find the middle.

Mean
What: The mean is the average of a set of data. It is the total of the data divided by the number of data points. It is a measure of central tendency.
Why: The mean is a way to find the middle, but it can be skewed by outliers.  However, the mean is still a great way to find the middle in most situations.

Median
What: The median is the data point that is in the middle of the data once the data is sorted from smallest to largest.
Why: The median is not affected by outliers, which makes it useful in cases with outliers like income (there are people who make hundreds of times the median income).
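
Here is a quick Python sketch of how one outlier pulls the mean but barely moves the median. The incomes are made-up numbers chosen only to make the effect easy to see.

```python
from statistics import mean, median

# Made-up incomes for five people.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000]

# The same group plus one person with a very large income (the outlier).
incomes_with_outlier = incomes + [5_000_000]

print(f"Mean without outlier:   {mean(incomes):,.0f}")
print(f"Median without outlier: {median(incomes):,.0f}")
print(f"Mean with outlier:      {mean(incomes_with_outlier):,.0f}")
print(f"Median with outlier:    {median(incomes_with_outlier):,.0f}")
```

In this example the mean jumps from 40,000 to over 866,000 while the median only moves from 40,000 to 42,500.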

Measures of Variability
What: Measures of variability are ways to determine how spread out the data is.
Why: Measures of variability help to compare the data and make decisions.

Range
What: The range is the difference between the smallest and largest value.
Why: The range is used to understand how spread out the data is. It is affected by outliers.

Standard Deviation
What:  The standard deviation is a way of measuring how far the data points typically are from the mean.  It is defined by the following formula, where Σ is the sum, x is a data point, x̄ is the mean, and n is the number of data points.

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$
Why: Standard deviation helps define the statistical distributions.
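
For readers who like to see the arithmetic, here is a sketch in Python that computes the range and the standard deviation by hand. It assumes the sample version of the standard deviation (dividing by n − 1 rather than n), and the data values are made up.

```python
from math import sqrt
from statistics import stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data set

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Standard deviation, step by step.
n = len(data)
data_mean = sum(data) / n
sum_squared_diffs = sum((x - data_mean) ** 2 for x in data)
sample_sd = sqrt(sum_squared_diffs / (n - 1))   # sample version: divide by n - 1

print(f"Range: {data_range}")
print(f"Standard deviation by hand: {sample_sd:.3f}")
print(f"Standard deviation from the statistics module: {stdev(data):.3f}")
```

The hand calculation and Python's built-in statistics.stdev agree, since both use the n − 1 version.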

Inter-Quartile Range (IQR)
What: The inter-quartile range is the difference between the 25th and 75th percentiles.
Why: It helps find the spread in the center of the data, and isn’t affected by outliers.
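
The sketch below compares the range and the IQR on a made-up data set, with and without an outlier. It uses Python's statistics.quantiles with its default settings; other software may use a slightly different rule for percentiles and give slightly different quartiles. The function name iqr is just a helper for this example.

```python
from statistics import quantiles


def iqr(values):
    """Inter-quartile range: 75th percentile minus 25th percentile."""
    q1, _median, q3 = quantiles(values, n=4)   # the three quartile cut points
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]            # made-up data
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]  # largest value replaced

print(f"Range: {max(data) - min(data)}, IQR: {iqr(data)}")
print(f"Range with outlier: {max(with_outlier) - min(with_outlier)}, "
      f"IQR with outlier: {iqr(with_outlier)}")
```

The outlier blows up the range from 10 to 999, while the IQR stays at 6 in both cases.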

Normal Distribution
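What: The normal distribution is the most commonly used distribution in statistics. It is the familiar bell-shaped curve.
Why:  When you average enough data points, the averages of many kinds of data roughly follow the normal distribution, which is why it shows up so often.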
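
One reason the normal distribution shows up so often is that averages of many data points tend to look normal even when the individual data points do not. The Python sketch below is a simulation under assumptions I chose for illustration: the raw data are uniform random numbers between 0 and 1 (which are flat, not bell-shaped), each sample has 30 points, and there are 2,000 samples.

```python
import random
from statistics import mean, stdev

random.seed(0)        # fixed seed so the run is repeatable

sample_size = 30      # data points per sample (assumed)
n_samples = 2000      # number of samples (assumed)

# Take many samples of flat (uniform) data and record each sample's mean.
sample_means = [
    mean([random.random() for _ in range(sample_size)])
    for _ in range(n_samples)
]

center = mean(sample_means)
spread = stdev(sample_means)

# For a normal distribution, about 95% of values fall within
# two standard deviations of the center.
share_within_two = sum(
    abs(m - center) <= 2 * spread for m in sample_means
) / n_samples

print(f"Center of the sample means: {center:.3f}")
print(f"Spread of the sample means: {spread:.3f}")
print(f"Share within two standard deviations: {share_within_two:.1%}")
```

The share within two standard deviations should come out close to 95%, which is what a bell curve predicts.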

Margin of Error
What: The margin of error is a way of describing the error in a sample. It gives a range around the sample result where the true value for the population is likely to be.
Why: Since samples are incomplete, they don’t have all the information on the entire population.  Margin of error helps us acknowledge that the observed mean may be different from the actual mean.
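
As a rough illustration, the Python sketch below uses the common textbook formula for the margin of error of a poll percentage. It assumes a simple random sample and a 95% confidence level (z = 1.96); real polls that use weighting usually report somewhat larger margins. The function name margin_of_error and the sample sizes are chosen for this example.

```python
from math import sqrt


def margin_of_error(p, n, z=1.96):
    """Textbook margin of error for a sample proportion.

    p: observed proportion (0.5 means 50%)
    n: number of people in the sample
    z: 1.96 corresponds to a 95% confidence level
    """
    return z * sqrt(p * (1 - p) / n)


# A candidate at 50% in polls of different sizes.
for n in (250, 1000, 4000):
    print(f"n = {n}: plus or minus {margin_of_error(0.5, n):.1%}")
```

A poll of 1,000 people gives a margin of about 3.1 points, and it takes four times as many people to cut the margin in half.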

 

This is not the end of statistics, but these are the basic terms I will frequently use.

Welcome

My name is Brittany Alexander. I completed an undergraduate degree in Mathematics at Texas Tech University in May 2018. I am currently a Ph.D. student in the Statistics department at Texas A&M. My passion is statistics and how it affects the world around us, with a focus on political science. Currently, I am researching methods of predicting American elections and analyzing public opinion data in general. What I have learned in my research is that people may not understand statistics and the role it plays in our lives. My goal is to educate people about basic statistical concepts like margin of error, correlation vs. causation, and why an average isn’t always the best way of finding the middle of a data set. Data is everywhere, from what TV shows we watch to how many steps our fitness tracker records. I want to help you understand the world around you by explaining how you can use statistics in your daily life. This blog is a mix of posts focusing on statistical education and data-centric political and polling analysis, with some posts at the intersection of the two. I try to use as little theory and math as possible in my explanations. My opinions are always my own, and I am committed to transparency in my political coverage.