Pennsylvania & Nevada Race Profiles

I wrote this post on 6/25, but I want to briefly comment on the news of Justice Kennedy’s retirement. I’m on a mission trip, so I can’t do a full post on the news yet. Trump could get a Supreme Court nominee through before the seats change next January after the 2018 election, but Democrats could put up significant resistance to confirming the new justice before the midterm elections, and might even attempt to block a very conservative justice after the election. The Republican Party now has to hold its ground in the Senate to protect the Supreme Court, and ideally gain a seat or two held by senators who would be loyal to President Trump and vote for his nominee, since I could see Senators McCain, Collins, Murkowski, or Graham possibly voting no if the candidate is too conservative. And if Democrats win the Senate in November, I would not be surprised if they refuse to confirm a conservative judge. I wasn’t expecting the Supreme Court to be a major issue in the campaign, but it just became one.

Pennsylvania:

2016 Presidential Election result:  Trump: 48.18%,  Clinton 47.46%, Margin: Trump +0.72%

2012 Senate Election result: Casey (D) 53.7%,  Smith (R) 44.6%

Democratic Candidate: Bob Casey (Incumbent)

Casey is the incumbent senator.  He has served two terms and held various political positions prior to his election as Senator.  His campaign websites mention his support for improving infrastructure and the renegotiation of trade deals,  similar to Trump’s positions.  In my opinion, he is presenting his view on the issues as a moderate Democrat.

Republican Candidate: Lou Barletta

Barletta is currently a US congressman and was formerly a businessman and mayor of Hazleton. Barletta’s issues page wasn’t very detailed, but it seemed to list typical conservative Republican positions. Barletta’s campaign site contains an article attacking Casey’s political history and other attacks on Casey’s voting record. This isn’t surprising and is something that numerous politicians on both sides of the aisle have done, but I haven’t seen that many directly negative campaigns thus far.

My Thoughts on the Race:

Two polls show Casey has a big lead, and Pennsylvania is a purple/blue state.  I don’t know if I should really even consider this seat competitive.  I don’t want to ignore Pennsylvania like it was ignored in 2016,  but I don’t want to be too cautious.  I think I need to revisit my categorizations and add a “probably safe for the incumbent but anything can happen in the Trump era” subgroup.

Nevada:

2016 Presidential Election result:  Trump: 46.1%,  Clinton 48.2%, Margin: Clinton +2.1%

2012 Senate Election result: Heller (R) 45.9%,  Berkley (D) 44.7%

Democratic Candidate: Jacky Rosen

Rosen is currently a US representative and was previously a computer programmer.  She appears to be an establishment Democrat.

 

Republican Candidate: Dean Heller (Incumbent)

Heller is the incumbent senator. He has served one full term. His website mentions his support for rural voters during his time in the Senate, which is strategic. He is definitely appealing to Trump’s supporters and more moderate Republicans.

My Thoughts on the Race: With only one poll, conducted before the primary, showing a statistical tie, the Nevada race looks to be close. Unlike most of the other races I have examined thus far, the Nevada race involves a vulnerable Republican. Given Nevada’s large Hispanic population and slight Democratic lean, I think this seat is the most likely to flip parties. The Texas seat is also vulnerable, but Nevada is more liberal-leaning than Texas. Trump did do a fundraiser for Heller, but that could backfire by encouraging Rosen’s supporters to donate more. Nevada is definitely a race worth watching.

Competitive Race Updates

I wanted to mention a few polls and discuss the recent immigration controversy over child separations, and the race in Pennsylvania.

I use http://fivethirtyeight.com and http://realclearpolitics.com to get my polling information.

Two Ohio polls released on 6/13 show Brown (the Democratic incumbent) ahead by 16 and 17 points, well outside the margin of error. This means that the race is currently very safe for Brown. This race is unlikely to flip, but I’ll keep my eye out for new polls.

A new poll in West Virginia shows Manchin (the Democratic incumbent) with a 6 point lead, which is outside the margin of error. This, combined with the other two polls out right now, suggests Manchin is probably leading.

Florida hasn’t had a primary yet, but two polls in May showed the presumed Republican nominee, Governor Rick Scott, with a slight lead. In one poll the lead was outside the margin of error, but given the historical errors of early Senate polls, this is far from definitive. This is interesting, but it is too early to tell what this will mean in November. Incumbents have an advantage, and Trump is unpopular, which might affect turnout.

The controversy over the separation of children from immigrants caught entering illegally might have an effect on some of the Senate races. Trump is not on the ballot, but his (un)popularity might affect turnout, which I think is going to be very important in this election. Trump did end the policy, but it could still affect the election.

In particular, Beto O’Rourke, the Democratic candidate for Senate in Texas, may have benefited from the controversy. O’Rourke was on a few news shows and called for the policy to end. O’Rourke also used social media to discuss his views, which got him more attention. As a Texas native, I did see a lot more activity than usual supporting O’Rourke among my friends on my social media feed. This is a very biased sample that doesn’t reflect the entire Texas voting population, but it does signal to me that something may be changing. Hopefully, another poll will come out in Texas soon.

 

 

 

Ohio Race Profile

This post was written on June 10th and may not reflect last minute changes.

2016 Presidential Election result:  Trump: 51.69%,  Clinton 43.56%, Margin: Trump +8.13%

2012 Senate Election result: Brown (D) 50.7%,  Mandel (R) 44.7%

Democratic Candidate: Sherrod Brown

Brown is the incumbent.  He was first elected to the Senate in 2006, and before that he was a U.S. representative and had various state level positions.  He has been highly productive in producing new bills, including bipartisan ones.  My impression of Brown is that he is a moderate Democrat.  His opinion on social issues like abortion and LGBT+ rights may be too liberal for some voters in Ohio.

Republican Candidate: Jim Renacci

Renacci is currently a member of the US House of Representatives, but it wasn’t in his biography which I thought was odd.  Before he was a politician he was a businessman, which seems to be a pattern among Republican candidates in the Trump era.  His views appear to be reminiscent of the 2016 Republican platform.

My Thoughts on the Race:

Brown is the incumbent, and the one poll we have shows Brown winning by 14 points. This is one of the races I am watching, but I don’t think it will flip. Trump didn’t win by that much in Ohio. I would consider Ohio a purple state. Brown has done well in his past elections. I don’t think a candidate who is similar to Trump is enough to beat an incumbent when Trump isn’t that popular.

I think it is important to watch all the states where there is a mismatch in the Senator’s party and the 2016 Presidential Election result.   I don’t know how much movement there will be.  This new project has been a big learning experience for me.  Control of the Senate is decided by so many unique races.  It’s interesting to learn about the current and future Senators and get a look at a different side of voting behavior.  I have enjoyed watching primary night coverage and learning more about American political geography.

West Virginia Update

Over the past few days, a controversy has erupted over West Virginia Senator Joe Manchin’s position on the border wall. In January, Manchin (a Democrat) said in an interview on Fox and Friends that he supports a border wall. But a recent super PAC ad says he doesn’t support the wall, which matters in a state that Trump won by a large margin and where reducing illegal immigration is viewed as an important issue. Manchin is trying to have the ad removed from TV and has a press release here.

Manchin definitely voted yes for cloture on an amendment to the Broader Options for Americans Act that supported increased border security (including physical structures like walls). The amendment didn’t move any further in the bill-making process. But to make things complicated, Manchin was quoted in a Politico article from last July as saying he has “not been supportive of funding for a wall” and “It’s something I have no interest in. I just think we have so many other pressing problems and I think there are other ways immigration needs to be treated.” I am curious why Manchin’s position changed. It’s fine that he changed his mind, but I think he will need to explain the older comments against the wall and increased border security so that he doesn’t look like he changed his mind to attract voters.

I had some trouble sorting out the story because there was spin on both sides. The Republican PAC was trying to sway Trump voters to not vote for Manchin, but it didn’t mention the GOP candidate. And Manchin was trying to point out his support for Trump’s immigration agenda, but I personally feel like he didn’t explain his previous comments in the Politico article, and I generally trust Politico as a news source. Hopefully, some new polls will come out so we can see what the voters think about this. I can see some more attack ads coming out about this topic, but it is always difficult to guess how voters will react.

Political Bias Disclosure: Since I am a strong believer in transparency in the discussion of politics,  I would like to disclose that I am a moderate Republican.   I try to remain unbiased and look at the issues like a voter in that election.

Race Profiles: West Virginia and Indiana

In today’s blog post I will discuss two of the competitive races.  In West Virginia and Indiana, we have two incumbent Democrats from red states.   In these profiles, I want to examine the candidates and the race,  much like a voter would.    I want to look at past elections, the experience of the candidates,  and the stance of the candidates on the issues (as judged by their campaign websites).

Disclaimer:  I am a moderate Republican,  and while I try to remain as objective as possible I acknowledge that my unintentional bias might affect my view of the races.

West Virginia:

2016 Presidential Election result:  Trump: 68.6%,  Clinton 26.5%, Margin: Trump +42.1%

2012 Senate Election result: Manchin (D) 60.6%,  Raese (R) 36.5%

Democratic Candidate:  Joe Manchin

Manchin was governor of West Virginia before he was elected to the Senate in a 2010 special election. He seems to be a more moderate Democrat than most. What I think makes him appealing is his focus on improving the quality of life of West Virginians through tax, education, and healthcare reform.

Republican Candidate:  Patrick Morrisey

Morrisey has served as the Attorney General of West Virginia since 2012. He seems to be pretty conservative and is presenting himself with views that appear similar to Trump’s (without invoking his name), which is his main advantage.

My Thoughts on the Race:  The polling is limited right now.  I am inclined to believe the Gravis poll which shows Manchin with a 13 point lead, over the WPAi poll which shows Morrisey with a 2 point lead (inside the margin of error).  I know that the incumbency advantage is strong and that Manchin has done well in his past two elections.  But given the polarized environment,  I think Morrisey has a chance.  Hopefully, more polls will tell a better picture.

Indiana:

2016 Presidential Election result:  Trump: 56.9%,  Clinton 37.8%, Margin: Trump +19.1%

2012 Senate Election result: Donnelly (D) 50%,  Mourdock (R) 44.3%

Democratic Candidate:  Joe Donnelly

Donnelly was a US representative before he was elected to the Senate in 2012. Donnelly seems moderate. His website repeatedly stresses the importance of choosing the best policy rather than the one associated with a party.

Republican Candidate: Mike Braun

Braun’s website is reminiscent of the Trump campaign, with “Drain the Swamp” on the front page at the time of this posting. He is very conservative and similar to Trump (in my opinion). He could turn off centrists (I am a moderate Republican and I don’t know if I would vote for him).

My Thoughts on the Race: If I had to pick which race between Indiana and West Virginia was more likely to flip, I would choose Indiana. There is only one poll right now. I don’t consider Braun’s 1 point lead to be that meaningful, but it shows that the race is very close. It’s an early poll of 400 voters with 7% undecided, which limits its value for predicting the election result. Indiana is definitely worth watching.
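
To put the size of that poll in perspective, here is a rough sketch of the usual margin-of-error calculation. It assumes a simple random sample and the worst case of a 50/50 split; the function name is my own for illustration, and real pollsters may adjust for weighting.

```python
import math


def margin_of_error(share: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for one candidate's share of a poll."""
    return z * math.sqrt(share * (1 - share) / n)


# A poll of 400 voters, assuming a 50/50 split (the worst case).
moe = margin_of_error(0.5, 400)
print(f"{moe:.3f}")  # 0.049, or about 4.9 points on each candidate's share
```

The uncertainty on the gap between two candidates is larger than the uncertainty on either share alone, so a 1 point lead in a poll this size is well within the noise.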

Senate Races Initial Categorization

For today’s regular post I want to lay out the initial subgroups I will use in the analysis.

Safe Republican Seats – Mississippi (Both seats),  Nebraska, Utah, Wyoming
These are deep red states, and unlike Texas and Tennessee, they currently lack a strong Democratic contender.
Republican Seats to Watch – Arizona, Nevada,   Tennessee,  Texas
Arizona is vulnerable because the incumbent is retiring, and both Arizona and Nevada were relatively close in the 2016 election. Tennessee has a retiring incumbent and a former Democratic governor who is doing well in the polls. In Texas, O’Rourke is looking like he may have a chance.
Democratic Seats to Watch (States that Trump Won)-  Florida,  Indiana, Missouri, Montana, North Dakota, Ohio,  Pennsylvania,  Michigan,  West Virginia, Wisconsin
This category is of the states that Trump won,  and I want to watch these states to see if they have the potential of flipping.
Safe Blue States – California, Connecticut, Delaware,   Hawaii,   Maryland, Massachusetts, Minnesota (both seats), New Jersey,   New Mexico, New York,  Rhode Island, Virginia, Washington
Safe Incumbent Independent Senators – Maine, Vermont
I don’t see Sanders or King having any problem being reelected.
I want to profile all the states I have categorized as close. I am going to profile one or two a week, and I will wait until after each primary so I can discuss both candidates. Right now, I think that the Republicans will remain in control of the Senate, with at least 50 seats (the VP can break ties). There are vulnerable Republican seats, but there are also vulnerable Democratic seats. I do expect individual seats to flip, but the final result should be about the 51-49 split it is right now.

Can Beto O’Rourke Become the Next Senator of Texas?

I wanted one of my first posts about the election to be about the Texas Senate race.

A few weeks ago, I listened to the FiveThirtyEight podcast on the Texas Senate race, which discussed the Quinnipiac poll (from April 18th) that showed a statistical tie (Cruz’s lead was less than the margin of error) between Cruz and O’Rourke. I have thought about the race a lot because it has the potential to be unusually close.

The question of the race is this: Can a Democratic candidate beat an incumbent Republican in a red state that Trump won by 9 points? The poll data is pretty thin right now, with Real Clear Politics showing just the Quinnipiac poll and one from JMC Analytics (an agency I have never heard of). But this is something worth watching, and it looks like it will be a close race.

O’Rourke has a chance, but he will have to work on turnout and on flipping Republican and Republican-leaning voters like myself, who are moderate and dislike both Cruz and the direction of the GOP.

My planned vote is for Ted Cruz, not because I approve of him nor because I agree with most of his politics, but because Cruz represents my politics better than O’Rourke does, and I think O’Rourke would vote the party line if elected. The Republican Senate majority is vulnerable, which is also influencing my decision. Only time will tell who the other moderate Republicans will vote for, but Texas will be a race to watch on election night.

 

 

My New Project: Revised Models to Predict American Presidential Elections Preregistration

My current project is a series of new models to predict American presidential elections, built on the original model with some changes. The new models have 3 different methods to reassign undecided voters, 2 different conjugate priors, and 3 different ways to calculate using the Gaussian conjugate prior. The models deal with hypothetical election results with only the two major parties’ candidates. In total there are 12 models. This is a pre-registration post with my methodology and some thoughts on what I think will happen and what I am looking for in the results.

One of the key features of this project is while it still takes a similar approach of using poll data from other states as the prior, it expands the prior to be a pooled collection of all the polls from within the category.   I believe that this new method will help address some of the issues I faced choosing one source of polls as the prior, and will possibly help in swing states where it will use polls from other swing states.

One of the goals of this project is to have better definitions of swing states and prior regions. The original model had definitions that were admittedly somewhat ad hoc. In this new project, I define a swing state as a state that has been won by both a Democratic candidate and a Republican candidate in the past four elections. Overall, I like this definition because it is easy to use, but I wish it could capture future swing states like Indiana in 2008, and Michigan, Pennsylvania, and Wisconsin in 2016.

Since I don’t have the same time constraints I had with the 2016 model, I have been able to put more thought into how prior regions should be defined. This time I am going to stick closer to the US Census regions (found here) and divide the West and Midwest Census regions into a red state and a blue state subgroup. I am going to split the Southern and Northeastern regions each into two subgroups that share the same partisan alignment. I am going to move Delaware, Maryland, and Washington DC into the Middle Atlantic subregion of the Northeastern region, since on their own they would be too small a group, and Washington DC and Delaware usually only have a handful of polls. I think these states would benefit from being joined with the Middle Atlantic region, and this will help even out the between-state demographic variation. Since the Census regions are based more on geography than on culture and politics, according to the history of the Census regions found here, I feel comfortable doing this.

I am also changing my mind from the previous model on the placement of Missouri. The fact that the race was so close in Missouri in 2008 indicates to me that its political culture may be more like the Midwest than the South. To me, a key feature of the Midwest (and the smaller Western states) is that state partisanship is weaker than in other states, and swings are more common compared to the Northeastern region or the South. I am going to keep Missouri in the Midwest region, where it is in the US Census regions.

I am splitting the Northeast into the Middle Atlantic and New England subgroups. I am going to split the South into two regions: one containing the West South Central region plus Tennessee and Kentucky, and another with the South Atlantic region plus Mississippi and Alabama. Dividing the South was a difficult decision, but I looked at the Electorate Profiles and decided that this was the best way to preserve demographic similarity among key groups (Whites, Hispanics, African Americans, college-educated individuals, high-income earners percentage, and percentage in poverty) within the Southern regions. Deciding the group for the Southern blue states was hard because they were too small of a group to be alone, and while the Middle Atlantic region wasn’t a great fit, it was the best fit.
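
As a minimal sketch of the swing state definition above (the function name and the "D"/"R" coding are my own choices for illustration), the rule can be written as a check on the winners of the last four presidential elections:

```python
def is_swing_state(winners):
    """True if both parties won the state within the last four elections."""
    return {"D", "R"} <= set(winners[-4:])


# Winning party in 2000, 2004, 2008, 2012 (the window for a 2016 prediction).
history = {
    "Ohio": ["R", "R", "D", "D"],
    "Texas": ["R", "R", "R", "R"],
    "Michigan": ["D", "D", "D", "D"],
}
swing = {state: is_swing_state(winners) for state, winners in history.items()}
print(swing)  # {'Ohio': True, 'Texas': False, 'Michigan': False}
```

Michigan illustrates the limitation mentioned above: a rule that only looks backward cannot flag a state before it flips.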

The models use three different methods to reassign undecided and minor party voters. The first method reassigns the voters based on the past election results. The second method splits the undecided voters equally between the two candidates. Lastly, the third method reassigns the undecided voters in proportion to each candidate’s support in the poll. For example, consider a poll of a hundred people with 50 supporters of the Democrat, 40 supporters of the Republican, and 10 undecided voters. The state voted 60% for the Democrat and 40% for the Republican in the last election. Under the first method, 4 of the undecided voters would be reassigned to the Republican candidate and the other 6 to the Democratic candidate, making the poll results 56 for the Democrat and 44 for the Republican. The second method would reassign 5 voters to the Democrat and 5 voters to the Republican, making the adjusted poll results 55 for the Democrat and 45 for the Republican. Under the third method, the Democratic candidate received 55.556% of the two-party support and the Republican received 44.444%. Applied to the 10 undecided voters, this gives 5.556 and 4.444 voters, which are rounded to 6 and 4 respectively. I realize I could drop the undecided voters from the polls as done in this paper by Lock & Gelman, but I am using poll data to predict the election result and not using a time series approach. I haven’t found anyone using past election results to reassign voters. FiveThirtyEight splits the undecideds evenly between the two candidates, so that is why I included that method. This paper by Christensen & Florence talks about the proportional reassignment of undecided voters. The Christensen & Florence paper describes an undergraduate project on predicting elections and has been a heavy inspiration for my research.
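
The three reassignment methods can be sketched in a few lines of Python. This is an illustration using the worked example above, not the actual project code; the function names are hypothetical, and rounding to whole voters happens only when printing.

```python
def reassign_past_election(dem, rep, undecided, past_dem_share):
    """Method 1: split undecideds by the state's last two-party result."""
    dem_extra = undecided * past_dem_share
    return dem + dem_extra, rep + (undecided - dem_extra)


def reassign_even(dem, rep, undecided):
    """Method 2: split undecideds 50/50."""
    return dem + undecided / 2, rep + undecided / 2


def reassign_proportional(dem, rep, undecided):
    """Method 3: split undecideds by each candidate's share of decided support."""
    dem_extra = undecided * dem / (dem + rep)
    return dem + dem_extra, rep + (undecided - dem_extra)


poll = dict(dem=50, rep=40, undecided=10)
results = {
    "past election": reassign_past_election(**poll, past_dem_share=0.60),
    "even split": reassign_even(**poll),
    "proportional": reassign_proportional(**poll),
}
for name, (dem, rep) in results.items():
    print(f"{name}: {round(dem)} D, {round(rep)} R")
# past election: 56 D, 44 R
# even split: 55 D, 45 R
# proportional: 56 D, 44 R
```

In this example the first and third methods agree after rounding, which fits the expectation that the choice of method matters little when few voters are undecided.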

Conjugate Prior and Calculation Methods

These models use either the beta conjugate prior (for binomial data) or the Gaussian conjugate prior. The goal of the models is to predict the proportion of votes for the Democratic candidate among the two major party candidates. The data is binomial with a Bernoulli likelihood, but the extent of the independence of people concerns me. I think individuals show up multiple times in the polls, meaning that the observations are not independent. If the data were truly i.i.d., I would be OK with using the beta conjugate prior, but since that is likely not the case, I am afraid it causes an underestimation of the variance. I am curious what effect using the normal approximation to the binomial distribution has in the context of predicting elections based on polls. I also want to see the effects that the different methods of reassigning voters and the new prior have on the original calculation method from the previous study. In the original study, I used the standard deviation and count of polls inside the Gaussian conjugate prior. There are four calculation methods: a beta conjugate prior model; a Gaussian model that uses the normal approximation to the binomial distribution and updates after every poll; a Gaussian model that averages the polls, finds the standard deviation of the poll data, and uses that information to make the calculation; and a Gaussian model that turns the polls into one giant poll and uses the normal approximation to the binomial distribution. If I had to choose the better assumption, I would go with polls being independent over people being independent. But I plan on eventually exploring ways to remove the independence assumption.
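
For readers unfamiliar with conjugate updating, here is a rough sketch of two of the four calculations: the beta update and the Gaussian update after every poll. These are the standard textbook formulas, which may differ in detail from the project code. All numbers are hypothetical, and treating the pooled polls from other states as raw counts for the prior is an assumption made for illustration.

```python
def beta_update(a, b, dem, rep):
    """Beta(a, b) prior + binomial poll counts -> Beta(a + dem, b + rep)."""
    return a + dem, b + rep


def gaussian_update(prior_mean, prior_var, poll_share, n):
    """Normal prior + one poll, using the normal approximation to the binomial."""
    poll_var = poll_share * (1 - poll_share) / n
    post_var = 1 / (1 / prior_var + 1 / poll_var)
    post_mean = post_var * (prior_mean / prior_var + poll_share / poll_var)
    return post_mean, post_var


# Hypothetical inputs: a pooled prior of 1000 respondents (520 D, 480 R)
# and two state polls, already reduced to two-party counts.
polls = [(56, 44), (53, 47)]

a, b = 520, 480
for dem, rep in polls:
    a, b = beta_update(a, b, dem, rep)
beta_mean = a / (a + b)
print(a, b, round(beta_mean, 4))  # 629 571 0.5242

mean, var = 0.52, 0.52 * 0.48 / 1000
for dem, rep in polls:
    mean, var = gaussian_update(mean, var, dem / (dem + rep), dem + rep)
print(round(mean, 4), var)
```

Both updates treat every respondent as an independent draw, so the posterior variance shrinks with each poll. That is exactly the behavior behind the concern about underestimating the variance.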

Choosing the “Best” Model

I don’t think I am going to take all twelve models and turn them into multilevel models or run simulations. Based on the data I have, every model is run 153 times (3 times each for the 50 states plus DC) to predict the 2008, 2012, and 2016 elections. The pooled models would likely not translate well into a time series model. The main question I am asking is: do these changes make the model even more accurate, or at least as accurate as the original model? I also want to know if the method used to reassign undecided voters matters. I don’t think it will, since the proportion of undecided voters is small and the difference between the polls and the past vote is usually small. I don’t like the idea of splitting the vote evenly between the two candidates because I think it doesn’t work as well in highly partisan states. I don’t think that undecided voters at any point in time in West Virginia or Massachusetts are going to turn out and vote equally for the two major candidates. What I am hoping to get out of this is a rough idea of whether any of these changes have a practical effect on accuracy. And if there is no difference, I am probably going to opt for proportionally reassigning voters and iteratively updating the model.
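
The counts above can be checked with a short sketch. The labels for the reassignment and calculation methods are shorthand I made up for this illustration.

```python
from itertools import product

reassignment = ["past_election", "even_split", "proportional"]
calculation = [
    "beta",
    "gaussian_iterative",
    "gaussian_poll_average",
    "gaussian_pooled_poll",
]
elections = [2008, 2012, 2016]
n_contests = 51  # 50 states plus DC

# Every reassignment method is paired with every calculation method.
models = list(product(reassignment, calculation))
runs_per_model = n_contests * len(elections)
total_runs = len(models) * runs_per_model
print(len(models), runs_per_model, total_runs)  # 12 153 1836
```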

Looking Forward to Further Research

This project is an intermediate step in the process of testing the use of poll data from other areas as a part of the prior in a Bayesian model to predict American national elections.   Since there are a lot of key changes in this new set of models,  I want to get more data on the accuracy of my idea of exclusively poll-based models to predict elections.   What I hope later is to turn this into a time series multilevel model with and without the inclusion of a fundamental model.   I don’t have anything against fundamental modeling, but an exclusively poll based model requires less data collection than fundamental modeling.  I want to see the viability of this method,  because if it can match the performance of fundamental models then this may be a better strategy.  I want to make my own fundamental model that treats swing states differently from partisan states in the future.   I intend to look at state-level and regional-level effects on voting behavior.   The big assumption of this method is that state-level effects within a region are small and that pooling the polls across a region mitigates this effect so that the pooled polls are a good preliminary estimate of voting behavior.

 

Correction Notice for Results

I was rechecking my error calculations after I received a comment about them from a reviewer of my paper. An example calculation was incorrect. This was a minor error, but further examination led to the discovery of an error in the 2-party error of my model in 2012. The reported error was approximately half of what it should have been. Because of this underestimation, my model falsely appeared more accurate than the FiveThirtyEight model. All of the error calculations are currently being reexamined for possible errors. I have already recalculated all the errors, but I want to check them a couple more times to be safe. A corrected table will be posted once it is checked again.

Update: 12/5

No other major errors were found in the re-checking process. All calculations have been checked three times since the discovery of the error in the 2012 2-party error for my model.

Update: 1/20  fixed typo in tested model 2008 for both all candidates and 2-party and adjusted average

Below is the updated table to replace the former tables used in both the ESR Virtual Poster and the USPROC Paper:

 

| | Tested Model RMSE | Tested Model RMSE Swing States | RCP RMSE Swing States | 538 RMSE | 538 RMSE Swing States |
|---|---|---|---|---|---|
| 2008 All Candidates | 3.5474 | 3.14788 | 4.23389 | 3.19332 | 1.66958 |
| 2008 2-Party | 2.89669 | 2.57051 | 3.63513 | 3.0305 | 1.47846 |
| 2012 All Candidates | 3.25139 | 1.94492 | 2.33511 | 2.38019 | 1.2979 |
| 2012 2-Party | 2.37053 | 1.17163 | 1.61076 | 1.98642 | 0.9342 |
| 2016 All Candidates | 6.82013 | 3.95985 | 3.32952 | 5.37952 | 3.56511 |
| 2016 2-Party | 3.95985 | 3.14325 | 2.04295 | 3.81296 | 2.31948 |
| All Candidate Average | 4.53964 | 3.01755 | 3.299507 | 3.65101 | 2.17753 |
| 2-Party Average | 3.07569 | 2.29513 | 2.42961 | 2.94329 | 1.57738 |
| 2-Party Average Compared to 538 | 0.95695 | 0.68727 | 0.64923 | | |
| 2-Party Compared to RCP | | 1.05859 | | | |

 

| | Tested Model RMSE | Tested Model RMSE SS | RCP RMSE SS | FiveThirtyEight Polls Plus RMSE | 538 RMSE SS |
|---|---|---|---|---|---|
| 2008 | 3.5474 | 3.14788 | 4.23389 | 3.19332 | 1.66958 |
| 2008 2-Party | 2.89669 | 2.57051 | 3.63513 | 3.0305 | 1.47846 |
| 2012 | 3.25139 | 1.94492 | 2.33511 | 2.38019 | 1.2979 |
| 2012 2-Party | 2.37053 | 1.17163 | 1.61076 | 1.98642 | 0.9342 |
| 2016 | 6.82013 | 6.42335 | 8.23311 | 5.37952 | 4.14228 |
| 2016 2-Party | 3.95985 | 3.03986 | 1.89412 | 3.81296 | 2.41263 |
| 2016 SS without UT and AZ | | 3.99534 | 3.32952 | | 3.56511 |
| 2016 SS without UT and AZ 2-Party | | 3.14325 | 2.04295 | | 2.31948 |
| Overall Average | 4.53964 | 3.83872 | 4.93404 | 3.65101 | 2.36992 |
| 2-Party Average | 3.07569 | 2.26067 | 2.38 | 2.94329 | 1.60843 |
| 2-Party Average Compared to 538 | 0.95695 | 0.71148 | 0.67581 | | |
| 2-Party Compared to RCP | | 1.05279 | | | |
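
For anyone who wants to reproduce figures like these, here is a minimal sketch of a root mean squared error (RMSE) calculation. The predicted and actual vote shares are made-up numbers for illustration only. The last lines check that the first "Compared to 538" entry is consistent with dividing the 538 2-party average RMSE by the tested model's 2-party average RMSE, which is my reading of that row.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual vote shares."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predictions and results for three states, in percentage points.
predicted = [52.0, 47.5, 55.0]
actual = [51.0, 49.5, 53.0]
example_rmse = rmse(predicted, actual)
print(round(example_rmse, 3))  # 1.732

# Ratio of the two 2-party averages taken from the table.
ratio = 2.94329 / 3.07569
print(round(ratio, 5))  # 0.95695
```

A ratio below 1 means the comparison model had the smaller average error.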

 

 

What my Undergraduate Research experience was like in Statistics

I am entering my third and final year of my undergraduate degree.  I have been doing research since almost day 1, and I wanted to share what my experience was like. As a statistician, I feel like I have to mention this is from a sample size of 1 and may not reflect all undergraduate research experiences.

First, I want to give a little background. The summer before my senior year of high school, I was chosen to participate in an NSF (National Science Foundation) funded REU (Research Experience for Undergraduates) at Texas Tech. There I was exposed to what research was like. We had a series of workshops, each led by a different researcher, over a two week period. I loved the Texas Tech math department and decided to attend Texas Tech for my undergraduate degree. I met my current research advisor, Dr. Ellingson, at the REU.

Right after classes started during my freshman year, I decided to email Dr. Ellingson and see if I could do research with him. I started work on image analysis (Dr. Ellingson’s specialty). I was also following the GOP nomination because it was interesting to me. I had an idea to predict the nomination using Bayesian statistics, similar to how FiveThirtyEight predicts elections. I had talked with Dr. Ellingson about political science statistics before and how there was a need for a statistically sound open source academic model. He agreed to help guide me through the process of building a model to predict the GOP nomination process.

At the time of the GOP nomination my math background was pretty limited, so I decided to just use Bayes’ theorem with the normal distribution to estimate the likelihood. I did all the calculations in Excel, and I downloaded CSV files with the poll data from Huffington Post Pollster. I used previous voting results from similar states as the prior in my model. More info about my model can be found here. What I found the most challenging was making a lot of decisions about how I was going to predict the election. I also struggled with making the decisions about the delegate assignments, which often involved breaking the results down by congressional district, even when the poll data was statewide. After the first Super Tuesday (March 1st) I began to realize how difficult it is to find a good prior state and to reassign the support of candidates who dropped out of the race. The nomination process taught me that failure is inevitable in research, especially in statistics, where everything is at least slightly uncertain.

In the summer of 2016, I started gearing up for the general election. I decided to use SciPy (a Python package for science and stats) to make my predictions. Making the programs was incredibly difficult. I had over a dozen variations to match different combinations of poll data. I had the programs up and running by early October, but I discovered a couple of bugs that invalidated my early test predictions. The original plan was to run the model on the swing states two or three times before the real election. In the middle of October I discovered a bug in one of my programs. I then had to fix the bug in every program. Finally, I did some manual calculations to confirm the programs worked. It was difficult to have to admit that my early predictions were totally off, but I am glad I found the bug before the election. Research isn’t like a homework assignment with answers in a solution manual. You don’t know exactly what is going to happen, and it is easy to make mistakes.

I ended up writing a paper on my 2016 general election model. Writing a paper on your own research is very different from writing a paper on other people’s research. My paper was 14 pages (and over 6500 words) long, and only about one or two pages were about other people’s research on the topic. It took a very long time to write, and I had 17 drafts. I hated writing the paper at first, but when I finished, it felt amazing. It was definitely worth the effort.

Undergraduate research is difficult, but I loved the entire process.  I got to work with real data to solve a real problem.  I learned how to read a research paper, and eventually I got to write my own.  I got to give presentations to both general audiences and mathematicians and statisticians.  I got to use my research to  inform others about statistics. If you are thinking about doing undergraduate research, you definitely should.