Three States, One Close Election: Part 2

The election results were surprising to many. We recently wrote a blog post that considered the sample size needed to predict the outcome. In this post, we will dive deeper into the predictions.

When pollsters survey voters, they are sampling to estimate the true probability that a voter will select a candidate. In its simplest form, it is like predicting a dice roll or a coin flip. A small number of trials is used as a sample. The sample is then used to project upper and lower limits on the actual probability. If you flipped a coin four times, it might not land on each side exactly twice. However, through a statistical technique known as a binomial hypothesis test, you could calculate how likely your result, or a more extreme one, would be if the true probability were 50/50.
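To make the coin example concrete, here is a minimal Python sketch of an exact binomial test for four flips. The function names are our own, and the two-sided rule used here (add up every outcome that is no more likely than the one observed) is one common convention rather than the only one.

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p_value(k, n, p=0.5):
    """Exact two-sided p-value: the total probability of every outcome
    that is no more likely than the observed one under the hypothesis."""
    observed = binomial_pmf(k, n, p)
    # The small tolerance keeps equally likely outcomes from being
    # dropped because of floating-point rounding.
    return sum(
        binomial_pmf(i, n, p)
        for i in range(n + 1)
        if binomial_pmf(i, n, p) <= observed * (1 + 1e-7)
    )

# Every possible number of heads in four flips of a fair coin.
for heads in range(5):
    print(heads, binomial_pmf(heads, 4), two_sided_p_value(heads, 4))
```

Even four heads in a row gives a p-value of 0.125, so four flips are far too few to reject the idea that the coin is fair.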

Below, we apply this technique to Michigan polls of voters choosing between Trump and Clinton. This will show us the upper and lower bounds of the election predictions, given the poll results.

According to data available from www.fivethirtyeight.com, the polls showed an average Clinton lead of 4.5 points. Now that the election is over, we know that the polls were off, but this is what the predictors were looking at.

It’s a common technique to apply a binomial hypothesis test to survey data. To do this, we take the people who said they would vote for Trump and mark those as successful trials. We hypothesize that the true probability is 50% because there are two candidates. From there, the test tells us how likely it is that the survey results would come out as they did, or more extreme, if the true probability were 50%.

For example, let’s take a look at one of the surveys in the fivethirtyeight dataset, the one from SurveyUSA. It shows 343 Trump supporters out of 787 surveyed. Using a tool called RStudio, we will run a binomial test.

With about 44% of the people sampled (343 of 787) supporting Trump, a binomial test estimates that the true share of the population voting for Trump was between 40% and 47%, at the standard 95% confidence level. A true probability of 50% falls outside that interval. This means we would reject our hypothesis that voters were split 50-50 and conclude that this poll points to a Clinton lead.
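For readers who want to reproduce these numbers without RStudio, here is a Python sketch using only the standard library. It is not the code from the original analysis. It assumes the interval is the exact (Clopper-Pearson) interval, which is the kind R's binom.test reports, and it finds the limits by bisection. All function names are our own.

```python
from math import exp, lgamma, log

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), in log space to avoid overflow.
    Requires 0 < p < 1."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))

def binomial_cdf(k, n, p):
    """P(X <= k)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

def two_sided_p_value(k, n, p0=0.5):
    """Total probability of outcomes no more likely than the observed one."""
    probabilities = [binomial_pmf(i, n, p0) for i in range(n + 1)]
    cutoff = probabilities[k] * (1 + 1e-7)
    return sum(prob for prob in probabilities if prob <= cutoff)

def bisect(func, target, lo=0.0, hi=1.0, iterations=50):
    """Find p where an increasing function func(p) crosses target."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, confidence=0.95):
    """Exact confidence interval for a binomial proportion."""
    alpha = 1 - confidence
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else bisect(
        lambda p: 1 - binomial_cdf(k - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= k) equals alpha / 2,
    # written as P(X > k) = 1 - alpha / 2 so the function increases in p.
    upper = 1.0 if k == n else bisect(
        lambda p: 1 - binomial_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

trump, surveyed = 343, 787  # figures quoted above
p_value = two_sided_p_value(trump, surveyed)
lower, upper = clopper_pearson(trump, surveyed)
print(f"share = {trump / surveyed:.3f}, p-value = {p_value:.5f}")
print(f"95% interval: {lower:.3f} to {upper:.3f}")
```

The interval should land at roughly 40% to 47%, and the p-value should fall well below 0.05, which is why the 50-50 hypothesis is rejected for this survey.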

It’s important to note that any sampling method produces a range, not a single number. We are trying to figure out how many people are going to vote for a candidate by testing a smaller group that ideally reflects the entire population. The more people we ask, the more confident we should be in our findings. Sampling also assumes that you are talking to a representative or random group.

After computing this range for all of the surveys, we see a wide range of intervals, but it is apparent that the gap between the upper and lower estimates of the true probability tightens as the number sampled grows. There are actually five surveys whose range included a potential Trump victory, but those tend to be the smaller surveys.
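The tightening can be illustrated with a short Python sketch. It uses the normal approximation to the margin of error rather than the exact binomial test, and the sample sizes other than 787 are hypothetical values chosen for illustration, not surveys from the dataset.

```python
from math import sqrt

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion
    (normal approximation, not the exact binomial interval)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

def interval_includes_half(p_hat, n):
    """True if the approximate interval reaches a 50% share."""
    moe = margin_of_error(p_hat, n)
    return p_hat - moe <= 0.5 <= p_hat + moe

observed_share = 343 / 787  # Trump share in the SurveyUSA poll

# Hypothetical sample sizes, all with the same observed share.
for n in (200, 400, 787, 1500, 3000):
    moe = margin_of_error(observed_share, n)
    print(
        f"n={n:5d}  "
        f"interval={observed_share - moe:.3f} to {observed_share + moe:.3f}  "
        f"includes 50%: {interval_includes_half(observed_share, n)}"
    )
```

With the same observed share, a survey of 200 people cannot rule out an even split, while a survey of 787 can. That is consistent with the possible Trump victories showing up mostly in the smaller surveys.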

With the data provided, it was reasonable to predict a Clinton win in Michigan. After all, only one survey predicted that Trump would win, and only four others included a Trump win within their 95% confidence interval. For some reason, the survey data did not reflect what was actually going to happen. The samples could have been biased. The people who answer surveys may be less likely to vote. There could have been changes of heart after the survey was taken. It is difficult to know.

This is an example of why we need to be careful when using samples. We must assume the sample is representative of the population, but in some cases, it isn’t. As much as the data can give us confidence moving forward, we need to keep our skepticism. People do not behave like dice or coins when they step into the voting booth or the market. They are complicated. A sample is helpful in guiding decisions, but it does not guarantee a right answer.