Last week, the 2016 presidential election shocked the nation. Half of the nation woke up to a nightmare; the other half woke up to its salvation. Donald Trump won in the Electoral College despite his defeat in the popular vote count. Prior to Election Day, the website fivethirtyeight.com put the chances of Clinton winning the popular vote but losing in the Electoral College at 10.5%. How did something that had an 89.5% chance of not happening, happen? Pundits, party wonks, and pollsters will be dissecting the results until the next election cycle. We've got a few thoughts too - at least on how the polls could have been so wrong in three states that were crucial in turning the election over to Donald Trump.
Polls, Margins of Error, and Sample Size
In this post, we'll take a closer look at the polls in three states where the margin of victory for Trump was very small. To predict the winner in these states, the polls would have needed much larger sample sizes. How much larger? Let’s find out.
The chart below shows two candidates: candidate A is polling at 42% and candidate B is polling at 58%.
Polls provide estimates of election outcomes. They typically assume a confidence level of 95% and a margin of error around plus or minus 5%. In our example above, the poll estimates with 95% confidence that candidate A has 42% of the likely voters, plus or minus 5% (between 37% and 47%). Candidate B has 58%, plus or minus 5% (between 53% and 63%). The error bands don’t overlap, so we can say with confidence that candidate B is likely to win.
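To make the error-band comparison concrete, here is a minimal Python sketch. The function names `error_band` and `bands_overlap` are ours, invented for illustration, and the 49%/51% race at the end is a hypothetical example, not polling data.

```python
def error_band(share, margin):
    """Return the (low, high) band for a polled share, in percentage points."""
    return (share - margin, share + margin)


def bands_overlap(share_a, share_b, margin):
    """True if the two candidates' error bands share any ground."""
    low_a, high_a = error_band(share_a, margin)
    low_b, high_b = error_band(share_b, margin)
    return low_a <= high_b and low_b <= high_a


# The example above: 42% vs. 58% with a +/- 5 point margin of error.
print(error_band(42, 5))         # (37, 47)
print(error_band(58, 5))         # (53, 63)
print(bands_overlap(42, 58, 5))  # False: candidate B has a clear lead

# A hypothetical close race: 49% vs. 51% with the same margin of error.
print(bands_overlap(49, 51, 5))  # True: no clear winner
```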
The required sample size is calculated from the confidence level and the margin of error. To narrow the margin of error or increase the confidence level, the sample size needs to increase.
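As a rough sketch of that calculation, the snippet below uses the standard formula for estimating a proportion from a large population, n = z² · p(1 − p) / e², where z is the z-score for the confidence level, e is the margin of error, and p is the expected proportion. We assume p = 0.5, the most conservative choice; the function name `required_sample_size` is ours, and real pollsters may add corrections this sketch leaves out.

```python
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence=0.95, proportion=0.5):
    """Approximate sample size for a given margin of error.

    margin_of_error and proportion are fractions (0.05 means 5%).
    Assumes simple random sampling from a large population.
    """
    # Two-sided z-score, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2


# 95% confidence, +/- 5% margin of error: about 384 respondents.
print(round(required_sample_size(0.05)))

# Tightening the margin or raising the confidence level needs more people.
print(round(required_sample_size(0.03)))
print(round(required_sample_size(0.05, confidence=0.99)))
```

Because the margin of error is squared in the denominator, halving it roughly quadruples the number of respondents needed.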
The Tale of Three States
Let's look at three states that helped turn the election: Florida, Pennsylvania and Wisconsin. Under normal circumstances, with a 95% confidence level and a 5% margin of error, the sample size required in each state would have been about 384. The table below shows the average sample size of polls from each state used by the fivethirtyeight.com models. We can see that the sample sizes are significantly higher than required. Thanks to fivethirtyeight for posting the data here.
Small Margin of Victory, Small Margin of Error
Trump's margins of victory in these three states were very small, as shown below. The % Difference column shows the difference between Trump and Clinton as a percentage of total votes cast.
Now, let’s look at our polling chart in Florida, keeping the +/- 5% margin of error.
The error bands almost completely overlap, giving us no clear winner.
Traditional polling approaches break down when the race is too close. As mentioned above, in order to shrink a poll’s margin of error, the sample size needs to increase. Look at the required sample sizes in the table below and compare them to the actual average sample sizes for each state's polls.
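To get a feel for how quickly the numbers grow, here is a hedged sketch. It assumes that separating two candidates requires a margin of error no larger than half the gap between them (so the error bands just stop overlapping), at 95% confidence and with the conservative p = 0.5. This is our simplification and not necessarily how the table's figures were derived; the function name and the 1-point gap are hypothetical.

```python
from statistics import NormalDist


def sample_size_to_separate(gap, confidence=0.95):
    """Approximate sample size needed so two error bands stop overlapping.

    gap is the difference between the candidates as a fraction
    (0.01 means 1 percentage point). Assumes p = 0.5 and simple
    random sampling.
    """
    margin = gap / 2  # each band may extend at most half the gap
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * 0.25 / margin ** 2


# A 10-point gap needs the familiar +/- 5% poll: about 384 respondents.
print(round(sample_size_to_separate(0.10)))

# A hypothetical 1-point gap needs roughly 100 times as many.
print(round(sample_size_to_separate(0.01)))
```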
In practice, pollsters need to balance the costs associated with sample size against reducing the margin of error. In addition, they need to control various forms of bias within the sample, so getting a representative sample that is large enough might be impossible in some states. Paradoxically, technology has made it more difficult to collect solid polling data in the last few years. Here is a good discussion of the issues pollsters face today. All this means that there are plenty of surprises coming in future elections, and that we can’t rely on the polls to call close races.
Code and data for this analysis are here.
For quick reference, check out this sample size calculator.