Charlie Ballard

Co-Founder & CEO

Probability Is Never Certainty: Confidence in the 2020 US Presidential Election Polling

October 30, 2020


A good friend just asked a question we’ve all been hearing a lot of lately. Maybe too much.

“So who do you think is going to win the US election?”

I’ve become rather fascinated with the US election polling, the methodology behind it, what it does and doesn’t say, and maybe most importantly the answer to this question: how well do polls predict what’s actually going to happen?

And one of the most common follow-up questions you will see everywhere, in friendly conversations, in heated online debates, in academic articles, is “How can I trust the polling when they got it so wrong in 2016?”

FiveThirtyEight’s chances as of November 8, 2016: high odds but far from certain.

Now, there are a lot of reasons to think that pollsters “got it wrong in 2016”. The biggest is that on the day of the election the forecast (specifically Nate Silver’s site FiveThirtyEight.com) gave Hillary Clinton a whopping 71.4% chance of winning.

And then she lost.

FiveThirtyEight’s final estimated probability of the candidates’ chances of winning is still live here for all to see, screenshot to the right.

Looking at the numbers you can probably understand why so many people cried out that they “got it wrong”. When people see a number like 71.4%, they may think “that’s almost a sure thing!”.

But it was never a sure thing.

FiveThirtyEight was only ever showing the odds, no different than when you bet on a horse race.

The other day I saw an online advertisement for English horse betting, and ended up wondering:

Why is it that when people bet at the racetrack and the horse with the best odds doesn’t come in first place, they generally don’t angrily claim “the betting offices got it wrong and can never be trusted again”?
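
To make the horse race comparison concrete, here is a minimal Python sketch that replays an election many times with the favorite given a 71.4% chance. Only the 71.4% figure comes from the forecast discussed above; the function name, the number of trials and the random seed are illustrative assumptions, and a real forecast model is far more involved than a single weighted coin flip.

```python
import random


def simulate_upsets(favorite_win_prob: float, trials: int, seed: int = 0) -> float:
    """Return the fraction of simulated contests won by the underdog.

    Each contest is treated as one independent weighted coin flip, which is
    a deliberate simplification of how election forecasts work.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    upsets = sum(1 for _ in range(trials) if rng.random() >= favorite_win_prob)
    return upsets / trials


# 0.714 is the favorite's probability quoted above; 100,000 trials is arbitrary.
upset_rate = simulate_upsets(0.714, trials=100_000)
print(f"The underdog won {upset_rate:.1%} of the simulated elections")
```

The underdog should come out ahead in a bit more than one run out of every four, which is the sense in which a 71.4% favorite is far from a sure thing.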

A few other factors that somewhat excuse the 2016 pollsters:

  • The national polls were pretty accurate. As most readers will know, the United States has a peculiar presidential election process, where the total number of votes across the US, commonly referred to as “the popular vote”, doesn’t decide who becomes president. The popular vote is interesting but ultimately irrelevant. US polling conducted at the national level is therefore only predicting the popular vote, and these national polls were almost dead-on in 2016. Hillary Clinton did win the popular vote by 2.87 million votes, almost exactly what the national polls forecast.
  • And the national polls were irrelevant. To win the presidency, a candidate must win the “electoral college”, a complicated process which essentially comes down to winning the most votes in enough US states. In almost every state, the candidate who wins the most votes earns all of that state’s fixed number of “electoral votes”, which is very roughly determined by the state’s population. For example, winning a low-population state like Montana earns only 3 electoral votes, while winning the highly populated state of California earns a whopping 55 electoral votes. There are a total of 538 of these electoral votes to win, and the candidate who wins at least 270 of them wins the presidential election.
  • “Swing states” were therefore all that mattered: While candidates would love to win every vote in the country, because of the electoral college process they tend to ignore states that are almost certain to vote for one candidate no matter what happens. For example, most candidates tend to ignore both Wyoming and California during their campaigns, as voters in Wyoming tend to go heavily for the Republican candidate, and voters in California tend to go heavily for the Democratic candidate. Because such states are almost impossible to “flip” the other way, they get left out, and most of the campaigning therefore focuses on about 12 “swing states”, sometimes called “battleground states” for dramatic effect, which might go either way.
  • The 2016 swing state polling was very limited: State-level polls across these important states were relatively rare in 2016, and later in the year, from September to November 2016, many polling agencies ran out of money and had no new polls in important states that Donald Trump claimed with surprise wins, including Michigan, Wisconsin, Pennsylvania and Florida. With no fresh polling in the swing states, there was no awareness that Trump’s campaign was gaining ground there. Also adding to the problem: the state-level polling wasn’t sampling enough voters without college degrees, something that likely skewed the results.
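
The electoral college arithmetic described in the list above can be sketched in a few lines of Python. This is a simplified illustration, not a model of any real election: the three “blocs” and their vote counts are invented, the candidate names A and B are placeholders, and the sketch assumes every bloc hands all of its electoral votes to whoever gets the most votes there (Maine and Nebraska actually split theirs). Only the 538 total and the 270 needed to win come from the text.

```python
from collections import Counter

TOTAL_ELECTORAL_VOTES = 538
VOTES_TO_WIN = TOTAL_ELECTORAL_VOTES // 2 + 1  # 270


def tally(states):
    """Tally winner-take-all electoral votes and the national popular vote.

    `states` is a list of (electoral_votes, {candidate: popular_votes}).
    """
    electoral = Counter()
    popular = Counter()
    for electoral_votes, results in states:
        popular.update(results)  # add this state's votes to the national total
        state_winner = max(results, key=results.get)  # most votes takes all
        electoral[state_winner] += electoral_votes
    return electoral, popular


# Hypothetical blocs of states (invented numbers that sum to 538 electoral votes).
hypothetical_states = [
    (200, {"A": 30_000_000, "B": 15_000_000}),  # A wins big here
    (170, {"A": 19_000_000, "B": 20_000_000}),  # B wins narrowly
    (168, {"A": 18_000_000, "B": 19_000_000}),  # B wins narrowly
]

electoral, popular = tally(hypothetical_states)
for candidate in ("A", "B"):
    status = "wins" if electoral[candidate] >= VOTES_TO_WIN else "loses"
    print(f"{candidate}: {popular[candidate]:,} popular votes, "
          f"{electoral[candidate]} electoral votes, {status}")
```

In this invented scenario candidate A piles up a large popular vote lead in one bloc while B wins the other two narrowly, so B reaches 270 electoral votes despite trailing nationally. That is why a national poll, however accurate, says little about who actually wins.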

So why should anyone think the polling in 2020 will be any different? Most pollsters have made a few key changes this time around:

  • In 2020 there are many more polls in the key swing states (see views below)
  • The 2020 polls are putting more effort into sampling non-college-educated voters

So do these changes mean the 2020 polls are flawless? Of course not. They almost certainly will not provide a perfectly accurate picture of how the election will turn out. That said, they are most likely more accurate than the 2016 polls, and they also show a higher probability of the challenger candidate winning than Hillary Clinton had in 2016.

One last very important factor that may throw all of this out the window?

Turnout.

Voter turnout in 2016 was actually relatively low, but turnout in 2020 to date looks like it may significantly surpass the modern-era turnout record set in 2008 with Obama-McCain. If turnout in this election is particularly high, it may break the rules in ways that the pollsters’ models just aren’t prepared to deal with.

Google Data Studio Is Brilliant.

Being fascinated with this polling, we found ourselves visiting Nate Silver’s website FiveThirtyEight.com every few days to see what the latest polling trends were saying about each swing state.

However, the process was cumbersome, because while FiveThirtyEight.com has some terrific data, it’s not always summarized in the most immediately accessible way. Just show us the overall probability of each candidate winning. Show us change over time.

Just show us what matters: swing state trends.

Google Data Studio made it shockingly easy to simplify & automate the data transport process, then produce and share the dashboards WE wanted to see. We were able to quickly create a few live “Data Sources” to automatically ingest the latest FiveThirtyEight polling data each day, then build out clean dashboards to make the tracking much cleaner and simpler.

This first view below tells us the probability of the Challenger candidate (here Joseph R. Biden) winning both the irrelevant Popular Vote as well as the all-important Electoral College vote.

Should Democrats feel comfortable with an 87% probability of winning? Someone yesterday put it well:

“How comfortable would you feel if you were going bungee jumping and discovered that out of the bungee company’s last 100 customers, only 13 had died?”

The views here are updated multiple times a day, every day. Just refresh this page anytime to see the latest polling data.

I find this next view the most useful out of those we created, as it just shows the probability of each state going for the Challenger candidate based on a weighted average of the latest polls — but for only the important swing states, and only for polls with the highest rankings.
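
For readers curious what a weighted average over filtered polls might look like, here is a rough Python sketch. It is not the method behind the dashboard or FiveThirtyEight’s model: the `Poll` fields, the letter grades, the choice to weight by sample size and every number in the example are hypothetical, and it produces an average polling margin rather than a win probability.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    state: str
    challenger_pct: float
    incumbent_pct: float
    sample_size: int
    grade: str  # hypothetical pollster rating


def weighted_margin(polls, state, allowed_grades=("A+", "A", "A-")):
    """Sample-size-weighted Challenger margin for one state.

    Only polls from pollsters with an allowed grade are used. Returns None
    when no poll qualifies.
    """
    selected = [p for p in polls if p.state == state and p.grade in allowed_grades]
    total_weight = sum(p.sample_size for p in selected)
    if total_weight == 0:
        return None
    weighted_sum = sum(
        (p.challenger_pct - p.incumbent_pct) * p.sample_size for p in selected
    )
    return weighted_sum / total_weight


# Invented polls for an invented state; none of these are real results.
polls = [
    Poll("State X", 50.0, 46.0, 1000, "A"),
    Poll("State X", 48.0, 47.0, 500, "A-"),
    Poll("State X", 44.0, 52.0, 2000, "C"),   # dropped: low-rated pollster
    Poll("State Y", 45.0, 49.0, 800, "A+"),   # dropped: different state
]

margin = weighted_margin(polls, "State X")
print(f"Challenger margin in State X: {margin:+.1f} points")
```

Weighting by sample size is only one plausible choice. Weighting by recency or by pollster rating would be equally reasonable, and each would give a somewhat different number.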

This next view is also pretty handy, as it clearly shows the up-to-the-minute most recent polling results for specific swing states. As new polls come in and are finalized in the FiveThirtyEight feed, Data Studio adds them to the display below. It’s not quite real time, but it’s not too far off.

Group 1 shows four of the swing states currently leaning most toward the Challenger candidate. Michigan was excluded as it’s leaning even harder than these four at the moment.

As shown below, Group 2 shows four of the swing states somewhat on the fence. While New Hampshire is currently tilting rather strongly toward the Challenger candidate, Florida, Arizona and North Carolina all show conflicting polls and/or polls close to a tie, and will likely be decided by whichever candidate ends up drawing the strongest turnout.

Next, as shown here, Group 3 shows the four swing states currently leaning most toward the Incumbent candidate. Ohio, Georgia, and Iowa all show close to 50-50 polling, while Texas is currently leaning toward the Incumbent candidate, though the Challenger is well within the margin of error.

And finally, Data Studio makes interactive drop-down menus quick and easy, so you can choose whichever state or states you’d like to see the data for. With live, frequently updated views like these in 2016, journalists and the general public might have been more aware that state-level polling was weak and unreliable, and reset their expectations accordingly.

If there’s one thing to keep in mind above all else, it’s this:

While polling and other survey data are a useful way to see the current opinion held by a large group of people, what will make all the difference at the end of the day are the actions people actually take. With record turnout, any outcome is possible.

Please do let us know if you have any questions or if there are other views of the data you’d like to see. We’re always happy to lend a hand.

Contact Us

Get in touch at charlie@relevancyanalytics.com. We'll respond within 48 hours.