Statistical slip-ups: The problems with political polling

If Americans of any party affiliation learned anything from the 2016 presidential election between Donald Trump and Hillary Clinton, it was that polls cannot be trusted. Pollsters wildly missed the mark in 2016, with the New York Times giving Clinton an 85 percent chance of winning, FiveThirtyEight predicting a 71.4 percent likelihood, and Reuters putting her odds at 90 percent. News sites all but inaugurated her before voting even began, but she did not win the presidency.

In this year’s presidential election, FiveThirtyEight gave Joe Biden an 89 percent chance of winning. As I repeatedly refreshed the news on my phone on election night, this number did not provide much clarity about what the result would be. How can statistics that sound so certain sometimes get it so wrong? In these instances, the statistics are not necessarily incorrect; rather, polls and statistical methods have limitations that must be considered along with the final percentage.

In analyzing the reliability of a political poll, one must first take into account who is conducting the poll. With the internet allowing almost anyone to survey large samples, it is important to identify trustworthy sources. The most trustworthy sources are organizations with factual reporting and little to no political leaning. Untrustworthy sources can manipulate methodology to best suit their desired outcome by asking leading questions and polling from unrepresentative samples. Although it is impossible to eliminate these inaccuracies completely, reliable sources help mitigate their effect on the data and allow for meaningful analysis that takes them into account.

The most common polling methods today are online and telephone surveys, employed by sources like CNN, Fox News, Politico, the Associated Press, and the Pew Research Center. However, since these formats require participants to actively engage, they introduce biases that create an unrepresentative sample. The Pew Research Center has shown voluntary samples “tend to overrepresent adults who self-identify as Democrats, live alone, do not have children and have lower incomes.” To correct for the underrepresentation of certain demographics, pollsters can weight responses by different variables. This is where untrustworthy sources have leeway to mislead the public and where reliable sources attempt to improve their statistics.
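To illustrate how weighting can correct an unrepresentative sample, here is a minimal sketch in Python. Everything in it is hypothetical: the two education groups, the sample counts, the population shares, and the function name `weighted_support` are invented for illustration and do not come from any real poll or pollster's method.

```python
# Hypothetical poll of 1,000 people, split by one demographic variable.
# All numbers are invented for illustration; they are not real poll data.
sample_counts = {"degree": 600, "no_degree": 400}    # respondents per group
support_counts = {"degree": 360, "no_degree": 160}   # of those, backing Candidate A
population_share = {"degree": 0.4, "no_degree": 0.6} # assumed true population mix


def weighted_support(sample_counts, support_counts, population_share):
    """Estimate support after weighting each group to its population share."""
    n = sum(sample_counts.values())
    total = 0.0
    for group, count in sample_counts.items():
        sample_share = count / n
        # Groups that are overrepresented in the sample get a weight below 1,
        # underrepresented groups get a weight above 1.
        weight = population_share[group] / sample_share
        group_support = support_counts[group] / count
        total += sample_share * weight * group_support
    return total


raw = sum(support_counts.values()) / sum(sample_counts.values())
weighted = weighted_support(sample_counts, support_counts, population_share)

print(f"Unweighted support: {raw:.2f}")      # 0.52
print(f"Weighted support:   {weighted:.2f}") # 0.48
```

In this made-up case, the raw sample suggests Candidate A leads with 52 percent, but once the overrepresented group is weighted down, support falls to 48 percent. Real pollsters weight on many variables at once, which is considerably more involved than this single-variable sketch.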

Since 2016, sources like Gallup, the New York Times, and the Pew Research Center have increased the number of variables that they weight to better represent the national population. They suspect that demographics left unaccounted for contributed significantly to the incorrect predictions in the 2016 presidential election, and they now include upwards of 10 variables in their poll assessments rather than three or four.

This may sound like an easy fix, and incorporating new variables will certainly improve polling in the future. However, the last two elections brought additional factors that could influence the reliability of polls. In the 2016 election, general disapproval of both candidates is thought to have led many voters to put off deciding whom to vote for until very close to Election Day, if they voted at all. In this year’s election, Americans voted during a pandemic, which had a significant impact on voters’ plans for Election Day. Although polls aim to be nationally representative, factors like these could alter the outcome of the election regardless of what a national poll suggests.

The 2016 election also emphasized the difference between the popular vote and Electoral College outcomes, with Clinton winning the popular vote by about 2.9 million votes but Trump finishing 74 electoral votes ahead of her. The Electoral College is a system intended to help balance state interests when state populations differ widely, which means the effect of a vote on the election outcome depends on where the voter lives. National polls of voter preference reflect the popular vote and do not, by themselves, take the electoral system into account. In this sense, the national polls in 2016 were not wrong: more Americans favored Clinton even though Trump won the election.
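The gap between the popular vote and the Electoral College can be shown with a small sketch in Python. The three states, their elector counts, and their vote totals are entirely hypothetical, and the sketch assumes every state awards all of its electors to its statewide winner, which is how most (but not all) states operate.

```python
# Hypothetical states: (electors, votes for Candidate A, votes for Candidate B).
# All figures are invented for illustration; they are not real election data.
states = {
    "State 1": (10, 900_000, 100_000),
    "State 2": (8, 480_000, 520_000),
    "State 3": (7, 490_000, 510_000),
}


def tally(states):
    """Return (popular vote totals, electoral vote totals) per candidate."""
    popular = {"A": 0, "B": 0}
    electoral = {"A": 0, "B": 0}
    for electors, votes_a, votes_b in states.values():
        popular["A"] += votes_a
        popular["B"] += votes_b
        # Assumption: winner-take-all allocation in every state.
        winner = "A" if votes_a > votes_b else "B"
        electoral[winner] += electors
    return popular, electoral


popular, electoral = tally(states)
print("Popular vote:  ", popular)    # A: 1,870,000  B: 1,130,000
print("Electoral vote:", electoral)  # A: 10  B: 15
```

Candidate A wins the popular vote comfortably by running up a large margin in one state, yet loses the electoral count because Candidate B narrowly carries the other two. A national poll that correctly measured A's popular lead would still point to the wrong winner.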

Political polling relies on both pollsters and the general public to be successful. Pollsters should continue to account for undersampled groups, reduce biases, and be transparent about their methodologies, while the public should take an active role in finding trustworthy sources and understanding the limitations of statistics. Political polls should not be regarded as the be-all and end-all, but with a healthy balance of knowledge and skepticism, polls can be trusted and used as a tool for gauging public interests.

Image source: Pixabay.