Overview

Now that the 2020 election is officially over and Joe Biden has been elected President of the United States, it is important that I reflect on my prediction model. I am excited to see what I can learn from it for future models that I create.

Model Recap and Predictions

Let’s first recap my prediction model to get a better picture of how it worked.

Patterns and Accuracy

Overall, I am pretty satisfied with how my model turned out. While I did miss a few states, this was my first election forecast, and I was quite happy that I predicted some battleground states correctly.

Above is a comparison between my predictions and the actual results of the 2020 election. As you can see, the states I got wrong were battleground states. However, I would note that the prediction intervals for those battleground states did capture the true results.

Moreover, let’s take a look at the plot above, which plots Trump’s actual two-party vote share against my predicted vote share for Trump. The blue points represent states Biden won and the red points represent states Trump won.

Furthermore, the map above shows the difference between Trump’s actual and predicted two-party vote share in each state. A negative difference means my model overpredicted Trump in that state, while a positive difference means it underpredicted him.
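
This difference measure is simple to compute. Here is a minimal sketch in Python; the state values below are hypothetical placeholders for illustration, not my model’s actual outputs:

```python
# Difference between Trump's actual and predicted two-party vote share.
# Negative -> Trump was overpredicted; positive -> Trump was underpredicted.
# The vote-share values below are illustrative placeholders only.
actual = {"GA": 49.3, "FL": 51.7, "TX": 52.8}
predicted = {"GA": 51.0, "FL": 50.2, "TX": 54.1}

diff = {state: round(actual[state] - predicted[state], 1) for state in actual}

for state, d in diff.items():
    label = "underpredicted" if d > 0 else "overpredicted"
    print(f"{state}: {d:+.1f} ({label})")
```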

Hypotheses for why my model was inaccurate

Now that we have gone over my prediction model, it is important to consider possible explanations for the inaccuracies seen in my model. My model seemed to mispredict battleground states in particular, and it is important to pay attention to the reasons why. Below are my hypotheses for the overall inaccuracies of my model:

  1. One hypothesis is that my model failed to take into account recent voting trends in particular states. For example, Georgia and Texas have been trending blue recently, but my model failed to take note of this. This could be because my model relied heavily on historical popular vote share and polling, so since Georgia and Texas were traditionally red states, my model predicted the same for 2020.
    • Moreover, my model failed to consider recent voting trends in populous counties. For example, Miami-Dade County became much more red in 2020 than in 2016 and heavily helped Trump win Florida again. Likewise, Fulton County in Georgia heavily favored Joe Biden in 2020, which played a significant role in turning Georgia blue. Thus, prediction models should take into consideration trends not just in states but in counties as well, since a single county can significantly impact the overall result of the state it is in.
  2. While my model accounted for the expected increase in overall turnout for the 2020 election, it failed to account for changes in turnout rates across different demographics.
    • For example, Stacey Abrams played a crucial role in driving Black voter turnout in Georgia in favor of Biden. The same goes for the large Latinx turnout in Arizona and Nevada, which also helped Biden. However, many Latinos also voted for Trump, particularly in South Texas and Florida.
    • Given the large turnout rates for some of these groups, they can thus play a significant role in forecasting the election.
  3. Another hypothesis is that my model relied heavily on inaccurate polls. Some polls in 2020 were fairly inaccurate because they were unrepresentative of voters and suffered from non-response bias, particularly among conservatives.
    • On average, polls were off by 2.5 points in battleground states and blue states.
    • Given the inaccuracy of polls, this may explain why my model had inaccuracies, especially since I weighted the poll model by 0.96 in my ensemble model.
  4. Another hypothesis is that using the state-level Q2 GDP growth rate as a fundamentals variable may have hurt Trump’s predicted vote share more than it should have, especially in battleground states and traditionally red states. This is because economic predictors were very noisy this year due to the recession brought on by Trump’s handling of the Covid-19 pandemic.
    • Since the 2020 economy was an anomaly, it would probably be best not to use economic predictors in my model.
    • I would also mention that my model used the 2020 Q2 GDP growth rate, which may not reflect the economy voters experienced at election time: the election took place after Q3, and the Q3 GDP growth rate in 2020 was drastically different from Q2’s.
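
To make the ensemble weighting in hypothesis 3 concrete, here is a minimal sketch of a two-model weighted ensemble using the 0.96 poll weight mentioned above; the component predictions themselves are hypothetical placeholders:

```python
# Weighted ensemble of a poll-based model and a fundamentals-based model.
# The 0.96 poll weight matches the weighting described above; the
# component predictions passed in below are hypothetical.
POLL_WEIGHT = 0.96
FUND_WEIGHT = 1 - POLL_WEIGHT  # 0.04

def ensemble(poll_pred, fund_pred, poll_weight=POLL_WEIGHT):
    """Blend two vote-share predictions into one ensemble prediction."""
    return poll_weight * poll_pred + (1 - poll_weight) * fund_pred

# If the poll model carries a 2.5-point error (the average battleground
# polling error noted above) while the fundamentals model carries none,
# the ensemble inherits almost all of the polling error:
error_passed_through = ensemble(poll_pred=2.5, fund_pred=0.0)
print(error_passed_through)  # ~2.4 of the 2.5 points pass through
```

This illustrates why a heavily poll-weighted ensemble is so exposed to systematic polling error.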

Proposed tests of my hypotheses

  1. To test the first hypothesis, that states and counties have partisan shifts (which can impact the accuracy of my model), I can look at recent voting trends in those states and counties.
    • These states are likely battleground states and the counties are likely in battleground states. Moreover, we can look at how states and counties voted in the 2016 presidential election, the 2018 midterm election, and the 2020 election.
    • We can then analyze any trends using regressions and correlations, and if certain states or counties are shifting blue or red, that is something to take note of.
    • One example of a trend we may see is southern Texas counties voting more red over time in comparison to the 2008 election.
  2. To test the second hypothesis, we can run a linear regression between the popular vote share for a presidential candidate (say the incumbent) and the change in voter turnout for different demographics.
    • Through this regression, we may get a better idea not only of how changes in turnout rates among different demographics may impact election forecasting, but also of how they may affect Democratic or Republican popular vote share.
    • Furthermore, it may also make sense to run the regression on a per-county basis, since 2020 showed that certain counties see greater turnout from particular demographics than other counties do.
  3. To test the third hypothesis, one test is to create a predictive linear model of a candidate’s popular vote share using only recent polls.
    • Given that historical polls may not be as predictive for today’s elections, it may make sense to use only recent polls, such as those from 2016 onward. This is because there has never really been a president with the character of Trump, so there may be non-response bias among Republicans, as some may be reluctant to tell pollsters that they will vote for Trump.
    • Moreover, it would also make sense to use polls that are high in quality, which can be measured using FiveThirtyEight’s poll grades. My prediction model did not filter for high-quality polls, so poll quality may have affected my forecast, as higher-quality polls may be more representative of the electorate.
    • I would also mention that polls need to do a better job of reaching hard-to-reach demographics like Hispanic Americans, so I would be interested in using more polls that target these demographics in predictive models.
  4. To test the fourth hypothesis, we can rerun my prediction model without economic predictors as part of the fundamentals.
    • As mentioned before, the 2020 economy was an anomaly, so it is best to not use economic predictors.
    • Moreover, it would be interesting to use economic predictors to forecast the 2024 election and other future elections, provided that economic conditions during those elections are not as volatile. If we do use economic predictors for those future elections, it then makes sense to leave out the 2020 economic data.
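
As a sketch of the regression proposed in the second test, the following uses synthetic data to show the mechanics of regressing vote share on demographic turnout change. The true slope of -0.5 is an assumption built into the simulation for illustration, not an empirical estimate:

```python
import numpy as np

# Sketch of the test for hypothesis 2: regress the incumbent's two-party
# vote share on the change in turnout for a demographic group. The data
# are synthetic, generated so a 1-point turnout increase moves vote share
# by about -0.5 points (an assumed effect, purely for demonstration).
rng = np.random.default_rng(0)
turnout_change = rng.uniform(-5, 5, size=50)  # pct-point change in turnout
vote_share = 50 - 0.5 * turnout_change + rng.normal(0, 1, size=50)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones_like(turnout_change), turnout_change])
intercept, slope = np.linalg.lstsq(X, vote_share, rcond=None)[0]

print(f"intercept ~ {intercept:.2f}, slope ~ {slope:.2f}")
# The fitted slope should roughly recover the -0.5 built into the data.
```

With real data, `vote_share` and `turnout_change` would come from state- or county-level returns and turnout records, and the fitted slope would indicate how demographic turnout shifts relate to the incumbent’s vote share.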

Changes to my model

Now that we have a better understanding of my model and where it went wrong, the following are changes I would like to make to my model:

  1. Incorporate recent voting trends at both the state and county level rather than relying only on historical popular vote share.
  2. Account for changes in turnout rates across different demographics, not just the overall turnout rate.
  3. Filter for high-quality polls using FiveThirtyEight’s poll grades and lean on recent polls rather than historical ones.
  4. Leave out noisy economic predictors like the 2020 Q2 GDP growth rate.

Conclusion

I really enjoyed making my prediction model and learning from it. I now have a better grasp of making prediction models, and I am eager to see how future elections will differ from the 2020 election and to use these skills in the future. I want to thank my teaching fellow for Gov 1347, Sun Young Park, as well as Professor Ryan D. Enos and Soubhik Barari, for their teaching and help throughout the semester.