One Election Visualization to Rule Them All

I’m fascinated with creating a single visualization that can fully communicate the results of an election. I’ve made a few attempts in the past and even came up with a set of criteria for the ideal visualization. More or less, I want to answer the questions “What happened?”, “What changed?”, and “Why?” in a single visualization. Of course, nobody can fully answer the “Why?” question, but you can point towards an answer by looking at demographic and survey data.

So here’s my most recent attempt at an all-encompassing visualization for the 2016 election:

This might look a little bit like an abstract painting, but I promise it’s informative. Each color category represents the entire electorate broken up into subgroups by age, housing density, education, gender, race, and state. The size of each bubble represents the population of the group, and its coordinate position represents the margin and turnout. The tail of each bubble represents the change from the weighted average (60/30/10) of the 2012, 2008, and 2004 elections.
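As a rough illustration of how a bubble tail could be computed, here is a minimal sketch of the 60/30/10 weighted baseline. The numbers are hypothetical, the function names are my own for this example, and I’m assuming margin means Republican share minus Democratic share. The actual code in the repository may be organized differently.

```python
# Weights for the baseline described above: 60% 2012, 30% 2008, 10% 2004.
WEIGHTS = {2012: 0.6, 2008: 0.3, 2004: 0.1}


def weighted_baseline(history, weights=WEIGHTS):
    """Weighted average of one metric (margin or turnout) over past elections."""
    total_weight = sum(weights.values())
    return sum(history[year] * w for year, w in weights.items()) / total_weight


def bubble_tail(current, past_margin, past_turnout):
    """Return the tail's start and end points as (margin, turnout) pairs."""
    start = (weighted_baseline(past_margin), weighted_baseline(past_turnout))
    end = (current["margin"], current["turnout"])
    return start, end


# Hypothetical values for a single subgroup (not taken from the real data).
# Margin is assumed to be Republican share minus Democratic share.
past_margin = {2012: 0.02, 2008: 0.05, 2004: -0.03}
past_turnout = {2012: 0.60, 2008: 0.62, 2004: 0.61}
current = {"margin": -0.04, "turnout": 0.59}

start, end = bubble_tail(current, past_margin, past_turnout)
shift = (end[0] - start[0], end[1] - start[1])
print("tail start:", start)
print("tail end:  ", end)
print("shift:     ", shift)
```

The tail runs from the baseline point to the current point, so a long tail means the group moved a lot relative to its recent history.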

If you want to find out what changed this election, just find the large bubbles with the long tails. And to find the crucial states in the electoral college, find the bubbles that just crossed over the y-axis. Note that this visualization is interactive so you can zoom in and mouseover for labels.
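The same reading of the plot can be sketched in code: rank the groups by tail length and flag the ones whose margin changed sign. The groups and values below are made up for illustration, and the margin is again assumed to be Republican share minus Democratic share.

```python
import math

# Hypothetical subgroups: population plus tail start/end as (margin, turnout).
groups = {
    "State A": {"population": 4_500_000, "start": (-0.03, 0.65), "end": (0.01, 0.64)},
    "State B": {"population": 9_000_000, "start": (0.08, 0.58), "end": (0.09, 0.57)},
    "Group C": {"population": 30_000_000, "start": (-0.40, 0.66), "end": (-0.38, 0.59)},
}


def tail_length(group):
    """Euclidean length of the tail in (margin, turnout) space."""
    (m0, t0), (m1, t1) = group["start"], group["end"]
    return math.hypot(m1 - m0, t1 - t0)


def crossed_zero_margin(group):
    """True if the margin changed sign, i.e. the bubble crossed the y-axis."""
    return group["start"][0] * group["end"][0] < 0


crossers = [name for name, g in groups.items() if crossed_zero_margin(g)]
by_tail = sorted(groups, key=lambda name: tail_length(groups[name]), reverse=True)

print("crossed the y-axis:", crossers)
print("longest tails first:", by_tail)
```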

So the older age groups, males, high school graduates, and very low density areas all shifted rightwards in 2016. At the same time, turnout dropped substantially among the non-Hispanic Black population. These changes were enough to bring crucial states like WI, PA, and MI just across the threshold into Republican territory. No new insights here, but I think it’s interesting to have a single graphic that communicates it all.

All the code and data for this post are available here [9].

Data Issues

The real challenge with these plots is getting the data, especially the demographic data. The source for the turnout data in this plot is the Census Current Population Survey [4]. The margin data for each demographic is from the American National Election Studies cumulative data file [6]. Both sources are released in the spring, so you need to wait a few months to get a clear picture of what happened in an election. Catalist has announced a dataset that will be available sooner after an election, so maybe I’ll use that in the future (although it’s still not possible to calculate turnout by subgroup using their data alone).

The eagle-eyed among you may have noticed a mathematical impossibility on this plot. The housing density subgroups all have a lower turnout than the sex subgroups. This isn’t possible: each category splits the same electorate, so the population-weighted turnout of every category should come out to the same national figure. The discrepancy arises because the numbers are derived from two different sources of data. The housing density data comes from county-level Citizen Voting Age Population data, while the sex data is from the Census Bureau’s Current Population Survey. If I calculate the expected national turnout based on each group’s values, these are the results:

category     estimated_national_turnout
age          0.617928
density      0.547086
education    0.644728
race         0.621094
sex          0.614256
state        0.554520

So obviously there’s some variation there; maybe we should just blame it on measurement error? Overall turnout in 2016 was 55.4% of the voting-age population and 60.2% of the voting-eligible population, so these numbers are in the right ballpark, but they’re clearly not perfect.
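For reference, the consistency check boils down to a population-weighted average of subgroup turnout within each category. Here is a minimal sketch with hypothetical populations and turnout rates (not the values behind the table above):

```python
def national_turnout(subgroups):
    """Population-weighted mean turnout across the subgroups of one category.

    `subgroups` is a list of (population, turnout_rate) pairs.
    """
    total_population = sum(pop for pop, _ in subgroups)
    total_votes = sum(pop * rate for pop, rate in subgroups)
    return total_votes / total_population


# Hypothetical (population, turnout) pairs; each category covers the same
# 232 million people, just split up differently.
by_sex = [(120e6, 0.63), (112e6, 0.59)]
by_density = [(50e6, 0.52), (90e6, 0.55), (92e6, 0.56)]

sex_estimate = national_turnout(by_sex)
density_estimate = national_turnout(by_density)

print(f"sex:     {sex_estimate:.4f}")
print(f"density: {density_estimate:.4f}")
print(f"gap:     {sex_estimate - density_estimate:.4f}")
```

If both categories were measured from the same source with the same denominator, the two estimates would be identical. A gap like the one here is a sign that the sources count voters or the eligible population differently.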

A Few More Election Visualizations to Rule Them All

I finally got around to parsing the Catalist demographic margins dataset and combined it with the Elections Project turnout dataset. The nice thing about these sources is that they cover elections every 2 years, so I can show results in between presidential election years.

To get state-level results in non-presidential years, I aggregated the MIT Election Data and Science Lab House election results [8] and combined them with the Elections Project state turnout data [5]. Instead of comparing against a weighted average of the past three elections, the bubble tails are now just staggered by 4 years: each election is compared with the one four years earlier, so presidential-year results are compared with presidential-year results and midterm House results with midterm House results.
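Roughly, the aggregation and the 4-year comparison could look like the sketch below. The rows, the "State X" label, and the field layout are invented for illustration and don’t reflect the actual column names in the MIT dataset. Margin is assumed to be Republican share minus Democratic share of all votes cast.

```python
from collections import defaultdict

# Hypothetical district-level rows: (year, state, party, votes).
rows = [
    (2014, "State X", "REP", 600), (2014, "State X", "DEM", 400),
    (2014, "State X", "REP", 500), (2014, "State X", "DEM", 500),
    (2018, "State X", "REP", 550), (2018, "State X", "DEM", 650),
    (2018, "State X", "REP", 450), (2018, "State X", "DEM", 550),
]


def state_margins(rows):
    """Sum district votes by (year, state) and return the Republican margin."""
    votes = defaultdict(lambda: defaultdict(int))
    for year, state, party, n in rows:
        votes[(year, state)][party] += n
    margins = {}
    for key, by_party in votes.items():
        total = sum(by_party.values())
        margins[key] = (by_party.get("REP", 0) - by_party.get("DEM", 0)) / total
    return margins


def staggered_change(margins, lag=4):
    """Change in margin relative to the election `lag` years earlier."""
    return {
        (year, state): margin - margins[(year - lag, state)]
        for (year, state), margin in margins.items()
        if (year - lag, state) in margins
    }


margins = state_margins(rows)
changes = staggered_change(margins)
print(margins)
print(changes)
```

Only elections that have a counterpart four years earlier get a tail, which is why the 2014 row produces no entry in `changes` here.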

There’s a lot to take in here, but I think the most striking thing is the increase in turnout and leftward shift in 2018. Blue wave indeed. This change seems to be driven largely by college-educated voters. Another crucial thing to note here is that in 2018 the House vote crossed into Democratic territory in both Wisconsin (Rep -7.5%) and Arizona (Rep -1.71%). These will be crucial swing states in 2020 for both the presidency and the Senate, so things could get interesting.

Sources

[1] 2004-2008 County Voting data: https://github.com/helloworlddata/us-presidential-election-county-results

[2] 2005-2009 County VAP data: https://www.census.gov/programs-surveys/decennial-census/about/voting-rights/cvap.html

[3] 2012-2016 County Voting and VAP data: https://github.com/kyaroch/2012_and_2016_presidential_election_results_by_county

[4] United States Elections Project, Demographic Turnout Data. http://www.electproject.org/home/voter-turnout/demographics

[5] United States Elections Project, State Turnout Data. http://www.electproject.org/home/voter-turnout/voter-turnout-data

[6] American National Election Studies. https://electionstudies.org/data-center/anes-time-series-cumulative-data-file/

[7] Revisiting What Happened in the 2018 Election. Yair Ghitza. https://medium.com/@yghitza_48326/revisiting-what-happened-in-the-2018-election-c532feb51c0#_ftn1

[8] U.S. House 1976–2018. MIT Election Data and Science Lab. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IG0UN2

[9] Code source for this post: psthomas/onevis: https://github.com/psthomas/onevis