I’m fascinated with creating a single visualization that can fully communicate the results of an election. I’ve made a few attempts in the past and even came up with a set of criteria for the ideal visualization. More or less, I want to answer the questions “What happened?”, “What changed?”, and “Why?” in a single visualization. Of course, nobody can fully answer the “Why?” question, but you can point towards an answer by looking at demographic and survey data.
So here’s my most recent attempt at an all-encompassing visualization for the 2016 election:
This might look a little bit like an abstract painting, but I promise it’s informative. Each color category represents the entire electorate broken up into subgroups by age, housing density, education, gender, race, and state. The size of each bubble represents the population of the group, and its coordinate position represents the margin and turnout. The tail of each bubble represents the change from the weighted average (60/30/10) of the 2012, 2008, and 2004 elections.
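To make the tail calculation concrete, here is a minimal sketch of the 60/30/10 weighted baseline. The function names and the margin numbers are made up for illustration, and this is not necessarily how the code for this post computes it.

```python
# Weights from the post: 60% for 2012, 30% for 2008, 10% for 2004.
BASELINE_WEIGHTS = {2012: 0.6, 2008: 0.3, 2004: 0.1}


def weighted_baseline(history, weights=BASELINE_WEIGHTS):
    """Weighted average of a subgroup's past values (margin or turnout)."""
    return sum(weight * history[year] for year, weight in weights.items())


def tail(current, history):
    """Return (baseline, change) for one axis of one bubble."""
    baseline = weighted_baseline(history)
    return baseline, current - baseline


# Made-up margins in percentage points for one hypothetical subgroup.
past_margins = {2012: 10.0, 2008: 20.0, 2004: 0.0}
baseline, change = tail(2.0, past_margins)
print(f"baseline={baseline:.1f}, change={change:+.1f}")
```

The same function would be applied once to margin and once to turnout, so each tail runs from the baseline point to the 2016 point.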
If you want to find out what changed this election, just find the large bubbles with the long tails. And to find the crucial states in the electoral college, find the bubbles that just crossed over the y-axis. Note that this visualization is interactive, so you can zoom in and mouse over the bubbles for labels.
So the older age groups, males, high school graduates, and very low density areas all shifted rightwards in 2016. At the same time, turnout dropped substantially among the Non-Hispanic Black populations. These changes were enough to bring crucial states like WI, PA and MI just across the threshold into Republican territory. No new insights here, but I think it’s interesting to have a single graphic that communicates it all.
The real challenge with a plot like this is getting the data, especially the demographic data. The source for the turnout data in this plot is the Census Current Population Survey. The margin data for each demographic is from the American National Election Studies cumulative data file. Each of these sources is released in the spring of each year, so you need to wait for a few months to get a clear picture about what happened in an election. Catalist has announced a dataset that will be available more immediately after an election, so maybe I’ll use that in the future (although it’s still not possible to calculate turnout by subgroup using their data alone).
The eagle-eyed among you may have noticed a mathematical impossibility on the plot. The housing density subgroups all have a lower turnout than the sex subgroups. This isn’t possible, because each category splits the same electorate and should average out to the same national turnout. The discrepancy arises because the numbers are derived from two different sources of data. The housing density data is just from county level Citizen Voting Age Population data, while the sex data is from the Census Bureau’s Current Population Survey. If I calculate the expected national turnout based on each group’s values, these are the results:
So there’s obviously some variation there; maybe we should just blame it on measurement error? Overall turnout was 55.4% of the voting age population and 60.2% of the voting eligible population in 2016. So these numbers are in the right ballpark, but they’re not perfect. All the code and data for this post are available here.
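The expected national turnout check described above comes down to a population-weighted average within each category. Here is a minimal sketch, with a hypothetical function name and made-up populations and turnout rates rather than the real values from the plot:

```python
def expected_national_turnout(subgroups):
    """Population-weighted mean turnout over (population, turnout) pairs."""
    total_population = sum(population for population, _ in subgroups)
    votes = sum(population * turnout for population, turnout in subgroups)
    return votes / total_population


# Made-up numbers: population in millions, turnout as a fraction.
categories = {
    "sex": [(50.0, 0.60), (50.0, 0.70)],
    "housing density": [(25.0, 0.50), (75.0, 0.60)],
}

for name, subgroups in categories.items():
    print(f"{name}: {expected_national_turnout(subgroups):.3f}")
```

If every category came from the same data source, each line would print the same national figure. Any gap between categories is a sign that the underlying sources disagree.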
 2004-2008 County Voting data: https://github.com/helloworlddata/us-presidential-election-county-results
 2005-2009 County VAP data: https://www.census.gov/programs-surveys/decennial-census/about/voting-rights/cvap.html
 2012-2016 County Voting and VAP data: https://github.com/kyaroch/2012_and_2016_presidential_election_results_by_county
 United States Election Project. http://www.electproject.org/home/voter-turnout/demographics
 American National Election Studies. https://electionstudies.org/data-center/anes-time-series-cumulative-data-file/
 Revisiting What Happened in the 2018 Election. Yair Ghitza. https://medium.com/@yghitza_48326/revisiting-what-happened-in-the-2018-election-c532feb51c0#_ftn1
 Code source for this post: psthomas/onevis: https://github.com/psthomas/onevis