<p>My <a href="https://pstblog.com/2020/09/09/elections-meta">election model</a> from 2020 did a <a href="https://pstblog.com/2021/06/10/elections-eval">pretty good</a> job of focusing on critical states, so I decided to do something similar for the midterms this year. The general idea for this project is to combine forecasts for elections at every level of government into an index, which can then be used to prioritize campaign efforts.</p>
<p>This index takes into account things like the baseline political power held by a legislative body, how close the race will be for partisan control, and the tipping point probabilities for individual seats. I’ve made a number of improvements this year, including a better approach to quantifying categorical ratings and an adjustment for election frequencies. If you want more information on the details of the model, take a look at the “<a href="#appendix-a-how-it-works">How it Works</a>” section below. Otherwise, here are the results for 2022.</p>
<h2 id="election-forecasts">Election Forecasts</h2>
<p>This year, I use FiveThirtyEight’s <a href="https://projects.fivethirtyeight.com/2022-election-forecast/">forecasts</a> for the US House, Senate, and Governors races along with Louis Jacobson’s state legislative <a href="https://centerforpolitics.org/crystalball/articles/the-battle-for-the-state-legislatures/">ratings</a>. FiveThirtyEight does a good job of making their forecasts and histograms available on their website, so I just have to do some post-processing to incorporate their results.</p>
<p>It was a little more complex to quantify Louis Jacobson’s categorical ratings, but I was able to do so by comparing his historical ratings with the actual election outcomes at the state level since 2002 (see <a href="#appendix-b-state-legislative-analysis">below</a>). This allows me to make a point estimate of the two party seat share along with uncertainty bounds for each of his categorical ratings this year.</p>
<div id="vis"></div>
<!--https://medium.com/modex/css-tricks-expanding-beyond-a-parent-div-10d7b7204c21-->
<!--Note: flex wrapping isn't working for now, but just leave it, it's fine.-->
<style>
/*.wideDiv {
margin:25px 0px;
width:150%;
margin-left:-25%;
overflow:hidden;
}*/
.wideDiv {
margin:50px 0px;
width:100%;
overflow:hidden;
}
.flexContainer {
display:flex;
width:100%;
flex-wrap: wrap;
flex-direction: row;
/*max-height:200px;*/
}
.flexContainer > img {
display: block;
margin: auto;
/*flex:1;*/
/*border:1px solid;*/
/*margin:1px;*/
}
@media screen and (min-width: 600px) {
.flexContainer {
display: flex;
flex-wrap: wrap;
flex-direction: row;
}
.flexContainer > img {
width: 33.3333%; /*-2rem*/
text-align: center;
}
#group-left {
margin-left: 20px
}
#group-right {
margin-right: 20px
}
.wideDiv {
margin:50px 0px;
width:140%;
margin-left:-20%;
overflow:hidden;
/* display: flex;
flex-wrap: wrap;
flex-direction: row;*/
}
}
</style>
<div class="wideDiv">
<div class="flexContainer"> <!--style="max-width:1200px;"-->
<img id="group-right" src="/images/elections-meta-2022/senatehist.png" />
<img id="group-left" src="/images/elections-meta-2022/househist.png" />
</div>
<div class="flexContainer"> <!--style="max-height:700px;max-width:1200px;"-->
<img id="group-right" src="/images/elections-meta-2022/senateseats.png" />
<img id="group-left" src="/images/elections-meta-2022/houseseats.png" />
</div>
<div class="flexContainer"><!--style="max-height:700px;max-width:1200px;"-->
<img src="/images/elections-meta-2022/govseats.png" />
<img src="/images/elections-meta-2022/statesenateseats.png" />
<img src="/images/elections-meta-2022/statehouseseats.png" />
</div>
</div>
<p>So as of writing this, it looks like it will be a tough election for Democrats. They’re unlikely to hold the House, and the Senate is a tossup at this point. This is actually a better outcome than most expected a few months ago, so the Democratic Party at least has some positive momentum in its favor.</p>
<h2 id="power-values-by-state">Power Values by State</h2>
<p>It’s difficult to distill all of the election forecasts above into a set of priorities, but this is where my model comes in. I calculate a baseline power value for each office, then adjust it for how close the election is expected to be and how likely a seat is to be the tipping point for control if applicable. The general intuition here is that you should target elections that are likely to be close, and target legislative bodies where a change in partisan control is more likely.</p>
<p>So here are the resulting power values, grouped by state and office. Note that the table is sortable, and you can adjust the weights if you prefer. By default I give equal weights to the state and federal governments. The results for each individual election are available <a href="https://github.com/psthomas/elections-meta/blob/master/analysis-2022/data/output/all_elections.csv">here</a>. Note that all the power values in the table below are just multiplied by 100 to make them easier to read.</p>
<iframe id="vis2" src="/images/elections-meta-2022/statescore-table.html" style="width:100%; height:1200px; border: none; position: relative; scrolling:no;">
</iframe>
<p>Aggregating these values at the state level makes sense to me because that’s the level at which many of these elections are held. Whether you’re focused on persuading or turning out voters, you want to do it in a place where their vote counts across many close, important elections. This is currently the case in GA, PA, NV, AZ, WI – mainly driven by their races for governor and US Senate.</p>
<h2 id="funding-comparison">Funding Comparison</h2>
<p>This model is subjective so it’s hard to evaluate the performance, but one objective measure of our priorities is the level of funding each campaign receives. Maybe the wisdom of the crowds revealed through campaign donations can show us what our true priorities should be (assuming the crowds aren’t irrational or influenced by biased polling). So below I compare the percentage of power held by each office in my analysis with the actual funding allocations [7, 8]. The funding gap is the difference between the model priority and the fraction of election funding spent on the legislative body up to this point (a positive number is a bigger gap).</p>
<iframe src="/images/elections-meta-2022/power_frac.html" onload="javascript:(function(o){o.style.height=o.contentWindow.document.body.scrollHeight+30+'px';}(this));" style="height:200px;min-height:270px;width:100%;border:none;overflow:hidden;"></iframe>
<p>So it seems like the Senate races are actually underfunded relative to what my model predicts. To this outsider in his armchair, that looks like a misallocation, but here are a few potential reasons for the difference:</p>
<ul>
<li>There might be a baseline fixed cost to running a campaign, and there are 435 House races compared to ~33 in the Senate.</li>
<li>This isn’t all the funding, just publicly disclosed funding. Maybe the undisclosed money is allocated differently.</li>
<li>Incumbency gives you a small advantage for future elections, so Dems might be playing the long game for control of the House.</li>
<li>There might be diminishing returns to more funding at this point, so the fraction of funding isn’t a meaningful metric currently (I think this is unlikely).</li>
</ul>
<p>There’s no fundamental reason my index should correlate perfectly with campaign funding – I’m just using it as a rough sanity check here. It’s interesting that these numbers are all in the same ballpark, but I would have to backtest my model against previous elections to establish a predictive relationship (assuming previous elections had the “right” priorities). But ignoring the index altogether, it’s still weird to see the <em>expected-blowout-House</em> getting 50% more funding than the <em>tossup-Senate</em> when they share similar levels of control over our government (and the Senate still controls judicial confirmations with a divided government).</p>
<p><strong>Update:</strong> I recently found a few sources of campaign-level funding data, so I included a table with the funding gaps for each race below. Note that this is a sum of all the FEC/State reported funding for both primary and general election candidates because I think this is a better measure of “total effort”. As of writing this, the Senate races still seem to be the ones with the largest funding gaps.</p>
<iframe id="vis3" src="/images/elections-meta-2022/office_table.html" style="width: 100%; height:700px; border:none;"></iframe>
<h2 id="conclusion">Conclusion</h2>
<p>So choose a state with many close, consequential elections (GA, PA, NV, AZ, WI), and find a way to volunteer or donate. My model prioritized North Carolina last election due to its many close elections, so I ended up volunteering with a few campaigns there. This year I think I’ll stay local to Wisconsin and focus on the races for Governor and US Senate.</p>
<h2 id="appendix-a-how-it-works">Appendix A: How it Works</h2>
<p>Here’s the general equation for calculating the power values:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">realized_power</span> <span class="o">=</span> <span class="n">potential_power</span> <span class="o">*</span> <span class="n">election_frequency</span> <span class="o">*</span> <span class="n">pr_close</span> <span class="o">*</span> <span class="n">pr_tip</span></code></pre></figure>
<p>To calculate the <code class="language-plaintext highlighter-rouge">potential_power</code> values, I create a hierarchical power sharing model. I begin with an arbitrary 100 points of power, and allocate half to the federal government and half to the states. The power at the federal level is then further subdivided between the president (25) and Congress (25), with the House and Senate dividing the congressional power evenly. The other 50 points of power are divided between the states according to their fraction of the national population. Each state’s value is then split between the governor and state legislatures just like at the federal level.</p>
<p>The next step is to adjust this potential power value for the <code class="language-plaintext highlighter-rouge">election_frequency</code> of each office or political body. Here’s the general intuition: If two offices hold the same level of power over our political system, but one is decided every two years and the other every four, you should put roughly double the resources into the election held every four years. So the presidency gets a multiplier of 4, but the US Senate gets a multiplier of 2 (note that this isn’t the same thing as a term length multiplier because control over the Senate is decided every 2 years, not every 6). This is a new adjustment to the model this year, based on my <a href="https://pstblog.com/2021/06/10/elections-eval">evaluation</a> of the results from 2020.</p>
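<p>Here’s a small sketch of how the first two terms of that equation combine, using the potential power values from the hierarchical model above and the two frequency multipliers mentioned in this paragraph (the remaining offices get analogous multipliers based on their own election cycles, which I leave out here):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># Sketch of the election-frequency adjustment applied to the potential
# power values. The allocation follows the hierarchical model described
# above; the two multipliers shown are the ones stated in the text.
potential_power = {'president': 25.0, 'us_senate': 12.5}

election_frequency = {
    'president': 4,   # the office is only on the ballot every 4 years
    'us_senate': 2,   # control of the chamber is decided every 2 years
}

adjusted = {office: potential_power[office] * election_frequency[office]
            for office in potential_power}
print(adjusted)  # {'president': 100.0, 'us_senate': 25.0}</code></pre></figure>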
<p>Then, I need to adjust these potential power values for how close the elections are expected to be. My approach is somewhat inspired by Andrew Gelman’s paper (<a href="http://www.stat.columbia.edu/~gelman/research/published/probdecisive2.pdf">PDF</a>) that calculates the probability of a single vote swinging the election. But instead of using the probability of a tie (which doesn’t make sense for an odd number of House seats or a governor’s race), I calculate the probability of a close election (<code class="language-plaintext highlighter-rouge">pr_close</code>). To do this, I calculate the fraction of election forecasts that land in a 5% interval around the central outcome. This is fairly easy to do this year because FiveThirtyEight releases their model histograms and I have a large pool of historical data to work with for the state legislatures.</p>
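<p>Here’s a minimal sketch of that calculation for a single chamber, assuming the FiveThirtyEight histogram has been loaded as a table of seat counts and probabilities. The column names are placeholders rather than their actual schema, and I’m centering the 5% window on the chamber’s majority threshold:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

def pr_close(hist, threshold, total_seats, interval=0.05):
    """Sum the forecast probability mass within a 5% window around the
    majority threshold. Columns 'seats' and 'probability' are placeholder
    names for however the histogram file is structured."""
    half_width = 0.5 * interval * total_seats
    mask = hist['seats'].between(threshold - half_width, threshold + half_width)
    return hist.loc[mask, 'probability'].sum()

# Toy Senate example: uniform probability over 44-57 Democratic seats,
# with the window centered on the 50-seat control threshold
toy = pd.DataFrame({'seats': list(range(44, 58)), 'probability': [1 / 14] * 14})
print(pr_close(toy, threshold=50, total_seats=100))  # ~0.36</code></pre></figure>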
<p>The final step is to calculate the tipping point probability by seat (<code class="language-plaintext highlighter-rouge">pr_tip</code>). For the governors races this is easy because winning the two party popular vote is the tipping point (<code class="language-plaintext highlighter-rouge">pr_tip = 1</code>). But for the Senate or House, some seats are much more likely to decide control than others. Luckily, FiveThirtyEight calculates these tipping point probabilities for each seat so I just use their values. At the state legislative level, I calculate two party seat share so I don’t need to estimate a tipping point probability for these (it’s set to 1).</p>
<p>Then I just multiply. Because these values are derived from the same initial power distribution and I use a uniform approach to adjusting them, they should be comparable across elections.</p>
<h2 id="appendix-b-state-legislative-analysis">Appendix B: State Legislative Analysis</h2>
<p>State legislative elections don’t get anywhere near the same attention from forecasters as federal elections. In the past I used some general quantifications for categorical ratings from FiveThirtyEight, and just assumed an uncertainty bound around the rating for my model. But this year I stumbled across a <a href="https://doi.org/10.7910/DVN/EH6LU0">dataset</a> of state election forecasts reaching back to 2002 from Louis Jacobson. This allowed me to specifically quantify what each of the forecast categories means in terms of two party seat share, and also estimate how uncertain each of those bins is based on historical data. Here’s a resulting boxplot for each of the categories:</p>
<figure style="text-align:center;">
<a href="/images/elections-meta-2022/jacobson_bins.png">
<img src="/images/elections-meta-2022/jacobson_bins.png" />
</a>
</figure>
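<p>The quantification behind that boxplot is straightforward once the historical ratings are joined with the actual outcomes. Here’s a rough sketch, assuming the Dataverse data has been tidied into one row per chamber per cycle (the file and column names here are mine, not the originals):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

# Hypothetical tidy version of Jacobson's historical ratings (2002-2020),
# with one row per state legislative chamber per election cycle.
ratings = pd.read_csv('jacobson_ratings.csv')  # columns: rating, dem_seatshare

# Point estimate and uncertainty for each categorical rating, based on how
# chambers with that rating actually turned out.
summary = (ratings.groupby('rating')['dem_seatshare']
           .agg(['mean', 'std', 'count'])
           .sort_values('mean'))
print(summary)</code></pre></figure>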
<p>Here’s a summary table that I used to quantify this year’s forecast, estimating Democratic two-party seat share:</p>
<iframe src="/images/elections-meta-2022/jacobson_bins.html" onload="javascript:(function(o){o.style.height=o.contentWindow.document.body.scrollHeight+30+'px';}(this));" style="height:200px;min-height:270px;width:100%;border:none;overflow:hidden;"></iframe>
<p>These ratings seem to have done pretty well historically, and the quantifications follow roughly the step pattern that I’d expect. They’re not updated very often, but I think I’ll keep using them going forward. The code for the state analysis is in the <a href="https://github.com/psthomas/elections-meta">repo</a> along with the rest of the code for this project.</p>
<h2 id="references">References</h2>
<p>[1] Code and data for this post: psthomas/elections-meta. <a href="https://github.com/psthomas/elections-meta">https://github.com/psthomas/elections-meta</a></p>
<p>[2] Original post: Combining the 2020 Election Forecasts. <a href="https://pstblog.com/2020/09/09/elections-meta">https://pstblog.com/2020/09/09/elections-meta</a></p>
<p>[3] FiveThirtyEight Senate/House/Governor Forecasts. <a href="https://projects.fivethirtyeight.com/2022-election-forecast/">https://projects.fivethirtyeight.com/2022-election-forecast/</a></p>
<p>[4] The Battle for State Legislatures. Louis Jacobson, 2022. <a href="https://centerforpolitics.org/crystalball/articles/the-battle-for-the-state-legislatures/">https://centerforpolitics.org/crystalball/articles/the-battle-for-the-state-legislatures/</a></p>
<p>[5] Jacobson, Louis; Klarner, Carl; Oldham, Rob, 2020, “Louis Jacobson’s State Legislative Election Ratings (2002-2020)”, <a href="https://doi.org/10.7910/DVN/EH6LU0">https://doi.org/10.7910/DVN/EH6LU0</a>, Harvard Dataverse, V1.</p>
<p>[6] My state partisan composition dataset: <a href="https://github.com/psthomas/state-partisan-composition">https://github.com/psthomas/state-partisan-composition</a></p>
<p>[7] Federal campaign finance data, FEC. <a href="https://www.fec.gov/data/raising-bythenumbers/">https://www.fec.gov/data/raising-bythenumbers/</a></p>
<p>[8] State campaign finance data, followthemoney.org. <a href="https://www.followthemoney.org/">https://www.followthemoney.org/</a></p>
<p>[9] What is the probability your vote will make a difference? Andrew Gelman, Nate Silver.
<a href="http://www.stat.columbia.edu/~gelman/research/published/probdecisive2.pdf">http://www.stat.columbia.edu/~gelman/research/published/probdecisive2.pdf</a></p>
<p><a href="https://pstblog.com/2022/08/09/elections-2022">Election Priorities for the 2022 Midterms</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on August 09, 2022.</p>
<p>It’s been a little while since the 2020 elections so I finally got around to evaluating my election model. I’m hoping to run a similar model in the future, so this post is an attempt to learn what worked and what didn’t so I can improve on it next time. If you haven’t read my initial post on the subject, that will provide helpful context (especially the “How it Works” <a href="https://pstblog.com/2020/09/09/elections-meta#how">section</a>).</p>
<p>Overall, I think my model performed pretty well and it was correct to focus on the Senate in the final weeks of the campaign. But systematic polling bias against Republicans made it really tough for anyone to accurately estimate Democratic win probabilities. This led to a situation where the rank order of the power values was correct (in my opinion), but their magnitudes were off due to polling error.</p>
<p>This evaluation is organized into three parts. First, I compare the actual vs. projected results and calculate the forecasting error for each office. Next, I evaluate my approach for combining the projections by rerunning the models centered on the actual election outcomes. Finally, I compare these model priorities to the revealed preference of donors via campaign spending to get an objective measure of its performance.</p>
<h2 id="forecasting-error">Forecasting Error</h2>
<p>My model relies on a variety of different forecasts, ranging from simple judgment based ratings to complex quantitative forecasts. For the presidency, I relied on the Economist’s forecast, which describes its methodology <a href="https://projects.economist.com/us-2020-forecast/president/how-this-works">here</a>. I used Cory McCartan’s forecast for the Senate, described <a href="https://corymccartan.github.io/projects/senate-20/">here</a>. For the rest of the legislative bodies, I quantified categorical ratings from Inside Elections and CNalysis using 538’s <a href="https://fivethirtyeight.com/features/2018-house-forecast-methodology/">categories</a>, and then sampled around those ratings using normal distributions to create basic forecasts.</p>
<p>Below, I recreated all of my pre-election plots with the actual results superimposed in green. At just about every level, the forecasts were biased in favor of Democrats:</p>
<style>
/*.wideDiv {
margin:25px 0px;
width:150%;
margin-left:-25%;
overflow:hidden;
}*/
.wideDiv {
margin:50px 0px;
width:100%;
overflow:hidden;
}
.flexContainer {
display:flex;
width:100%;
flex-wrap: wrap;
flex-direction: row;
/*max-height:200px;*/
}
.flexContainer > img {
display: block;
margin: auto;
/*flex:1;*/
/*border:1px solid;*/
/*margin:1px;*/
}
@media screen and (min-width: 600px) {
.flexContainer {
display: flex;
flex-wrap: wrap;
flex-direction: row;
}
.flexContainer > img {
width: 33.3333%; /*-2rem*/
text-align: center;
}
.wideDiv {
margin:50px 0px;
width:140%;
margin-left:-20%;
overflow:hidden;
/* display: flex;
flex-wrap: wrap;
flex-direction: row;*/
}
}
</style>
<div class="wideDiv">
<div class="flexContainer">
<img src="/images/elections-meta/evaluation/presidentialhist.png" />
<img src="/images/elections-meta/evaluation/senatehist.png" />
<img src="/images/elections-meta/evaluation/househist.png" />
</div>
<div class="flexContainer">
<img src="/images/elections-meta/evaluation/presidentialstates.png" />
<img src="/images/elections-meta/evaluation/senateseats.png" />
<img src="/images/elections-meta/evaluation/houseseats.png" />
</div>
<div class="flexContainer">
<img src="/images/elections-meta/evaluation/govseats.png" />
<img src="/images/elections-meta/evaluation/statesenateseats.png" />
<img src="/images/elections-meta/evaluation/statehouseseats.png" />
</div>
</div>
<p>Here is a boxplot of the error by legislative body. Note that I’m only displaying the error for the elections that had close results (Democratic projected or actual vote share of 40-60%). I explain more in Appendix A below, but my model only needs accurate estimates for close elections, so that’s my focus.</p>
<!--style="max-width:500px;"-->
<figure style="text-align:center;">
<a href="/images/elections-meta/evaluation/close_error.png">
<img src="/images/elections-meta/evaluation/close_error.png" />
</a>
</figure>
<p>And here’s a summary table showing the mean error (a measure of bias), and the mean absolute error for each office.</p>
<iframe src="/images/elections-meta/evaluation/close_error.html" onload="javascript:(function(o){o.style.height=o.contentWindow.document.body.scrollHeight+30+'px';}(this));" style="height:200px;min-height:270px;width:100%;border:none;overflow:hidden;"></iframe>
<p>So the presidential and Senate forecasts were biased in favor of Democrats by about 2%, but overall these offices had the least amount of error (their sophisticated methods helped!). The House was a little more mixed, with a bias of 4.9% towards Democrats, and a very large absolute error of 8%. And the state legislative estimates of Democratic seat share had the most pro-D bias (7%) and the most absolute error.</p>
<p>I think my main takeaway from this is that I need to come up with a better method of quantifying the categorical ratings. I used 538’s quantifications that were based on the historical performance of a number of different election ratings, but I probably need to use conversions tailored to the specific Inside Elections and CNalysis ratings I used instead. In some sense I’d expect the performance at the state level to be worse than the federal level because there’s much less polling and scrutiny of those elections, but I hoped to do a little better than this.</p>
<p>But coming up with better categories won’t do much to fix biases in the polling. <a href="https://centerforpolitics.org/crystalball/articles/poll-based-election-forecasts-will-always-struggle-with-uncertainty/">Polling error</a> seems to be the core reason that many of the forecasts, both <a href="https://statmodeling.stat.columbia.edu/2020/11/04/dont-kid-yourself-the-polls-messed-up-and-that-would-be-the-case-even-wed-forecasted-biden-losing-florida-and-only-barely-winning-the-electoral-college/">quantitative</a> and <a href="https://cookpolitical.com/analysis/national/national-politics/many-are-afraid-say-it-not-close-race">categorical</a>, were off. And partisan nonresponse bias <a href="https://www.vox.com/policy-and-politics/2020/11/10/21551766/election-polls-results-wrong-david-shor">was</a> probably the <a href="https://www.dataforprogress.org/memos/2020-polling-retrospective">major</a> source of polling error, so fixing this bias needs to be the focus of anyone hoping to forecast elections in the future.</p>
<h2 id="model-error">Model Error</h2>
<p>So how would my model perform if we had near-perfect forecasts? One way to test this is to center the probability distributions for all the elections at the actual outcomes, and then rerun my model to see what it prioritizes. That’s exactly what I do below. Note that I didn’t have access to the underlying source code for the Economist’s forecast (and don’t understand McCartan’s Senate forecast well enough), so I had to write my own Senate and presidential models. Here are summary histograms for the new presidential, Senate, and House simulations:</p>
<div class="wideDiv">
<div class="flexContainer">
<img src="/images/elections-meta/evaluation/model_presidentialhist.png" />
<img src="/images/elections-meta/evaluation/model_senatehist.png" />
<img src="/images/elections-meta/evaluation/model_househist.png" />
</div>
</div>
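<p>The recentering step itself is conceptually simple: shift each set of simulated outcomes so its central value lands on the actual result, then recompute everything downstream. Here’s a toy sketch of the idea (not the rewritten presidential or Senate models themselves):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

def recenter(simulations, actual):
    """Shift simulated outcomes (electoral votes, seats, etc.) so that
    their median equals the actual election result."""
    simulations = np.asarray(simulations, dtype=float)
    return simulations + (actual - np.median(simulations))

# Toy example: a forecast centered near 350 Democratic electoral votes,
# recentered on the actual result of 306
draws = np.random.normal(350, 40, size=10_000)
recentered = recenter(draws, actual=306)
print(np.median(recentered))  # ~306</code></pre></figure>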
<p>So rerunning these models leads to a pretty dramatic shift in the histograms and Democratic win probabilities compared to the initial results. These legislative bodies that were highly likely to go to Democrats are now essentially tossups.</p>
<p>How does this influence the priorities of my model? Just as a reminder, here is how the realized power value is calculated: <code class="language-plaintext highlighter-rouge">realized_power = potential_power * pr_close * pr_tip</code>. So the shifts in the probability distributions above will affect the <code class="language-plaintext highlighter-rouge">pr_close</code> variables for each legislative body, and the changes in the expected outcomes for the individual seats will change their tipping point probabilities (<code class="language-plaintext highlighter-rouge">pr_tip</code>).</p>
<p>Here’s a table showing the results for each seat and the sum by state. The color coding communicates the change from the original 2020 model:</p>
<figure style="text-align:center;">
<a href="/images/elections-meta/evaluation/heatmap.png">
<img src="/images/elections-meta/evaluation/heatmap.png" />
</a>
</figure>
<p>Here are the resulting values grouped by office, and the change:</p>
<iframe src="/images/elections-meta/evaluation/office_summary.html" onload="javascript:(function(o){o.style.height=o.contentWindow.document.body.scrollHeight+30+'px';}(this));" style="height:200px;min-height:270px;width:100%;border:none;overflow:hidden;"></iframe>
<p>So the Senate has a similar power value as in the initial model, but the relative importance of the House and presidency has increased dramatically (3x). I find this result really interesting because the Senate histogram shifted just as much as that of the House and presidency. But because it shifted from one side of center to the other, the integrated value of <code class="language-plaintext highlighter-rouge">pr_close</code> around the center didn’t change very much. So I guess this could be interpreted as saying that the Senate races are still the most important, but you need to increase the resources put into the House and presidency relative to the Senate if the current allocation of funds is out of line with this model. But how do these numbers compare to the actual priorities of campaign funders?</p>
<h2 id="revealed-preference">Revealed Preference</h2>
<p>In some sense, my model is subjective (I choose the <code class="language-plaintext highlighter-rouge">potential_power</code> values for each office), so any evaluation will also be subjective. But there might be one objective measure of priorities that I can compare against: campaign funding. Maybe the wisdom of the crowds revealed through campaign donations can show us what our true priorities should be (assuming the crowds aren’t irrational or influenced by bad polling). So below I compare the percentage of power held by each office in the original 2020 analysis, the new recentered model, and the <a href="https://www.followthemoney.org/">actual</a> funding allocations.</p>
<iframe src="/images/elections-meta/evaluation/power_frac.html" onload="javascript:(function(o){o.style.height=o.contentWindow.document.body.scrollHeight+30+'px';}(this));" style="height:200px;min-height:270px;width:100%;border:none;overflow:hidden;"></iframe>
<p>I’m actually fairly surprised at the agreement here, at least for the downballot races. But there are major disagreements when it comes to the presidency and the Senate, and the original model magnifies this difference due to polling error. Here are a few reasons the estimates might not agree when it comes to the presidency:</p>
<ul>
<li>Donors were being irrational, and the correct thing to do was to donate more to the Senate races.</li>
<li>Donors were being rational, and the downside of another Trump term was so large that they were correct to focus on the presidency. Put another way, the <code class="language-plaintext highlighter-rouge">potential_power</code> values in my model were wrong.</li>
<li>There might be certain fixed costs for running a national presidential race that make it naturally more expensive.</li>
<li>Donors are “buying” power over the Senate for 2 years, but the presidency for 4. And they are only buying 1/3 of the Senate, but the full presidency.</li>
</ul>
<p>I’m not sure which of these explanations (if any) is correct, but I clearly need to do more thinking about what the exact implications of my model are for campaign funding.</p>
<h2 id="conclusion">Conclusion</h2>
<ul>
<li>Polling (and forecasts) were systematically biased against Republicans, probably due to partisan nonresponse bias. Using wide probability distributions can only do so much to fix this problem.</li>
<li>I need to come up with a more systematic way of quantifying categorical ratings.</li>
<li>I need to think more deeply about what the implications of the model are for campaign funding.</li>
<li>Overall, I think the model was right to put an emphasis on the Senate in the end. Especially considering that 13k vote switches towards David Perdue in the Georgia Senate race would have prevented a runoff and denied Democrats unified control of the federal government. But while the rank order of the power values was probably correct, their magnitudes were off due to polling error. Finding a way to fix partisan nonresponse bias will be a major focus of pollsters and anyone else trying to forecast elections in 2022 and beyond.</li>
</ul>
<h2 id="appendix-a-all-the-forecasting-error">Appendix A: All the Forecasting Error</h2>
<p>Why did I only focus on the forecasting error for close elections? The main reason is that the error for blowout elections doesn’t matter very much – I just need to shift the seats far enough away from the center that they have low tipping point probabilities (or low <code class="language-plaintext highlighter-rouge">pr_close</code> when it comes to state legislatures). So it doesn’t make much difference if I estimate Democratic vote share at 30, 20, or 10% in these cases because their resulting <code class="language-plaintext highlighter-rouge">realized_power</code> values will be near zero regardless.</p>
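<p>Here’s a quick numerical illustration of that point, using a normal forecast distribution with an arbitrary standard deviation. The probability of a close result barely changes whether a blowout is rated at 10, 20, or 30% Democratic vote share, while a tossup dominates either way:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from scipy.stats import norm

def pr_close(mean, sd=0.05, lo=0.475, hi=0.525):
    """Probability that a two-party vote share lands in the 47.5-52.5%
    window, assuming a normal forecast distribution (illustrative only)."""
    return norm.cdf(hi, mean, sd) - norm.cdf(lo, mean, sd)

for mean in [0.10, 0.20, 0.30, 0.50]:
    print(mean, round(pr_close(mean), 4))
# 0.1 0.0, 0.2 0.0, 0.3 0.0002, 0.5 0.3829</code></pre></figure>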
<p>But here are the forecasting errors for all the elections for the sake of completeness:</p>
<figure style="text-align:center;">
<a href="/images/elections-meta/evaluation/all_error.png">
<img src="/images/elections-meta/evaluation/all_error.png" />
</a>
</figure>
<p>And here are some summary statistics for the graphic above. Interestingly, the mean error terms are smaller for the entire sample because the errors on both sides cancel out. But the absolute error values are still pretty similar:</p>
<iframe src="/images/elections-meta/evaluation/all_error.html" onload="javascript:(function(o){o.style.height=o.contentWindow.document.body.scrollHeight+30+'px';}(this));" style="height:200px;min-height:270px;width:100%;border:none;overflow:hidden;"></iframe>
<h2 id="references">References</h2>
<p>[1] Post Code and Data: psthomas/elections-meta. <a href="https://github.com/psthomas/elections-meta">https://github.com/psthomas/elections-meta</a></p>
<p>[2] Original post: Combining the 2020 Election Models. <a href="https://pstblog.com/2020/09/09/elections-meta">https://pstblog.com/2020/09/09/elections-meta</a></p>
<p>[3] Simple presidential model. Drew Linzer. <a href="https://twitter.com/DrewLinzer/status/1293216060456329216">https://twitter.com/DrewLinzer/status/1293216060456329216</a></p>
<p>[4] Poll-Based Election Forecasts Will Always Struggle With Uncertainty. Natalie Jackson. <a href="https://centerforpolitics.org/crystalball/articles/poll-based-election-forecasts-will-always-struggle-with-uncertainty/">https://centerforpolitics.org/crystalball/articles/poll-based-election-forecasts-will-always-struggle-with-uncertainty/</a></p>
<p>[5] Don’t kid yourself. The polls messed up—and that would be the case even if we’d forecasted Biden losing Florida and only barely winning the electoral college. Andrew Gelman. <a href="https://statmodeling.stat.columbia.edu/2020/11/04/dont-kid-yourself-the-polls-messed-up-and-that-would-be-the-case-even-wed-forecasted-biden-losing-florida-and-only-barely-winning-the-electoral-college/">https://statmodeling.stat.columbia.edu/2020/11/04/dont-kid-yourself-the-polls-messed-up-and-that-would-be-the-case-even-wed-forecasted-biden-losing-florida-and-only-barely-winning-the-electoral-college/</a></p>
<p>[6] Many Are Afraid To Say It, but This Is Not a Close Race. Charlie Cook. <a href="https://cookpolitical.com/analysis/national/national-politics/many-are-afraid-say-it-not-close-race">https://cookpolitical.com/analysis/national/national-politics/many-are-afraid-say-it-not-close-race</a></p>
<p>[7] One pollster’s explanation for why the polls got it wrong. Dylan Matthews. <a href="https://www.vox.com/policy-and-politics/2020/11/10/21551766/election-polls-results-wrong-david-shor">https://www.vox.com/policy-and-politics/2020/11/10/21551766/election-polls-results-wrong-david-shor</a></p>
<p>[8] Memo: 2020 Polling Retrospective. Data For Progress. <a href="https://www.dataforprogress.org/memos/2020-polling-retrospective">https://www.dataforprogress.org/memos/2020-polling-retrospective</a></p>
<p>[9] 2020 Senate Forecast. Cory McCartan. <a href="https://corymccartan.github.io/projects/senate-20/">https://corymccartan.github.io/projects/senate-20/</a></p>
<p>[10] Inside Elections House Ratings. Nathan Gonzales. <a href="http://insideelections.com/ratings/house">http://insideelections.com/ratings/house</a></p>
<p>[11] Inside Elections Governor Ratings. Nathan Gonzales <a href="http://insideelections.com/ratings/governor">http://insideelections.com/ratings/governor</a></p>
<p>[12] CNalysis State Legislature Ratings. Chaz Nuttycombe. <a href="https://www.cnalysiscom.website/">https://www.cnalysiscom.website/</a></p>
<p><a href="https://pstblog.com/2021/06/10/elections-eval">How did my election model perform?</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on June 10, 2021.</p>
<!--A Comprehensive Model of the 2020 Elections
A Metamodel for the 2020 Elections
I combine forecasts for every election to find strategic places to focus in 2020.-->
<p>This post is an introduction to my new model of the 2020 elections, which is really a model of models. The general idea is to combine forecasts for elections at every level of government into an index, which can then be used to prioritize campaign efforts. I’ve made a <a href="https://pstblog.com/2019/10/10/voting-power-2020">few</a> <a href="https://pstblog.com/2019/03/05/voting-power-comprehensive">models</a> in the past that tried to find the most influential elections. But the main problem with those approaches was that they were retrospective and didn’t consider tipping point thresholds in their calculations.</p>
<p>This new model fixes both these problems by pulling together current forecasts for the 2020 presidential, Senate, House, governor, and state legislative elections. I then calculate a realized power value for each seat by adjusting a potential power value for the probability of a close election and the tipping point probability for each of the seats. There are more details on how the model works in the appendix below, but first the results.</p>
<h2 id="election-forecasts">Election Forecasts</h2>
<p>Here are the forecasts for the different offices and legislative bodies on the ballot in November. The <a href="https://projects.economist.com/us-2020-forecast/president">presidential</a> model is courtesy of The Economist, the <a href="https://corymccartan.github.io/projects/senate-20/">Senate</a> model is from Cory McCartan, the <a href="http://insideelections.com/ratings/house">House</a> and <a href="http://insideelections.com/ratings/governor">governor</a> point estimates are from Inside Elections, and the <a href="https://www.cnalysiscom.website/">state legislative</a> point estimates are from CNalysis. The confidence intervals are 95%, except for the Senate which reports a 90% confidence interval.</p>
<div id="vis"></div>
<!--https://medium.com/modex/css-tricks-expanding-beyond-a-parent-div-10d7b7204c21-->
<!--Note: flex wrapping isn't working for now, but just leave it, it's fine.-->
<style>
/*.wideDiv {
margin:25px 0px;
width:150%;
margin-left:-25%;
overflow:hidden;
}*/
.wideDiv {
margin:50px 0px;
width:100%;
overflow:hidden;
}
.flexContainer {
display:flex;
width:100%;
flex-wrap: wrap;
flex-direction: row;
/*max-height:200px;*/
}
.flexContainer > img {
display: block;
margin: auto;
/*flex:1;*/
/*border:1px solid;*/
/*margin:1px;*/
}
@media screen and (min-width: 600px) {
.flexContainer {
display: flex;
flex-wrap: wrap;
flex-direction: row;
}
.flexContainer > img {
width: 33.3333%; /*-2rem*/
text-align: center;
}
.wideDiv {
margin:50px 0px;
width:140%;
margin-left:-20%;
overflow:hidden;
/* display: flex;
flex-wrap: wrap;
flex-direction: row;*/
}
}
</style>
<!-- <div style="margin:25px 0px;width:85vw;position:relative;left: calc(-42.5vw + 50%);"> -->
<!-- <div style="margin:25px 0px;width:85vw;max-width:1250px;position:relative;left: calc(-42.5vw + 50%);"> -->
<div class="wideDiv">
<div class="flexContainer"> <!--style="max-width:1200px;"-->
<img src="/images/elections-meta/presidentialhist.png" />
<img src="/images/elections-meta/senatehist.png" />
<img src="/images/elections-meta/househist.png" />
</div>
<div class="flexContainer"> <!--style="max-height:700px;max-width:1200px;"-->
<img src="/images/elections-meta/presidentialstates.png" />
<img src="/images/elections-meta/senateseats.png" />
<img src="/images/elections-meta/houseseats.png" />
</div>
<div class="flexContainer"><!--style="max-height:700px;max-width:1200px;"-->
<img src="/images/elections-meta/govseats.png" />
<img src="/images/elections-meta/statesenateseats.png" />
<img src="/images/elections-meta/statehouseseats.png" />
</div>
</div>
<p>So overall things look pretty good for the Democrats right now, but the Senate is still expected to be close. One other thing to mention is that these aren’t nowcasts, they’re forecasts. Each one of these estimates takes into account how the race is expected to tighten from now until November, either through sophisticated modeling techniques in the case of the presidential and Senate models, or through expert judgement in the case of the categorical models. There’s an ongoing debate about how confident we should be in forecasts in general, but these models are at least trying to incorporate the best practices.</p>
<h2 id="power-values-by-state">Power Values by State</h2>
<p>It’s difficult to distill all of the election forecasts above into a set of priorities, but this is where my model comes in. I calculate a baseline power value for each office, then adjust it for how close the election is expected to be and how likely a seat is to be the tipping point for control if applicable. The general intuition here is that you should target elections that are likely to be close, and target legislative bodies where a change in partisan control is more likely.</p>
<p>So here are the resulting power values, grouped by state and office. Note that the table is sortable, and you can adjust the weights if you prefer. By default I give equal weights to the state and federal governments. The results for each individual election sorted by realized power are available <a href="https://github.com/psthomas/elections-meta/blob/master/analysis-2020/data/output/seat_realized_power.csv">here</a>.</p>
<iframe id="vis2" src="/images/elections-meta/statescore-table.html" style="width:100%; height:1200px; border: none; position: relative; scrolling:no;">
</iframe>
<h2 id="conclusion">Conclusion</h2>
<p>So if you trust these models and weights, North Carolina seems like a good place to focus right now. It has close elections at every level of government and has a potential tipping point seat for the US Senate. Otherwise, if you think Trump represents a unique threat to our democracy and the risk adjusted harm of another term outweighs any other considerations, sort or reweight by the presidential column and focus on those states (currently PA, FL, MI, WI, MN, NC). Or maybe you think the president’s policy decisions have more tail risk in general, so this office should always have more weight. Whatever your preference, the goal of this model isn’t to give you an absolute answer, but instead give you results that can be adjusted based on your own assumptions.</p>
<p>I’ll try to keep these results up to date as new versions of the supporting models are released. All the code and data for this project are available on GitHub <a href="https://github.com/psthomas/elections-meta">here</a>.</p>
<h2 id="how">Appendix: How it Works</h2>
<p>To start off, I create a hierarchical power sharing model. I begin with an arbitrary 100 points of power, and allocate half to the federal government and half to the states. The power at the federal level is then further subdivided between the president (25) and Congress (25), with the House and Senate dividing the congressional power evenly. The other 50 points of power are divided between the states according to their fraction of the national population. Each state’s value is then split between the governor and state legislatures just like at the federal level.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">total_power</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">federal_power</span> <span class="o">=</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">total_power</span>
<span class="n">presidential_power</span> <span class="o">=</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">federal_power</span>
<span class="n">senate_power</span> <span class="o">=</span> <span class="mf">0.25</span><span class="o">*</span><span class="n">federal_power</span>
<span class="n">house_power</span> <span class="o">=</span> <span class="mf">0.25</span><span class="o">*</span><span class="n">federal_power</span>
<span class="n">states_power</span> <span class="o">=</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">total_power</span>
<span class="n">governor_power</span> <span class="o">=</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">states_power</span>
<span class="n">state_senate_power</span> <span class="o">=</span> <span class="mf">0.25</span><span class="o">*</span><span class="n">states_power</span>
<span class="n">state_house_power</span> <span class="o">=</span> <span class="mf">0.25</span><span class="o">*</span><span class="n">states_power</span> </code></pre></figure>
<p>Next, I need to adjust these potential power values for how close the elections are expected to be. My approach is somewhat inspired by Andrew Gelman’s paper (<a href="http://www.stat.columbia.edu/~gelman/research/published/probdecisive2.pdf">PDF</a>) that calculates the probability of a single vote swinging the election. But instead of using the probability of a tie (which doesn’t make sense for an odd number of House seats or a governor’s race), I calculate the probability of a close election (<code class="language-plaintext highlighter-rouge">pr_close</code>). To do this, I fit a kernel density estimate to the simulated outputs for each model, and integrate 5% of the interval around the central outcome. So, in the case of the presidency, I fit a kernel density to the modeled electoral college outcomes, then integrate it from 255.55 to 282.45 electoral votes, which gives me a rough probability of the outcome falling within that interval. For the governors races, I just integrate the kernel density from 47.5% to 52.5% of the two party popular vote.</p>
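<p>Here’s a minimal sketch of that integration step using SciPy’s Gaussian KDE. The simulated electoral vote draws below are a synthetic stand-in for the Economist’s simulation output:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in for the simulated Democratic electoral vote outcomes
sims = np.random.normal(330, 40, size=40_000)

kde = gaussian_kde(sims)
threshold = 269            # center of the 255.55-282.45 window described above
half_width = 0.025 * 538   # 13.45 electoral votes, i.e. a 5% interval

# Probability of a close election: integrate the density from 255.55 to 282.45
pr_close = kde.integrate_box_1d(threshold - half_width, threshold + half_width)
print(pr_close)</code></pre></figure>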
<p>The next step is to calculate the tipping point probability by seat (<code class="language-plaintext highlighter-rouge">pr_tip</code>). For the governors races this is easy because winning the two party popular vote is the tipping point (<code class="language-plaintext highlighter-rouge">pr_tip = 1</code>). But in the case of the electoral college, some states are much more likely to put a candidate over the critical threshold of 269 electoral votes than others. So over the thousands of presidential simulations, I sort the states by vote margin for the winner of the electoral college, and then find the state that put the winner over the threshold. I use a similar approach to find the tipping point seats for the House, and the Senate model has these probabilities precalculated. At the state legislative level, I model two party seat share, so I don’t need to calculate a tipping point probability for these (it’s set to 1).</p>
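<p>And here’s a rough sketch of the electoral college tipping point calculation, assuming the simulations are in a long-format table with one row per state per simulation (the column names are mine, not the Economist’s):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

def tipping_point_probs(sims):
    """Fraction of simulations in which each state puts the electoral
    college winner over the threshold. Expects hypothetical columns
    'sim', 'state', 'dem_margin', and 'ev' (electoral votes)."""
    tippers = []
    for _, sim in sims.groupby('sim'):
        dem_ev = sim.loc[sim['dem_margin'] > 0, 'ev'].sum()
        if dem_ev == 269:  # skip exact ties in this sketch
            continue
        dem_wins = dem_ev >= 270
        # Look at margins from the winner's point of view, then accumulate
        # the winner's electoral votes from the strongest state down
        margin = sim['dem_margin'] if dem_wins else -sim['dem_margin']
        won = sim.assign(margin=margin).query('margin > 0')
        ordered = won.sort_values('margin', ascending=False)
        cumulative = ordered['ev'].cumsum()
        tippers.append(ordered.loc[cumulative >= 270, 'state'].iloc[0])
    return pd.Series(tippers).value_counts(normalize=True)</code></pre></figure>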
<p>Note that I’m calculating the tipping point probabilities using all the simulations, not just the close ones. I’ve seen other approaches that only use close elections, but this produces results for me that are out of line with <a href="https://twitter.com/gelliottmorris/status/1280977710676766723">other</a> <a href="https://projects.fivethirtyeight.com/2020-election-forecast/">estimates</a>, so I’ll stick with using all the simulations for now. So now that I have all the values I need for every legislative body and seat, I can plug them into this equation to get my final adjusted values:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">realized_power</span> <span class="o">=</span> <span class="n">potential_power</span> <span class="o">*</span> <span class="n">pr_close</span> <span class="o">*</span> <span class="n">pr_tip</span></code></pre></figure>
<p>Because these values are derived from the same initial power distribution and I use a uniform approach to adjusting them, they should be comparable across elections. Note that all the power values in the table above are just multiplied by 100 to make it easier to read.</p>
<h3 id="supporting-models">Supporting Models</h3>
<p>The Economist’s <a href="https://projects.economist.com/us-2020-forecast/president">presidential model</a> and Cory McCartan’s <a href="https://corymccartan.github.io/projects/senate-20/">Senate model</a> share their complete simulation results. So I just need to import those and do some processing to calculate the above values.</p>
<p>The House is a little more complicated because nobody is (was) doing a quantitative model for those elections by seat. So I take the categorical <a href="http://insideelections.com/ratings/house">ratings</a> for House seats from Inside Elections, convert the categorical ratings to numerical two party vote margins using 538’s <a href="https://fivethirtyeight.com/features/2018-house-forecast-methodology/">quantifications</a>, and then sample around these margins using normal distributions. This approach is inspired by Drew Linzer’s simple <a href="https://twitter.com/DrewLinzer/status/1293216060456329216">presidential model</a>, where he models polling uncertainty using a normal distribution centered at the polling average with a standard deviation of 0.05, and estimates a national swing using a normal distribution with a standard deviation of 0.03. I’m a novice when it comes to modeling elections though, so feedback on this approach is welcome.</p>
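<p>Here’s a minimal sketch of that sampling scheme for a single seat. The category-to-margin mapping below is purely illustrative (the real analysis uses 538’s historically derived values), and in the full model the national swing would be shared across all seats within a simulation rather than drawn per seat:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

# Illustrative Democratic two-party margins for each rating category;
# these are placeholders, not 538's actual quantifications.
category_margin = {'Safe R': -0.20, 'Likely R': -0.08, 'Lean R': -0.04,
                   'Toss-up': 0.00, 'Lean D': 0.04, 'Likely D': 0.08,
                   'Safe D': 0.20}

def simulate_seat(rating, n_sims=10_000, seat_sd=0.05, swing_sd=0.03):
    """Sample two-party Democratic vote shares around a categorical rating,
    using the polling and national swing standard deviations from Linzer's
    simple model described above."""
    center = 0.5 + category_margin[rating] / 2   # margin to two-party share
    seat_draws = np.random.normal(center, seat_sd, n_sims)
    national_swing = np.random.normal(0, swing_sd, n_sims)
    return seat_draws + national_swing

dem_share = simulate_seat('Lean D')
print((dem_share > 0.5).mean())  # rough Democratic win probability</code></pre></figure>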
<p>I model the governors races using a similar approach to the House above: I take the <a href="http://insideelections.com/ratings/governor">ratings</a> from Inside Elections, quantify them, and then sample around them with normal distributions. For state legislatures, it’s a little bit beyond me to model each individual seat. So instead I model the expected Democratic seats for each body. I take the state legislative <a href="https://www.cnalysiscom.website/">ratings</a> from CNalysis, quantify them, convert the percentage margin to a seat count, then sample around the expected seat count using the same uncertainty as above, just scaled to the number of seats in each state legislative body. I then integrate all of these simulated distributions over the same 5% interval to calculate <code class="language-plaintext highlighter-rouge">pr_close</code> as mentioned above.</p>
<h1 id="references">References</h1>
<p>[1] What is the probability your vote will make a difference? Andrew Gelman, Nate Silver.
<a href="http://www.stat.columbia.edu/~gelman/research/published/probdecisive2.pdf">http://www.stat.columbia.edu/~gelman/research/published/probdecisive2.pdf</a></p>
<p>[2] Simple presidential model. Drew Linzer. <a href="https://twitter.com/DrewLinzer/status/1293216060456329216">https://twitter.com/DrewLinzer/status/1293216060456329216</a></p>
<p>[3] How FiveThirtyEight’s House Model Works, 2018. <a href="https://fivethirtyeight.com/features/2018-house-forecast-methodology/">https://fivethirtyeight.com/features/2018-house-forecast-methodology/</a></p>
<p>[4] Forecasting the US elections, President. The Economist. <a href="https://projects.economist.com/us-2020-forecast/president">https://projects.economist.com/us-2020-forecast/president</a></p>
<p>[5] 2020 Senate Forecast. Cory McCartan. <a href="https://corymccartan.github.io/projects/senate-20/">https://corymccartan.github.io/projects/senate-20/</a></p>
<p>[6] Inside Elections House Ratings. Nathan Gonzales. <a href="http://insideelections.com/ratings/house">http://insideelections.com/ratings/house</a></p>
<p>[7] Inside Elections Governor Ratings. Nathan Gonzales <a href="http://insideelections.com/ratings/governor">http://insideelections.com/ratings/governor</a></p>
<p>[8] CNalysis State Legislature Ratings. Chaz Nuttycombe. <a href="https://www.cnalysiscom.website/">https://www.cnalysiscom.website/</a></p>
<p>[9] Post Code and Data: psthomas/elections-meta. <a href="https://github.com/psthomas/elections-meta">https://github.com/psthomas/elections-meta</a></p>
<p><a href="https://pstblog.com/2020/09/09/elections-meta">Combining the 2020 Election Forecasts</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on September 09, 2020.</p>
<p>At first I thought this would be a fairly simple question to answer. Just sum up the new cases over the past few weeks and divide by the total population of each region, right? I took a similar approach in my <a href="https://pstblog.com/2020/06/10/covid-vis">previous post</a>, but this probability is actually much more difficult to estimate for a few reasons outlined in <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7073841/">this paper</a>:</p>
<ol>
<li>There’s a 10 day lag between an infection and a reported case and a 20 day lag between an infection and death on average. This means the counts we see today reflect the past. The effect of this lag on the numbers depends on the growth rate of the pandemic at the time.</li>
<li>Roughly 35-40% of cases are <a href="https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html">asymptomatic</a>. These cases will never show up in the numbers unless we do random testing.</li>
<li>Even among symptomatic people, a large fraction (right now <a href="https://cmmid.github.io/topics/covid19/global_cfr_estimates.html">estimated</a> at 65% for the US) will never have a positive test. Perhaps they don’t seek one out, one isn’t available, or they have a <a href="https://www.acpjournals.org/doi/10.7326/M20-1495">false negative</a> result.</li>
<li>The availability of testing <a href="https://fivethirtyeight.com/features/coronavirus-case-counts-are-meaningless/">influences</a> many of the numbers above.</li>
</ol>
<p>The end result is that it’s probably more accurate to estimate the true number of infections using a <a href="https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology#The_SEIR_model">SEIR model</a> that matches the reported deaths, rather than back-calculating infections from reported cases. This is what the <a href="https://covid19-projections.com/about/">model</a> I ended up choosing does; see the appendix below for more information.</p>
<!-- So after that lengthy introduction, here's the probability that a person is infected by country:
<div class="outer" style="padding-left:25px">
<div class="inner">
<iframe id="vis1" src="/vis/covid-world-model.html"
style="width: 1020px; height: 650px; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div> -->
<p>So after that lengthy introduction, here’s the probability that a person is infected by state in the US. <strong>Note:</strong> As of October 5th this model will no longer be updated, so I’m looking for a replacement currently. If you want a simple probability estimate by region, see my <a href="/2020/06/10/covid-vis">previous post</a> on the subject.</p>
<div class="outer" style="padding-left:25px">
<div class="inner">
<iframe id="vis2" src="/vis/covid-us-model.html" style="width: 1000px; height: 650px; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div>
<!-- <iframe id="vis3" src="/vis/covid-us-model.html" style="width: 900px; height:600px; border:none;"></iframe> -->
<!-- <div id="vis3" style="width: 100%; height:600px;"> </div> -->
<p>This model also has results for some counties that contain major urban areas, so here’s the probability of infection by US county:</p>
<div class="outer" style="padding-left:25px">
<div class="inner">
<iframe id="vis3" src="/vis/covid-county-model.html" style="width: 1000px; height: 650px; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div>
<!-- <iframe id="vis4" src="/vis/covid-county-model.html" style="width: 100%; height:600px; border:none;"></iframe> -->
<!-- <div id="vis4" style="width: 100%; height:600px;"> </div> -->
<p>The model also has estimates for a number of countries, but I haven’t found those results to be as reliable as the US focused models, so I’m leaving them out for now.</p>
<p>Note that although the peak for New York state above is a little over 9%, the peak for the five boroughs of New York City is over 15%. This means that if you attended a meeting with 10 random people at the peak of the outbreak, there was an 80% chance someone attending was infected (<a href="https://blogs.scientificamerican.com/observations/online-covid-19-dashboard-calculates-how-risky-reopenings-and-gatherings-can-be/"><code class="language-plaintext highlighter-rouge">1-(1-p)^n</code></a><code class="language-plaintext highlighter-rouge">= 1-(1-0.15)^10 = 0.803</code>). One thing to add is that the probability someone is <em>infected</em> isn’t necessarily equal to the probability they’re <em>infectious</em> – there may be a smaller window of time during which someone can actually spread the infection, but that’s still uncertain for now.</p>
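<p>For convenience, here’s that group exposure calculation as a small helper function (the group sizes are just the examples used in this post):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def pr_exposed(pr_infected, group_size):
    """Probability that at least one of n random attendees is infected,
    assuming independence: 1 - (1 - p)^n."""
    return 1 - (1 - pr_infected) ** group_size

print(pr_exposed(0.15, 10))  # ~0.80, the New York City peak example above
print(pr_exposed(0.01, 10))  # ~0.095, the 1% example below</code></pre></figure>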
<p>Sometimes a table is the best way to visualize data, so here’s a searchable table with data sorted by probability:</p>
<!-- <div id="covidtable" style="margin:30px auto;height: 600px; max-width:775px; overflow: auto;"></div> -->
<!-- <div class="outer" style="padding: 0px 15px">
<div class="inner"> -->
<iframe id="vis3" src="/vis/covid-probtable.html" style="width: 100%; height:600px; border:none;"></iframe>
<!-- </div>
</div> -->
<p>The idea here is that people can use these probabilities to estimate the risk of their lifestyle given their location. So if there’s a 1% chance that someone is infected in your region, attending a meeting with 10 individuals means there’s a 9.5% chance of getting exposed to a person with Covid-19 (<code class="language-plaintext highlighter-rouge">1-(1-0.01)^10 = 0.095</code>). Of course, this approach could backfire. Here are some potential problems:</p>
<ul>
<li>This approach requires some math, but creating a risk calculator similar to <a href="https://covid19risk.biosci.gatech.edu/">this one</a> could help.</li>
<li>People could just be really bad at estimating how many people they interact with.</li>
<li>The <a href="https://twitter.com/firefoxx66/status/1260905937910587392">spread via aerosols</a> or surfaces could make estimating the number of interactions impossible.</li>
<li>County level data isn’t available for every urban area, so people may underestimate their risk by using statewide risk estimates.</li>
</ul>
<p>But I think this visualization provides people with more actionable information than others I’ve seen, so I decided to put it out there. I’ll try to update it as often as possible when new data is available. If you want to embed these visualizations elsewhere, please let me know because I could host them on Amazon S3 or something. All the code for this post is available on GitHub <a href="https://github.com/psthomas/covid-vis">here</a>.</p>
<h2 id="appendix-model-selection">Appendix: Model Selection</h2>
<p>There are a number of models that try to predict the course of the pandemic, most of which are compiled by the Reich Lab <a href="https://reichlab.io/covid19-forecast-hub/">forecasting hub</a> and <a href="https://projects.fivethirtyeight.com/covid-forecasts/">FiveThirtyEight</a>. But the only ones that include an estimate of the true number of infections over time are the models created by <a href="https://covid19.healthdata.org/united-states-of-america">IHME</a>, <a href="https://cuepi.shinyapps.io/COVID-19/">Columbia</a>, <a href="https://mrc-ide.github.io/covid19usa/#/">Imperial College</a>, and <a href="https://covid19-projections.com/">Youyang Gu</a>.</p>
<p>First, I looked at IHME’s model, but something immediately seemed off. Here are the predicted cases for Wisconsin during a period when cases were increasing:</p>
<figure style="text-align:center">
<a href="/images/covidvis/ihme-model.png"><img src="/images/covidvis/ihme-model.png" /></a>
</figure>
<p>Predicted infections are actually lower than measured infections, something that would only happen due to testing lag when cases were declining significantly. This could be because their initial model <a href="https://twitter.com/CT_Bergstrom/status/1250304081265963010">fit a Gaussian curve</a> to the data, which forced a symmetric increase and decline around a peak. While I don’t think this is their approach anymore, everything still seemed to be asymptoting towards zero when I reviewed it, so that doesn’t inspire much confidence.</p>
<p>The Columbia and Imperial models do a better job, but they’re not updated as frequently as I’d like. Youyang Gu’s model is updated daily, performs really well in predictions, and has <a href="https://twitter.com/CT_Bergstrom/status/1255343846445195266">good reviews</a> from subject matter experts, so I decided to use it. But I still wanted to do a few checks to validate it. First, I compared it to Imperial College’s model:</p>
<figure style="text-align:center">
<a href="/images/covidvis/concordance.png"><img src="/images/covidvis/concordance.png" /></a>
</figure>
<p>If they agreed perfectly, their estimates would sit on a 45 degree line. So there’s some deviation, especially in the high case counts. One way to quantify this deviation is using a <a href="https://en.wikipedia.org/wiki/Concordance_correlation_coefficient">concordance correlation coefficient</a>, which ends up being <code class="language-plaintext highlighter-rouge">0.917</code>. Perfect concordance would give a value of <code class="language-plaintext highlighter-rouge">1.0</code>, so this is pretty good.</p>
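<p>For reference, here’s roughly how that coefficient is computed (a generic sketch with placeholder arrays, not the actual model outputs):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

# Lin's concordance correlation coefficient between two sets of estimates.
def concordance_ccc(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()
    # Population variances, plus a penalty for any shift between the means.
    return 2 * cov_xy / (x.var() + y.var() + (mean_x - mean_y) ** 2)

# Placeholder per-state infection estimates from two hypothetical models:
a = np.array([100.0, 220.0, 310.0, 950.0])
b = np.array([120.0, 200.0, 340.0, 800.0])
print(round(concordance_ccc(a, b), 3))</code></pre></figure>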
<p>Next, I compared it against recent serology tests in Spain, which <a href="https://www.vox.com/2020/5/16/21259492/covid-antibodies-spain-serology-study-coronavirus-immunity">suggest 5%</a> of the population has been infected so far. If we just sum up the reported case counts as of 5/13/2020 and divide by the Spanish population, we get an estimated percent infected of <code class="language-plaintext highlighter-rouge">0.5%</code>, which is a 10x undercount. But if we sum up the predicted infections from Youyang’s model and divide by the population, we get an estimated <code class="language-plaintext highlighter-rouge">6.7%</code> of the population infected. So this is at least the right order of magnitude, and could be correct depending on when the serology study actually ended.</p>
<p>New York state also completed a <a href="https://twitter.com/NYGovCuomo/status/1253352837255438338">serology study</a> on April 23rd that estimated a New York City infection rate of <code class="language-plaintext highlighter-rouge">21.2%</code> and a statewide rate of <code class="language-plaintext highlighter-rouge">13.9%</code>. The model predictions of <code class="language-plaintext highlighter-rouge">20.9%</code> in the city and <code class="language-plaintext highlighter-rouge">12.5%</code> statewide as of 4/23/2020 are quite close. So overall this seems like a quality model and I’ll probably continue using it.</p>
<p><strong>Update:</strong> After observing these probabilities for the past few weeks, one thing I’ve noticed is that even these tend to lag during a new outbreak. I think this is because the model does a grid search and finds the optimal model parameters that minimize the error on the deaths timeseries. But if these parameters are optimized to match the death timeseries from a month ago, running the model forward to today will still underestimate the cases during an exponential growth phase because it’s using old parameters.</p>
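<p>To make that lag concrete, here’s a toy illustration (not the model’s actual fitting procedure): during steady exponential growth, an estimate anchored to deaths effectively describes the epidemic as it was a few weeks ago.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># Toy sketch: how far a deaths-anchored estimate trails current infections
# during steady exponential growth. Both parameters are illustrative.
def underestimate_factor(doubling_days=7.0, death_lag_days=21.0):
    return 2 ** (death_lag_days / doubling_days)

print(underestimate_factor())  # 8.0 -- current infections ~8x the lagged estimate</code></pre></figure>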
<p>Another interesting problem with this approach is that if the pandemic starts spreading more in young populations like it did recently, you’ll underestimate the actual number of cases because the deaths timeseries won’t be increasing as much. I’m not sure how to fix these issues, but what’s really needed is a robust way to estimate total cases using the current cases timeseries. Let me know if you’re aware of an epidemiologist that’s doing this!</p>
<h2 id="references">References</h2>
<p>[1] COVID-19 Projections Using Machine Learning. <a href="https://covid19-projections.com/about/">https://covid19-projections.com/about/</a></p>
<p>[2] Communicating the Risk of Death from Novel Coronavirus Disease (COVID-19). <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7073841/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7073841/</a></p>
<p>[3] COVID-19 Pandemic Planning Scenarios. <a href="https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html">https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html</a></p>
<p>[4] Variation in False-Negative Rate of Reverse Transcriptase Polymerase Chain Reaction–Based SARS-CoV-2 Tests by Time Since Exposure. <a href="https://www.acpjournals.org/doi/10.7326/M20-1495">https://www.acpjournals.org/doi/10.7326/M20-1495</a></p>
<p>[5] Using a delay-adjusted case fatality ratio to estimate under-reporting. <a href="https://cmmid.github.io/topics/covid19/global_cfr_estimates.html">https://cmmid.github.io/topics/covid19/global_cfr_estimates.html</a></p>
<p>[6] Inferring cases from recent deaths. <a href="https://cmmid.github.io/topics/covid19/cases-from-deaths.html">https://cmmid.github.io/topics/covid19/cases-from-deaths.html</a></p>
<p>[7] Coronavirus Case Counts Are Meaningless. <a href="https://fivethirtyeight.com/features/coronavirus-case-counts-are-meaningless/">https://fivethirtyeight.com/features/coronavirus-case-counts-are-meaningless/</a></p>
<p>[8] Reich Lab COVID-19 Forecast Hub. <a href="https://reichlab.io/covid19-forecast-hub/">https://reichlab.io/covid19-forecast-hub/</a></p>
<p>[9] Where The Latest COVID-19 Models Think We’re Headed — And Why They Disagree. <a href="https://projects.fivethirtyeight.com/covid-forecasts/">https://projects.fivethirtyeight.com/covid-forecasts/</a></p>
<p>[10] The results of a Spanish study on Covid-19 immunity have a scary takeaway. <a href="https://www.vox.com/2020/5/16/21259492/covid-antibodies-spain-serology-study-coronavirus-immunity">https://www.vox.com/2020/5/16/21259492/covid-antibodies-spain-serology-study-coronavirus-immunity</a></p>
<p>[11] Online COVID-19 Dashboard Calculates How Risky Reopenings and Gatherings Can Be. <a href="https://blogs.scientificamerican.com/observations/online-covid-19-dashboard-calculates-how-risky-reopenings-and-gatherings-can-be/">https://blogs.scientificamerican.com/observations/online-covid-19-dashboard-calculates-how-risky-reopenings-and-gatherings-can-be/</a></p>
<p>[12] COVID-19 Event Risk Assessment Planning tool. <a href="https://covid19risk.biosci.gatech.edu/">https://covid19risk.biosci.gatech.edu/</a></p>
<p><a href="https://pstblog.com/2020/06/12/covid-prob">What's the probability a person has Covid-19?</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on June 12, 2020.</p>
https://pstblog.com/2020/06/10/covid-vis2020-06-10T00:00:00+00:002020-06-10T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>I’ve held off on creating a visualization of the pandemic so far because I think things have been <a href="https://www.ft.com/content/a26fbf7e-48f8-11ea-aeb3-955839e06441">covered well</a> by <a href="https://ourworldindata.org/coronavirus">others</a>. But there are still a few visualizations that I’d like to see, so I decided to give it a try. When I look at a Covid-19 visualization, I generally want to answer a few questions:</p>
<ul>
<li><strong>How bad is it?</strong> Total numbers are needed here to show the human toll.</li>
<li><strong>Is it getting worse?</strong> Time series data is helpful here to put current data in context. Preferably show new cases rather than total cases on the y-axis so we can quickly see increases.</li>
<li><strong>How effective is our response?</strong> Per capita numbers are useful here because they allow us to see how well governments are responding relative to each other.</li>
<li><strong>How risky are things for me?</strong> Per capita numbers are also helpful here because they’re proportional to individual risk. But individual risk can also be calculated using a model, which I do in my <a href="https://pstblog.com/2020/06/12/covid-prob">next post</a>. Either way, the numbers have to be at a high enough resolution (e.g. state or county) to matter for individuals.</li>
</ul>
<p>So I created a few visualizations to try to answer these questions. And just to get this out of the way at the start, I want to stress that I am not an epidemiologist. But for the most part I’m communicating existing information rather than creating it, so this shouldn’t be too much of a problem.</p>
<h2 id="data-sources">Data Sources</h2>
<p>For the international timeseries, I’m using the <a href="https://github.com/owid/covid-19-data/tree/master/public/data">data</a> compiled by <a href="https://ourworldindata.org/coronavirus">Our World In Data</a>. The state and county level results are courtesy of the New York Times <a href="https://github.com/nytimes/covid-19-data">datasets</a>. These visualizations wouldn’t be possible without the tireless work of these groups, so I appreciate their effort.</p>
<p>Of course, these sources aren’t perfect and some leaders have decided to <a href="https://www.reuters.com/article/us-health-coronavirus-brazil-idUSKBN23D0PW">bury</a> <a href="https://www.theguardian.com/world/2020/may/14/coronavirus-russia-defends-its-exceptionally-precise-covid-19-death-data">their</a> <a href="https://www.usatoday.com/story/news/nation/2020/05/19/florida-covid-19-coronavirus-data-researcher-out-state-reopens/5218897002/">heads</a> in the <a href="https://www.businessinsider.com/trump-says-too-much-coronavirus-testing-makes-us-look-bad-2020-5">sand</a> rather than mount an effective response. There’s not a lot that can be done about the suppression of data, so I’ll just have to live with the results I have for now. All of my code for this post is available <a href="https://github.com/psthomas/covid-vis">here</a>.</p>
<h2 id="population-adjusted-timeseries">Population Adjusted Timeseries</h2>
<p>The first two visualizations are just timeseries plots showing new cases per million on the y-axis. The bubble size and color represent the total deaths for each place. I think this does a good job of communicating both the current state of things and the cumulative toll.</p>
<p>First, here are the global results. Note that each plot is interactive with tooltips and scroll to zoom enabled:</p>
<div class="outer">
<div class="inner">
<iframe id="vis1" src="/vis/covid-world.html" style="width: 1200px; height: 650px; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div>
<!-- <iframe id="vis" src="/vis/covid-world.html" style="width: 100%; height:600px; border:none;"></iframe> -->
<!-- <div id="vis" style="width: 100%; height:600px;"> </div> -->
<p>And here are the results by state in the US:</p>
<div class="outer">
<div class="inner">
<iframe id="vis2" src="/vis/covid-us.html" style="width: 1200px; height: 650px; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div>
<!-- <iframe id="vis2" src="/vis/covid-us.html" style="width: 100%; height:600px; border:none;"></iframe> -->
<!-- <div id="vis2" style="width: 100%; height:600px;"> </div> -->
<p>And here are the results for the top 100 counties in the US:</p>
<div class="outer" style="padding-left:25px">
<div class="inner">
<iframe id="vis2" src="/vis/covid-county.html" style="width: 1200px; height: 650px; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div>
<p>I think this county visualization shows why it’s so important to have data at a sub-state resolution. At the time of writing, my home state of Wisconsin has 150 new cases per million per day, but my home county, Milwaukee County, has 300 new cases per million per day. So things are twice as risky here and on par with the state of Texas, but nobody is communicating this risk! The media certainly aren’t reporting on Milwaukee with the same level of alarm as Texas, but they should be.</p>
<p>If you want to look up your own county, here’s a searchable/sortable table with all of the country, state, and top 250 county results. Note that there’s an additional column in this table called <code class="language-plaintext highlighter-rouge">simple_probability</code>, which is the probability that a person is infected for the region. This column is created by summing up the number of new cases over the past ten days and <a href="https://twitter.com/trvrb/status/1249414308355649536">multiplying</a> <a href="https://www.nytimes.com/2020/07/21/health/coronavirus-infections-us.html">by</a> <a href="https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/commercial-lab-surveys.html">ten</a>. This is a huge simplification for a number of reasons I get into in my <a href="https://pstblog.com/2020/06/12/covid-prob">next post</a>. But in the absence of any other source for this risk estimate, I’ll keep providing it as a back of the envelope estimate. Just know that this probability will be an underestimate during the steepest growth of new cases, and an overestimate when new cases are flat or declining.</p>
<p><strong>Update</strong>: Newer estimates put the correct multiplier in the <a href="https://twitter.com/trvrb/status/1292260286934605824">4-8x range</a>, but I’m going to keep using a 10x multiplier. This is because 10x will probably still be right in <a href="https://twitter.com/DiseaseEcology/status/1291907448404471808">certain contexts</a> like the exponential growth phase, and there’s probably more <a href="https://www.seattletimes.com/nation-world/cdcs-overblown-estimate-of-ebola-outbreak-draws-criticism/">downside</a> to understating things than overstating them currently. <strong>Update</strong>: I’ve revised the multiplier down to 4, mainly because the recent omicron wave was giving implausible probabilities of 100%+ in some places.</p>
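<p>Here’s a rough sketch of how that <code class="language-plaintext highlighter-rouge">simple_probability</code> column gets computed. The DataFrame layout and column names below are placeholders, not the exact code behind the table:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

MULTIPLIER = 4  # was 10 earlier in the pandemic; revised down per the update above

def simple_probability(df, multiplier=MULTIPLIER):
    """Rough probability that a random person in each region is currently infected.

    Assumes one row per region per day, sorted by date, with placeholder
    columns named region, new_cases, and population.
    """
    recent_cases = df.groupby("region")["new_cases"].apply(lambda s: s.tail(10).sum())
    population = df.groupby("region")["population"].last()
    # Cap at 1.0 so a fast wave can't produce probabilities above 100%.
    return (multiplier * recent_cases / population).clip(upper=1.0)</code></pre></figure>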
<div class="outer">
<div class="inner">
<iframe id="vis3" src="/vis/covid-casetable.html" style="width: 95vw; height: 725px; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div>
<p>And here’s a choropleth map to show the counties that aren’t in the top 250 (note that the color scheme is clipped at 15 percent probability to handle outlier counties):</p>
<div class="outer">
<div class="inner">
<iframe id="vis2" src="/vis/countymap.html" style="width: 1150px; height: 750px; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div>
<p>So overall I think these plots do a pretty good job of meeting the criteria I set out initially.
But they could still use a more robust estimate for the probability of infection, which I try to calculate in my <a href="https://pstblog.com/2020/06/12/covid-prob">next post</a>.</p>
<h2 id="references">References</h2>
<p>[1] Our World in Data, Coronavirus coverage. <a href="https://ourworldindata.org/coronavirus">https://ourworldindata.org/coronavirus</a></p>
<p>[2] The Covid Tracking Project. <a href="https://covidtracking.com/">https://covidtracking.com/</a></p>
<p>[3] NYTimes, Coronavirus (Covid-19) Data in the United States. <a href="https://github.com/nytimes/covid-19-data">https://github.com/nytimes/covid-19-data</a></p>
<p>[4] Trevor Bedford. <a href="https://twitter.com/trvrb/status/1249414308355649536">https://twitter.com/trvrb/status/1249414308355649536</a></p>
<p>[5] Coronavirus Infections Much Higher Than Reported Cases in Parts of U.S., Study Shows. NYTimes. <a href="https://www.nytimes.com/2020/07/21/health/coronavirus-infections-us.html">https://www.nytimes.com/2020/07/21/health/coronavirus-infections-us.html</a></p>
<p>[6] Commercial Laboratory Seroprevalence Survey Data, CDC. <a href="https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/commercial-lab-surveys.html">https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/commercial-lab-surveys.html</a></p>
<p><a href="https://pstblog.com/2020/06/10/covid-vis">Visualizing the Covid-19 Pandemic</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on June 10, 2020.</p>
https://pstblog.com/2019/12/06/synthpop2019-12-06T00:00:00+00:002019-12-06T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>Synthetic populations are created to represent demographic and geographic data for a population without revealing the underlying data for any individuals. This is done by using an <a href="https://en.wikipedia.org/wiki/Iterative_proportional_fitting">iterative proportional fitting</a> technique to create fake individuals and households that, when aggregated, match the characteristics of each census tract. I recently came across a synthetic population for the entire US on the <a href="https://fred.publichealth.pitt.edu/syn_pops">Pitt Public Health Dynamics Lab website</a>, so I decided to download and experiment with it:</p>
<figure style="text-align:center">
<a href="/images/synthpop/dotmap.png"><img style="max-height:600px" src="/images/synthpop/dotmap.png" /></a>
<figcaption>This example shows the Milwaukee area households by race. Blue = White, Orange = Black, Green = Other (mostly Hispanic in this case)</figcaption>
</figure>
<p>Synthetic populations are traditionally used for things like epidemic modeling and emergency response planning, but they could also be used for something else: calculating voter demographic and turnout data for state legislative districts. I couldn’t do this in the past because the Census Bureau doesn’t release data subdivided by state legislative districts (as far as I can tell). But now that I have a synthetic population, all I need to do is write an SQL query to get the data I need. In addition, I can ask questions like “Which state legislative district has the most voters under 30 with an income of less than $30,000?”, a level of detail that’s impossible to get from the Census data alone.</p>
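<p>As a preview, here’s a sketch of what that kind of query could look like. It assumes the <code class="language-plaintext highlighter-rouge">people</code>, <code class="language-plaintext highlighter-rouge">households</code>, and <code class="language-plaintext highlighter-rouge">state_legislative_districts_lower</code> tables that get built later in this post, and the column names follow the schemas below:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># Sketch: which lower-house districts have the most voters under 30 living in
# households earning under $30,000? Tables and columns are set up later in this post.
preview_query = '''
SELECT
    s.geoid,
    count(p.sp_id) AS young_low_income_voters
FROM people AS p
JOIN households AS h
    ON p.sp_hh_id = h.sp_id
JOIN state_legislative_districts_lower AS s
    ON ST_Contains(s.shape, h.coordinate)
WHERE p.age BETWEEN 18 AND 29
    AND h.hh_income < 30000
GROUP BY s.geoid
ORDER BY young_low_income_voters DESC
LIMIT 5;
'''</code></pre></figure>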
<p>Below, I show all the steps for getting started with the data using Postgres, PostGIS, and Python. At the end, I write an SQL query that finds the voting age population for all the Wisconsin State Assembly districts in 2016. All the code for this post is available on GitHub <a href="https://github.com/psthomas/synthpop">here</a>.</p>
<h2 id="setting-up-the-database">Setting up the Database</h2>
<p>If you don’t have Postgres or PostGIS installed, you can install both from the command line if you use <a href="https://brew.sh/">brew</a> on a Mac:</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>brew update
<span class="nv">$ </span>brew <span class="nb">install </span>postgresql
<span class="nv">$ </span>brew <span class="nb">install </span>postgis
<span class="nv">$ </span>brew services start postgresql</code></pre></figure>
<p>Then use these bash commands to create the database and add the postgis extension:</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>createdb <span class="nt">-U</span> psthomas synthpop
<span class="nv">$ </span>psql <span class="nt">-U</span> psthomas <span class="nt">-d</span> synthpop <span class="nt">-c</span> <span class="s2">"CREATE EXTENSION postgis"</span></code></pre></figure>
<p>Also, set your <a href="https://serverfault.com/questions/110154">user password</a>, and export it into the environment for use later:</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span><span class="nb">export </span><span class="nv">POSTGRESPASS</span><span class="o">=</span><span class="s1">'<your-postgres-pass>'</span></code></pre></figure>
<h2 id="importing-the-geography-data">Importing the Geography Data</h2>
<p>This next step can be done on the command line as well with <a href="https://gdal.org/drivers/vector/openfilegdb.html#examples">ogr2ogr</a>, a command line utility for converting between geospatial data types. In this case, I’m converting an ESRI geodatabase (.gdb) into a Postgres database. This could be automated as a shell script, but I found it easiest to just use the command line.</p>
<p>First, find the geodatabase that’s relevant to your analysis on the <a href="https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-geodatabase-file.html">Census Bureau website</a>. Download it, then use <code class="language-plaintext highlighter-rouge">ogrinfo</code> to find the names of the different tables available:</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>ogrinfo <span class="s2">"/Volumes/Misc/projects/snythpop/data/geographies/tlgdb_2019_a_us_legislative.gdb.zip"</span>
INFO: Open of <span class="sb">`</span>/Volumes/Misc/projects/synthpop/data/geographies/tlgdb_2019_a_us_legislative.gdb.zip<span class="s1">'
using driver `OpenFileGDB'</span> successful.
1: Congressional_Districts_116th <span class="o">(</span>Multi Polygon<span class="o">)</span>
2: State_Legislative_Districts_Upper <span class="o">(</span>Multi Polygon<span class="o">)</span>
3: State_Legislative_Districts_Lower <span class="o">(</span>Multi Polygon<span class="o">)</span>
4: Voting_Districts <span class="o">(</span>Multi Polygon<span class="o">)</span></code></pre></figure>
<p>Next, use <a href="https://gdal.org/drivers/vector/openfilegdb.html#examples">ogr2ogr</a> to load the desired geography table into the Postgres database you set up above. In this case I’m loading the shapes for the lower state legislative districts (<code class="language-plaintext highlighter-rouge">State_Legislative_Districts_Lower</code>). Note that I’m using the <code class="language-plaintext highlighter-rouge">POSTGRESPASS</code> environmental variable that we exported earlier for the password.</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>ogr2ogr <span class="nt">-overwrite</span> <span class="nt">-skipfailures</span> <span class="nt">-f</span> <span class="s2">"PostgreSQL"</span> PG:<span class="s2">"host=localhost user=psthomas dbname=synthpop password=</span><span class="nv">$POSTGRESPASS</span><span class="s2">"</span> <span class="s2">"/Volumes/Misc/projects/synthpop/data/geographies/tlgdb_2019_a_us_legislative.gdb.zip"</span> <span class="s2">"State_Legislative_Districts_Lower"</span></code></pre></figure>
<p>And here’s a similar command loading the urban area outlines for the Milwaukee dot plot later:</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>ogr2ogr <span class="nt">-overwrite</span> <span class="nt">-skipfailures</span> <span class="nt">-f</span> <span class="s2">"PostgreSQL"</span> PG:<span class="s2">"host=localhost user=psthomas dbname=synthpop password=</span><span class="nv">$POSTGRESPASS</span><span class="s2">"</span> <span class="s2">"/Volumes/Misc/projects/synthpop/data/geographies/tlgdb_2019_a_us_nationgeo.gdb.zip"</span> <span class="s2">"Urban_Area"</span></code></pre></figure>
<p>Note that I’m also loading the data directly from the zipped files because the files were unreadable if unzipped on my computer for some reason. Repeat these steps until you have all the geography tables you need for your analysis.</p>
<h2 id="importing-the-synthetic-population">Importing the Synthetic Population</h2>
<p>The next step is to download and import the synthetic population data. I found the data on the <a href="https://fred.publichealth.pitt.edu/syn_pops">Pitt Public Health Dynamics Lab website</a>; the population was originally developed by <a href="https://www.rti.org/impact/rti-us-synthetic-household-population%E2%84%A2">RTI International</a> as a part of the <a href="https://www.nigms.nih.gov/Research/specificareas/MIDAS/">Models of Infectious Disease Agent Study</a> (MIDAS). I manually downloaded the data for each state, extracted the files, and then used the code below to create tables and load the data. Note that the files are very large (>30GB extracted), so you might want to put the data on an external drive.</p>
<p>From here on out, all the code is written in Python, with the SQL queries handled by psycopg2. You can see all the code for this post in a Jupyter Notebook <a href="https://github.com/psthomas/synthpop">here</a>. First, we need to import all the project dependencies, which include Pandas, NumPy, and GeoPandas.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">glob</span>
<span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">geopandas</span> <span class="k">as</span> <span class="n">gpd</span>
<span class="kn">import</span> <span class="nn">psycopg2</span>
<span class="kn">from</span> <span class="nn">IPython.display</span> <span class="kn">import</span> <span class="n">display</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">matplotlib</span></code></pre></figure>
<p>Next, we make a connection with the database we created earlier using <code class="language-plaintext highlighter-rouge">psycopg2</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">pw</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">'POSTGRESPASS'</span><span class="p">]</span> <span class="c1"># Password from current environment
</span><span class="n">connection</span> <span class="o">=</span> <span class="n">psycopg2</span><span class="p">.</span><span class="n">connect</span><span class="p">(</span><span class="n">database</span><span class="o">=</span><span class="s">"synthpop"</span><span class="p">,</span> <span class="n">user</span><span class="o">=</span><span class="s">"psthomas"</span><span class="p">,</span> <span class="n">password</span><span class="o">=</span><span class="n">pw</span><span class="p">)</span>
<span class="n">cursor</span> <span class="o">=</span> <span class="n">connection</span><span class="p">.</span><span class="n">cursor</span><span class="p">()</span></code></pre></figure>
<p>After connecting to the database, we need to create the tables along with their schemas. This is handled below:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'''
DROP TABLE IF EXISTS households;
'''</span><span class="p">)</span>
<span class="c1"># Creating the households table
</span><span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'''
CREATE TABLE households (
sp_id VARCHAR,
stcotrbg VARCHAR,
hh_race BIGINT,
hh_income BIGINT,
latitude DOUBLE PRECISION,
longitude DOUBLE PRECISION,
coordinate geometry);
'''</span><span class="p">)</span>
<span class="n">connection</span><span class="p">.</span><span class="n">commit</span><span class="p">()</span>
<span class="c1"># Creating the people table
</span><span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'''
DROP TABLE IF EXISTS people;
'''</span><span class="p">)</span>
<span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'''
CREATE TABLE people (
sp_id VARCHAR,
sp_hh_id VARCHAR,
age BIGINT,
sex VARCHAR,
race BIGINT,
relate BIGINT,
school_id VARCHAR,
work_id VARCHAR);
'''</span><span class="p">)</span>
<span class="n">connection</span><span class="p">.</span><span class="n">commit</span><span class="p">()</span></code></pre></figure>
<p>Now that we have the tables created, they’re ready to accept the data. In this first step, I recursively search through the synthetic population directories for all the <code class="language-plaintext highlighter-rouge">households.txt</code> files, which contain households along with their latitudes and longitudes. I then iterate through these files, copying them into the <code class="language-plaintext highlighter-rouge">households</code> table. Note that I’m only loading the data for Wisconsin for now – loading the rest of the data takes up all the disk space on my little old MacBook Air, so I’d need to move to the cloud to query the entire population.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">filepath</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">getcwd</span><span class="p">(),</span> <span class="s">'data'</span><span class="p">,</span> <span class="s">'synthpops'</span><span class="p">,</span> <span class="s">'WI'</span><span class="p">)</span>
<span class="c1"># Delete any existing data first
</span><span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">"DELETE FROM households"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">Path</span><span class="p">(</span><span class="n">filepath</span><span class="p">).</span><span class="n">rglob</span><span class="p">(</span><span class="s">'households.txt'</span><span class="p">):</span>
<span class="n">sql</span> <span class="o">=</span> <span class="s">'''
COPY households(sp_id, stcotrbg, hh_race, hh_income, latitude, longitude)
FROM '{0}' DELIMITER ' ' CSV HEADER;
'''</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="n">sql</span><span class="p">)</span>
<span class="c1"># Commit the results
</span><span class="n">connection</span><span class="p">.</span><span class="n">commit</span><span class="p">()</span></code></pre></figure>
<p>The next step is to convert the latitude/longitude columns into a Point geometry in PostGIS. The documentation on the synthetic population says it was created using the World Geodetic System of 1984 (srid = 4326), so that’s what I specify when I create the points.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'''
UPDATE households
SET coordinate = ST_GeomFromText('POINT(' || longitude || ' ' || latitude || ')', 4326);
'''</span><span class="p">)</span></code></pre></figure>
<p>There’s a problem with these points though: this coordinate system is different from the one that the Census Bureau uses (srid = 4269), so all these points need to be transformed in order to do spatial queries on them. This is done below:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'''
UPDATE households
SET coordinate = ST_Transform(coordinate, 4269);
'''</span><span class="p">)</span>
<span class="c1"># Commit the results
</span><span class="n">connection</span><span class="p">.</span><span class="n">commit</span><span class="p">()</span></code></pre></figure>
<p>Ok, so now we have all of our households geocoded and transformed into the right coordinate system. Next, we need to load all of the people, which can be connected to the households via the <code class="language-plaintext highlighter-rouge">sp_hh_id</code> column. Luckily, there are no geography columns to worry about here, so this step is easier:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># Delete existing data first
</span><span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">"DELETE FROM people"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">Path</span><span class="p">(</span><span class="n">filepath</span><span class="p">).</span><span class="n">rglob</span><span class="p">(</span><span class="s">'people.txt'</span><span class="p">):</span>
<span class="n">sql</span> <span class="o">=</span> <span class="s">'''
COPY people(sp_id, sp_hh_id, age, sex, race, relate, school_id, work_id)
FROM '{}' DELIMITER ' ' CSV HEADER;
'''</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="n">sql</span><span class="p">)</span>
<span class="c1"># Commit the results
</span><span class="n">connection</span><span class="p">.</span><span class="n">commit</span><span class="p">()</span></code></pre></figure>
<p>Great, now we have all of our people and household data loaded. The final thing to do is to add indexes to a few columns to speed up the joins. This is especially important for spatial data because spatial indexes use <a href="https://postgis.net/workshops/postgis-intro/indexing.html">bounding boxes</a> to greatly limit the number of comparisons that need to be made during spatial joins.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># Create an index on coordinates.
# Still useful for coordinates because it uses RTree data structure
# https://gis.stackexchange.com/questions/265966
# https://postgis.net/workshops/postgis-intro/indexing.html
</span><span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'''
CREATE INDEX house_geometry_index
ON households
USING GIST(coordinate);
'''</span><span class="p">)</span>
<span class="c1"># Create an index on sp_id for faster joins
# https://www.postgresql.org/docs/11/indexes-intro.html
</span><span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'''
CREATE INDEX house_id_index
ON households (sp_id);
'''</span><span class="p">)</span>
<span class="c1"># Create an index on sp_hh_id for faster join onto households sp_id
# https://www.postgresql.org/docs/11/indexes-intro.html
</span><span class="n">cursor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'''
CREATE INDEX person_house_id_index
ON people (sp_hh_id);
'''</span><span class="p">)</span></code></pre></figure>
<p>All of the Census boundaries that we loaded earlier with <code class="language-plaintext highlighter-rouge">ogr2ogr</code> already have spatial indexes, so we don’t need to worry about adding those. Alright, we finally have all of the data loaded and prepared for querying!</p>
<p>Here is a list of the tables we now have available in our <code class="language-plaintext highlighter-rouge">synthpop</code> database:</p>
<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">synthpop</span><span class="o">=</span><span class="c"># \dt</span>
List of relations
Schema | Name | Type | Owner
<span class="nt">--------</span>+-----------------------------------+-------+----------
public | households | table | psthomas
public | people | table | psthomas
public | spatial_ref_sys | table | psthomas
public | state_legislative_districts_lower | table | psthomas
public | urban_area | table | psthomas
</code></pre></figure>
<h2 id="a-dotmap-of-milwaukee">A Dotmap of Milwaukee</h2>
<p>To start out with a simple query, I thought it would be interesting to create a dotmap of my hometown of Milwaukee. The <code class="language-plaintext highlighter-rouge">households</code> table has a column for race, so we can use that to get a good idea of the demographics of different parts of the city.</p>
<p>The query below is a fairly standard select statement but with a twist – the <code class="language-plaintext highlighter-rouge">ST_Contains(u.shape, h.coordinate)</code> statement. This functionality is added by PostGIS, and it allows you to select the geometries <a href="https://postgis.net/docs/ST_Contains.html">fully contained</a> within another geometry (in this case, the households within the Milwaukee urban area). I then randomly select 10,000 households using the <code class="language-plaintext highlighter-rouge">ORDER BY random() LIMIT 10000</code> clause. Getting this data into Python is then as simple as passing this query and the connection to the GeoPandas <code class="language-plaintext highlighter-rouge">from_postgis</code> function. Pretty nice!</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">dot_query</span> <span class="o">=</span> <span class="s">'''
SELECT
u.geoid,
h.sp_id,
h.coordinate,
h.hh_race
FROM households AS h
JOIN urban_area AS u
ON ST_Contains(u.shape, h.coordinate)
WHERE u.geoid = '57466'
ORDER BY random()
LIMIT 10000;
'''</span>
<span class="n">dot_df</span> <span class="o">=</span> <span class="n">gpd</span><span class="p">.</span><span class="n">GeoDataFrame</span><span class="p">.</span><span class="n">from_postgis</span><span class="p">(</span><span class="n">dot_query</span><span class="p">,</span> <span class="n">connection</span><span class="p">,</span> <span class="n">geom_col</span><span class="o">=</span><span class="s">'coordinate'</span><span class="p">)</span>
<span class="n">display</span><span class="p">(</span><span class="n">dot_df</span><span class="p">.</span><span class="n">head</span><span class="p">())</span></code></pre></figure>
<table>
<thead>
<tr>
<th></th>
<th>geoid</th>
<th>sp_id</th>
<th>coordinate</th>
<th>hh_race</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>57466</td>
<td>54323919</td>
<td>POINT (-88.06619 43.05997)</td>
<td>9</td>
</tr>
<tr>
<th>1</th>
<td>57466</td>
<td>55191668</td>
<td>POINT (-88.21111 42.98681)</td>
<td>1</td>
</tr>
<tr>
<th>2</th>
<td>57466</td>
<td>53797371</td>
<td>POINT (-87.99648 43.02619)</td>
<td>1</td>
</tr>
<tr>
<th>3</th>
<td>57466</td>
<td>53779988</td>
<td>POINT (-87.98308 43.07194)</td>
<td>2</td>
</tr>
<tr>
<th>4</th>
<td>57466</td>
<td>55637265</td>
<td>POINT (-88.18592 43.20900)</td>
<td>1</td>
</tr>
</tbody>
</table>
<p>Next, I need to map this. GeoPandas has some great built-in mapping functionality, so let’s use that.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># http://geopandas.org/mapping.html
</span><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">12</span><span class="p">))</span>
<span class="n">dot_df</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="s">'hh_race'</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">markersize</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s">'tab10'</span><span class="p">,</span>
<span class="n">legend_kwds</span><span class="o">=</span><span class="p">{</span><span class="s">'label'</span><span class="p">:</span> <span class="s">"Household by Race, City of Milwaukee"</span><span class="p">,</span>
<span class="s">'orientation'</span><span class="p">:</span> <span class="s">'horizontal'</span><span class="p">,</span><span class="s">'pad'</span><span class="p">:</span><span class="mf">0.01</span><span class="p">,</span><span class="s">'shrink'</span><span class="p">:</span><span class="mf">0.3</span><span class="p">})</span>
<span class="n">ax</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span></code></pre></figure>
<figure style="text-align:center">
<a href="/images/synthpop/dotmap.png"><img style="max-height:600px" src="/images/synthpop/dotmap.png" /></a>
</figure>
<p>And there we go, we can clearly see the racial demographics of the Milwaukee area. The blue dots are White households, the orange dots are Black households, and the green dots are Other (mostly Hispanic on the south side of Milwaukee in this case). Milwaukee is one of the most segregated cities in the country, so these patterns, although regrettable, show up quite clearly.</p>
<h2 id="2016-turnout-wisconsin-state-assembly">2016 Turnout, Wisconsin State Assembly</h2>
<p>I mentioned at the beginning that I wanted to calculate the voter turnout for the 2016 Wisconsin State Assembly districts, so let’s do that next. As far as I can tell, the Census Bureau doesn’t release their data broken down by state legislative districts, so having a synthetic population makes this previously-impossible analysis possible.</p>
<p>The query below is more complex, but I tried to simplify it by breaking it up into <a href="https://www.postgresql.org/docs/current/queries-with.html">common table expressions</a> (CTE). The first CTE uses the state legislative outlines and the <code class="language-plaintext highlighter-rouge">ST_Contains</code> function to find the district for each household. The second CTE joins the <code class="language-plaintext highlighter-rouge">people</code> table onto the <code class="language-plaintext highlighter-rouge">households</code> table and then groups and sums the people (age 18+) by <code class="language-plaintext highlighter-rouge">geoid</code>, making a voting age population column. Finally, this data is joined back onto the legislative outlines table and returned to GeoPandas for plotting purposes. The end result is a count of the voting age population for each district derived entirely from a synthetic population.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">query</span> <span class="o">=</span> <span class="s">'''
WITH lower_households AS (
SELECT
s.geoid,
h.sp_id,
h.coordinate
FROM households AS h
JOIN state_legislative_districts_lower AS s
ON ST_Contains(s.shape, h.coordinate)
WHERE s.geoid LIKE '55%'
),
wisclower_vap AS (
SELECT
geoid,
count(p.sp_id) AS voting_age_pop
FROM people AS p
JOIN lower_households AS l
ON l.sp_id = p.sp_hh_id
WHERE p.age >= 18
GROUP BY l.geoid
ORDER BY voting_age_pop DESC)
SELECT
w.geoid,
w.voting_age_pop,
s.shape
FROM wisclower_vap AS w
JOIN state_legislative_districts_lower as s
ON w.geoid = s.geoid
LIMIT 1000;
'''</span>
<span class="n">poly_df</span> <span class="o">=</span> <span class="n">gpd</span><span class="p">.</span><span class="n">GeoDataFrame</span><span class="p">.</span><span class="n">from_postgis</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">connection</span><span class="p">,</span> <span class="n">geom_col</span><span class="o">=</span><span class="s">'shape'</span><span class="p">)</span>
<span class="n">poly_df</span><span class="p">.</span><span class="n">head</span><span class="p">()</span></code></pre></figure>
<table>
<thead>
<tr>
<th></th>
<th>geoid</th>
<th>voting_age_pop</th>
<th>shape</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>55001</td>
<td>44343</td>
<td>MULTIPOLYGON (((-87.94429 44.67813, -87.93753 ...</td>
</tr>
<tr>
<th>1</th>
<td>55002</td>
<td>41595</td>
<td>MULTIPOLYGON (((-88.12635 44.46920, -88.12572 ...</td>
</tr>
<tr>
<th>2</th>
<td>55003</td>
<td>42106</td>
<td>MULTIPOLYGON (((-88.16546 44.13169, -88.16546 ...</td>
</tr>
<tr>
<th>3</th>
<td>55004</td>
<td>42937</td>
<td>MULTIPOLYGON (((-88.14988 44.50133, -88.14845 ...</td>
</tr>
<tr>
<th>4</th>
<td>55005</td>
<td>41751</td>
<td>MULTIPOLYGON (((-88.19132 44.33248, -88.19131 ...</td>
</tr>
</tbody>
</table>
<p>Note that this computation takes about ten minutes on my old MacBook Air (1.6 GHz, 4 GB RAM), so I would need to use a cloud server to perform the computation for state legislatures across the entire nation. The next step is to merge this data with Wisconsin Assembly election results from a <a href="https://pstblog.com/2019/03/05/voting-power-comprehensive">previous analysis</a> to calculate turnout and margins:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># Elections Data from a previous analysis: https://pstblog.com/2019/03/05/voting-power-comprehensive
</span><span class="n">results_path</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">getcwd</span><span class="p">(),</span> <span class="s">'data'</span><span class="p">,</span> <span class="s">'election_results'</span><span class="p">,</span> <span class="s">'state_house.csv'</span><span class="p">)</span>
<span class="n">results_df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">results_path</span><span class="p">)</span>
<span class="n">final_df</span> <span class="o">=</span> <span class="n">poly_df</span><span class="p">.</span><span class="n">merge</span><span class="p">(</span><span class="n">results_df</span><span class="p">[[</span><span class="s">'geoid'</span><span class="p">,</span><span class="s">'totalvote'</span><span class="p">,</span> <span class="s">'dem_margin'</span><span class="p">]],</span> <span class="n">on</span><span class="o">=</span><span class="s">'geoid'</span><span class="p">)</span>
<span class="n">final_df</span><span class="p">[</span><span class="s">'turnout'</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">final_df</span><span class="p">[</span><span class="s">'totalvote'</span><span class="p">]</span><span class="o">/</span><span class="n">final_df</span><span class="p">[</span><span class="s">'voting_age_pop'</span><span class="p">])</span><span class="o">*</span><span class="mi">100</span>
<span class="n">final_df</span><span class="p">[</span><span class="s">'rep_margin'</span><span class="p">]</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="o">*</span><span class="n">final_df</span><span class="p">.</span><span class="n">dem_margin</span>
<span class="n">final_df</span><span class="p">[[</span><span class="s">'geoid'</span><span class="p">,</span> <span class="s">'shape'</span><span class="p">,</span> <span class="s">'voting_age_pop'</span><span class="p">,</span> <span class="s">'totalvote'</span><span class="p">,</span> <span class="s">'rep_margin'</span><span class="p">,</span> <span class="s">'turnout'</span><span class="p">]]</span> \
<span class="p">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="s">'turnout'</span><span class="p">,</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">).</span><span class="n">head</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span></code></pre></figure>
<table>
<thead>
<tr>
<th></th>
<th>geoid</th>
<th>shape</th>
<th>voting_age_pop</th>
<th>totalvote</th>
<th>rep_margin</th>
<th>turnout</th>
</tr>
</thead>
<tbody>
<tr>
<th>73</th>
<td>55076</td>
<td>MULTIPOLYGON (((-89.42083 43.06248, -89.42054 ...</td>
<td>45667</td>
<td>40505.0</td>
<td>-66.043698</td>
<td>88.696433</td>
</tr>
<tr>
<th>11</th>
<td>55014</td>
<td>MULTIPOLYGON (((-88.18598 43.08239, -88.18597 ...</td>
<td>43184</td>
<td>34935.0</td>
<td>14.504079</td>
<td>80.898018</td>
</tr>
<tr>
<th>76</th>
<td>55079</td>
<td>MULTIPOLYGON (((-89.26300 43.10712, -89.26200 ...</td>
<td>46805</td>
<td>36316.0</td>
<td>-27.827955</td>
<td>77.590001</td>
</tr>
<tr>
<th>53</th>
<td>55056</td>
<td>MULTIPOLYGON (((-88.88675 44.24121, -88.88674 ...</td>
<td>42228</td>
<td>32573.0</td>
<td>29.076229</td>
<td>77.136023</td>
</tr>
<tr>
<th>35</th>
<td>55038</td>
<td>MULTIPOLYGON (((-88.80975 43.02505, -88.80962 ...</td>
<td>42995</td>
<td>32996.0</td>
<td>25.518245</td>
<td>76.743807</td>
</tr>
</tbody>
</table>
<p>State Assembly <a href="https://ballotpedia.org/Wisconsin_State_Assembly_District_76">District 76</a> (geoid 55076) has the highest turnout (88.7%), which makes sense because it includes the state capitol building in Madison. I guess people are pretty politically engaged around those parts. The final step is to create a few visualizations of the results using the GeoPandas built in mapping features again. Here’s the margin by district:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">plot_df</span> <span class="o">=</span> <span class="n">final_df</span><span class="p">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">plot_df</span><span class="p">[</span><span class="s">'rep_margin'</span><span class="p">]</span> <span class="o">=</span> <span class="n">plot_df</span><span class="p">[</span><span class="s">'rep_margin'</span><span class="p">].</span><span class="n">clip</span><span class="p">(</span><span class="n">lower</span><span class="o">=-</span><span class="mf">25.0</span><span class="p">,</span> <span class="n">upper</span><span class="o">=</span><span class="mf">25.0</span><span class="p">)</span>
<span class="c1">#Change coordinate system for a more legible map
#https://spatialreference.org/ref/epsg/2288/
</span><span class="n">plot_df</span> <span class="o">=</span> <span class="n">plot_df</span><span class="p">.</span><span class="n">to_crs</span><span class="p">({</span><span class="s">'init'</span><span class="p">:</span> <span class="s">'EPSG:2288'</span><span class="p">})</span>
<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">11</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">)</span>
<span class="n">plot_df</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="s">'rep_margin'</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s">'coolwarm'</span><span class="p">,</span><span class="n">legend</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">legend_kwds</span><span class="o">=</span><span class="p">{</span><span class="s">'label'</span><span class="p">:</span> <span class="s">"Republican Margin, 2016 Assembly"</span><span class="p">,</span>
<span class="s">'orientation'</span><span class="p">:</span> <span class="s">'horizontal'</span><span class="p">,</span><span class="s">'pad'</span><span class="p">:</span><span class="mf">0.01</span><span class="p">,</span> <span class="s">'shrink'</span><span class="p">:</span><span class="mf">0.3</span><span class="p">})</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span></code></pre></figure>
<figure style="text-align:center">
<a href="/images/synthpop/margin.png"><img style="max-height:800px" src="/images/synthpop/margin.png" /></a>
</figure>
<p>And here’s the turnout by district:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span><span class="mi">11</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">)</span>
<span class="n">plot_df</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="s">'turnout'</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s">'OrRd'</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">legend_kwds</span><span class="o">=</span><span class="p">{</span><span class="s">'label'</span><span class="p">:</span> <span class="s">"Voter Turnout, 2016 Assembly"</span><span class="p">,</span> <span class="s">'orientation'</span><span class="p">:</span> <span class="s">'horizontal'</span><span class="p">,</span>
<span class="s">'pad'</span><span class="p">:</span><span class="mf">0.01</span><span class="p">,</span> <span class="s">'shrink'</span><span class="p">:</span><span class="mf">0.3</span><span class="p">})</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span></code></pre></figure>
<figure style="text-align:center">
<a href="/images/synthpop/turnout.png"><img style="max-height:800px" src="/images/synthpop/turnout.png" /></a>
</figure>
<p>How do these numbers compare to the actual outcomes? I can’t find official district-level turnout rates because nobody publishes district-level voting age population (VAP) data, but I can compare the cumulative totals to get an idea. According to the assembly <a href="https://en.wikipedia.org/wiki/2016_Wisconsin_State_Assembly_election#District">results</a>, <code class="language-plaintext highlighter-rouge">2,587,171</code> people voted out of a VAP of <code class="language-plaintext highlighter-rouge">4,461,068</code>, giving a turnout rate of <code class="language-plaintext highlighter-rouge">57.9%</code>. Summing my synthetic data gives a vote total of <code class="language-plaintext highlighter-rouge">2,568,160</code> out of a VAP of <code class="language-plaintext highlighter-rouge">4,155,055</code>, for a turnout of <code class="language-plaintext highlighter-rouge">61.8%</code>.</p>
<p>So my approach overstates turnout a little bit because it underestimates the VAP. This makes sense because the synthetic population was created using 2007-2011 Census data and doesn’t take into account population growth since then. But overall I’m pretty happy with these results!</p>
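<p>For reference, here’s the arithmetic behind that comparison as a quick Python sketch, using the totals quoted above:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># Reported statewide totals (Wikipedia) vs. my synthetic-population totals.
reported_votes, reported_vap = 2_587_171, 4_461_068
synthetic_votes, synthetic_vap = 2_568_160, 4_155_055

print(f"Reported turnout:  {reported_votes / reported_vap:.1%}")    # roughly 58%
print(f"Synthetic turnout: {synthetic_votes / synthetic_vap:.1%}")  # roughly 62%</code></pre></figure>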
<h2 id="conclusion">Conclusion</h2>
<p>The purpose of this post was to provide a small tutorial on loading and querying synthetic population data in Postgres. I’ve only scratched the surface of what is possible with this data, so stay tuned for future posts on this topic. I think it would be interesting to calculate turnout for every state legislative seat in my dataset, but I would need to move beyond my computer into the cloud to do so. Perhaps I could use something like <a href="https://cloud.google.com/bigquery">BigQuery</a> instead.</p>
<p>I really wish the Census Bureau would move towards releasing data in this format. One of the problems with election reporting is that margins often get much more attention than turnout, partly because turnout is difficult or impossible to calculate for many seats. Having annual synthetic population data available could go a long way towards fixing this. It would also keep the analysis current: although it’s still useful for other purposes, the synthetic population I used in this post will soon be obsolete for political analysis due to population changes.</p>
<h2 id="references">References</h2>
<p>[1] Iterative Proportional Fitting. Wikipedia. <a href="https://en.wikipedia.org/wiki/Iterative_proportional_fitting">https://en.wikipedia.org/wiki/Iterative_proportional_fitting</a></p>
<p>[2] TIGER/Line Geodatabases. US Census Bureau. <a href="https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-geodatabase-file.html">https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-geodatabase-file.html</a></p>
<p>[3] A Framework for Reconstructing Epidemiological Dynamics, Synthetic Populations Page. Pitt Public Health Dynamics Lab. <a href="https://fred.publichealth.pitt.edu/syn_pops">https://fred.publichealth.pitt.edu/syn_pops</a></p>
<p>[4] RTI U.S. Synthetic Household Population. RTI International. <a href="https://www.rti.org/impact/rti-us-synthetic-household-population%E2%84%A2">https://www.rti.org/impact/rti-us-synthetic-household-population%E2%84%A2</a></p>
<p>[5] Models of Infectious Disease Agent Study (MIDAS). National Institute of General Medical Sciences. <a href="https://www.nigms.nih.gov/Research/specificareas/MIDAS/">https://www.nigms.nih.gov/Research/specificareas/MIDAS/</a></p>
<p>[6] Where do voters have the most political influence? Me. <a href="https://pstblog.com/2019/03/05/voting-power-comprehensive">https://pstblog.com/2019/03/05/voting-power-comprehensive</a></p>
<p>[7] 2016 Wisconsin State Assembly Elections. Wikipedia. <a href="https://en.wikipedia.org/wiki/2016_Wisconsin_State_Assembly_election#District">https://en.wikipedia.org/wiki/2016_Wisconsin_State_Assembly_election#District</a></p>
<p>[8] Querying PostgreSQL / PostGIS Databases in Python. Andrew Gaidus. <a href="http://andrewgaidus.com/Build_Query_Spatial_Database/">http://andrewgaidus.com/Build_Query_Spatial_Database/</a></p>
<p><a href="https://pstblog.com/2019/12/06/synthpop">Working with Synthetic Populations in PostGIS</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on December 06, 2019.</p>
https://pstblog.com/2019/10/11/onevis2020-11-10T00:00:00-00:002019-10-11T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<!-- <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega@5"></script>
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-lite@3.4.0"></script>
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-embed@4"></script> -->
<p>I’m fascinated with creating a single visualization that can fully communicate the results of an election. I’ve made a few <a href="https://pstblog.com/2017/06/05/national-election-vis">attempts</a> in the past and even came up with a set of <a href="https://pstblog.com/2016/12/08/presidential-election">criteria</a> for the ideal visualization. More or less, I want to answer the questions “What happened?”, “What changed?”, and “Why?” in a single visualization. Of course, nobody can fully answer the “Why?” question, but you can point towards an answer by looking at demographic and survey data.</p>
<p>So here’s my most recent attempt at an all-encompassing visualization for the 2016 election:</p>
<iframe id="vis" src="/vis/onevis-2016.html" style="width: 100%; height:650px; border: none; position: relative; scrolling:no;">
</iframe>
<p>This might look a little bit like an abstract painting, but I promise it’s informative. Each color category represents the entire electorate broken up into subgroups by age, housing density, education, gender, race, and state. The size of each bubble represents the population of the group, and its coordinate position represents the margin and turnout. The tail of each bubble represents the change from the 2012 presidential election.</p>
<p>If you want to find out what changed this election, just find the large bubbles with the long tails. And to find the crucial states in the electoral college, find the bubbles that just crossed over the y-axis. <strong>Note</strong>: this visualization is interactive so you can zoom in for more detail, and filter categories using the legend.</p>
<p>So the older age groups, males, high school graduates, and very low density areas all shifted rightwards in 2016. At the same time, turnout dropped substantially among the Non-Hispanic Black population. These changes were enough to push crucial states like WI, PA, and MI just across the threshold into Republican territory. No new insights here, but I think it’s interesting to have a single graphic that communicates it all.</p>
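<p>The interactive chart above is rendered separately, but the basic encoding is easy to sketch. Here’s a rough matplotlib version using made-up numbers (not the real dataset): each subgroup is a bubble sized by its population, positioned by margin and turnout, with a tail back to its 2012 position.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import matplotlib.pyplot as plt

# Toy rows: (label, margin_2016, turnout_2016, margin_2012, turnout_2012, population_millions).
# These numbers are illustrative only.
groups = [
    ("65+",            10.0, 70.0,   8.0, 69.0, 45),
    ("18-29",         -20.0, 43.0, -23.0, 45.0, 50),
    ("College grads",  -9.0, 65.0,  -5.0, 64.0, 60),
    ("Rural",          25.0, 60.0,  18.0, 58.0, 30),
]

fig, ax = plt.subplots(figsize=(8, 6))
for label, m16, t16, m12, t12, pop in groups:
    ax.plot([m12, m16], [t12, t16], color="gray", linewidth=1, zorder=1)  # tail from 2012
    ax.scatter(m16, t16, s=pop * 20, alpha=0.6, zorder=2)                 # bubble sized by population
    ax.annotate(label, (m16, t16), fontsize=8)

ax.axvline(0, color="black", linewidth=0.5)  # crossing this line means flipping the margin
ax.set_xlabel("Republican margin (%)")
ax.set_ylabel("Turnout (%)")
plt.show()</code></pre></figure>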
<h2 id="a-few-more-election-visualizations-to-rule-them-all">A Few More Election Visualizations to Rule Them All</h2>
<p>One nice thing about the Catalist <a href="https://medium.com/@yghitza_48326/revisiting-what-happened-in-the-2018-election-c532feb51c0#_ftn1">dataset</a> is that it has results for every two years, so we can look at midterm elections too. The visualizations below simply combine the Catalist margins data with Elections Project <a href="http://www.electproject.org/home/voter-turnout/demographics">turnout data</a> for each demographic group.</p>
<p>For state-level results in non-presidential years, I aggregated the MIT Election Lab <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IG0UN2">house election results</a> and combined them with the Elections Project <a href="http://www.electproject.org/home/voter-turnout/voter-turnout-data">state turnout data</a>. The bubble tails are staggered by four years, so presidential results are compared against the previous presidential election and house results against the previous midterm.</p>
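<p>Mechanically, the join is straightforward. Here’s a rough pandas sketch with hypothetical column names (the real Catalist and Elections Project files need a fair amount of cleaning first): merge margins and turnout on year and group, then match each election to the same group four years earlier to get the tail coordinates.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

# Hypothetical tidy inputs; the real files need cleanup before they look like this.
margins = pd.DataFrame({
    "year":  [2012, 2016, 2014, 2018],
    "group": ["College grads"] * 4,
    "rep_margin": [-5.0, -9.0, 2.0, -12.0],
})
turnout = pd.DataFrame({
    "year":  [2012, 2016, 2014, 2018],
    "group": ["College grads"] * 4,
    "turnout": [64.0, 65.0, 45.0, 58.0],
})

df = margins.merge(turnout, on=["year", "group"])

# Tail coordinates: the same group four years earlier, so presidential years
# line up with presidential years and midterms with midterms.
prev = df.assign(year=df["year"] + 4).rename(
    columns={"rep_margin": "rep_margin_prev", "turnout": "turnout_prev"})
df = df.merge(prev, on=["year", "group"], how="left")
print(df)</code></pre></figure>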
<!--https://stackoverflow.com/questions/5867985-->
<div class="outer" style="padding-left:25px">
<div class="inner">
<!--style="width: 98vw; height: 100vh; border: none; position: relative; right:-50%;-->
<iframe id="vis2" src="/vis/onevis-catalist.html" style="width: 1330px; height: 1475px; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div>
<p>There’s a lot to take in here, but I think the most striking thing is the increase in turnout and the leftward shift in 2018. Blue wave indeed. This change seems to be driven largely by college-educated voters. Another important thing to note is that in 2018 the House vote crossed into Democratic territory in both Wisconsin (Rep -7.5%) and Arizona (Rep -1.71%). These will be crucial swing states in 2020 for both the presidency and the Senate, so things could get interesting.</p>
<p><strong>Update</strong>: Preliminary 2020 results have been added, based on aggregate county-level results and the Elections Project state turnout data. The big story of 2020 so far seems to be turnout – specifically, an increase in turnout (and a slight shift left) in low and medium density places that was just enough to outweigh the turnout increases in right-leaning, very low density places. It’s also interesting to see Republican margins improve in high density places, although the county data is still preliminary so this result could change.</p>
<p>One interesting question is whether the margin changes in 2020 were driven by turnout or vote switching. Only time (and better sources of data) will tell. Catalist <a href="https://twitter.com/Catalist_US/status/1325905764691505152">won’t be releasing</a> their demographic data until January of 2021, so I’ll add that when it becomes available.</p>
<h2 id="data-issues">Data Issues</h2>
<p>The real challenge with these plots is getting the data, especially the demographic data. The source of the demographic turnout data is the Census Current Population Survey, via the Elections Project datasets [4, 5]. The demographic margin data is courtesy of the Catalist <a href="https://medium.com/@yghitza_48326/revisiting-what-happened-in-the-2018-election-c532feb51c0#_ftn1">dataset</a>. The margin and turnout data for each housing density and state are derived from county- and state-level election results, along with Census Bureau county housing density data and CityLab density categories.</p>
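<p>For the housing density and state categories, the calculation boils down to a grouped sum over counties. Here’s a minimal sketch with hypothetical columns and made-up numbers, not the actual cleaning pipeline:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

# Hypothetical county-level table: total votes, citizen voting age population (CVAP),
# and a CityLab-style density category for each county.
counties = pd.DataFrame({
    "county":  ["A", "B", "C", "D"],
    "density": ["rural", "rural", "urban", "urban"],
    "votes":   [20_000, 35_000, 400_000, 250_000],
    "cvap":    [35_000, 60_000, 620_000, 410_000],
})

by_density = counties.groupby("density")[["votes", "cvap"]].sum()
by_density["turnout_pct"] = 100 * by_density["votes"] / by_density["cvap"]
print(by_density)</code></pre></figure>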
<p>The Elections Project reports turnout as a percent of the voting eligible population, while my density estimates use a percent of the citizen voting age population, so this does lead to some differences in estimated turnout:</p>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table class="dataframe">
<thead>
<tr>
<th>category</th>
<th>estimated_turnout (%)</th>
</tr>
</thead>
<tbody>
<tr>
<th>age</th>
<td>62.28</td>
</tr>
<tr>
<th>density</th>
<td>59.44</td>
</tr>
<tr>
<th>education</th>
<td>64.19</td>
</tr>
<tr>
<th>race</th>
<td>61.39</td>
</tr>
<tr>
<th>sex</th>
<td>60.79</td>
</tr>
<tr>
<th>state</th>
<td>60.94</td>
</tr>
</tbody>
</table>
</div>
<p><a href="https://en.wikipedia.org/wiki/United_States_presidential_election,_2016#Statistical_analysis">Overall turnout</a> was <code class="language-plaintext highlighter-rouge">55.4%</code> of voting age population and <code class="language-plaintext highlighter-rouge">60.2%</code> voting eligible population in 2016. So these numbers are in the right ballpark, but obviously they’re not perfect.</p>
<h2 id="sources">Sources</h2>
<p>[1] MIT Election Lab, 2000-2016 county-level presidential results: <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ">https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ</a></p>
<p>[2] Citizen Voting Age Population data: <a href="https://www.census.gov/programs-surveys/decennial-census/about/voting-rights/cvap.html">https://www.census.gov/programs-surveys/decennial-census/about/voting-rights/cvap.html</a></p>
<p>[3] United States Election Project, Demographic Turnout Data. <a href="http://www.electproject.org/home/voter-turnout/demographics">http://www.electproject.org/home/voter-turnout/demographics</a></p>
<p>[4] United States Election Project, State Turnout Data. <a href="http://www.electproject.org/home/voter-turnout/voter-turnout-data">http://www.electproject.org/home/voter-turnout/voter-turnout-data</a></p>
<p>[5] American National Election Studies, demographic data. <a href="https://electionstudies.org/data-center/anes-time-series-cumulative-data-file/">https://electionstudies.org/data-center/anes-time-series-cumulative-data-file/</a></p>
<p>[6] Catalist demographic data. <a href="https://docs.google.com/spreadsheets/d/1UwC_GapbE3vF6-n1THVbwcXoU_zFvO8jJQL99ouX3Rw/edit?ts=5beae6d4#gid=433702266">https://docs.google.com/spreadsheets/d/1UwC_GapbE3vF6-n1THVbwcXoU_zFvO8jJQL99ouX3Rw/edit?ts=5beae6d4#gid=433702266</a></p>
<p>[7] Revisiting What Happened in the 2018 Election. Yair Ghitza. <a href="https://medium.com/@yghitza_48326/revisiting-what-happened-in-the-2018-election-c532feb51c0">https://medium.com/@yghitza_48326/revisiting-what-happened-in-the-2018-election-c532feb51c0#_ftn1</a></p>
<p>[8] U.S. House 1976–2018. MIT Election Data and Science Lab. <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IG0UN2">https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IG0UN2</a></p>
<p>[9] CityLab Congressional Density Index, Methodology. <a href="https://github.com/theatlantic/citylab-data/blob/master/citylab-congress/methodology.md">https://github.com/theatlantic/citylab-data/blob/master/citylab-congress/methodology.md</a></p>
<p>[10] Code source for this post: psthomas/onevis: <a href="https://github.com/psthomas/onevis">https://github.com/psthomas/onevis</a></p>
<p><a href="https://pstblog.com/2019/10/11/onevis">One Election Visualization to Rule Them All</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on October 11, 2019.</p>
https://pstblog.com/2019/10/10/voting-power-20202019-10-10T00:00:00+00:002019-10-10T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<!-- <script type="text/javascript" src="https://code.jquery.com/jquery-3.3.1.js"></script>
<script type="text/javascript" src="https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js"></script> -->
<p>In a <a href="https://pstblog.com/2019/03/05/voting-power-comprehensive">previous post</a>, I calculated the voting power index for every state and federal election over the last election cycle. But not all of these seats will be contested next year, so I thought it would be interesting to pull out just the 2020 elections for a new analysis. As a reminder, the voting power values are calculated using this equation:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">voting_power_index</span> <span class="o">=</span> <span class="n">seat_potential_power</span><span class="o">/</span><span class="n">percent_absolute_margin</span></code></pre></figure>
<p>This calculation is explained more in the previous post, but the main point is that it allows me to combine both the importance and margin of an election into a single metric. These values can then be aggregated and compared across different states. All the code for this project is available on GitHub <a href="https://github.com/psthomas/voting-power-comprehensive">here</a>.</p>
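<p>As a rough sketch of what that looks like in practice (with made-up numbers, not my actual dataset), you can compute the index for each seat and then combine the seat values within each state, for example by summing them:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

# Hypothetical seat-level inputs; the real values come from the previous post's pipeline.
seats = pd.DataFrame({
    "state": ["WI", "WI", "PA", "PA"],
    "seat_potential_power":    [0.8, 0.3, 1.0, 0.4],
    "percent_absolute_margin": [2.0, 15.0, 1.5, 9.0],
})

seats["voting_power_index"] = (
    seats["seat_potential_power"] / seats["percent_absolute_margin"])

# One simple way to compare states: sum the index over each state's 2020 contests.
print(seats.groupby("state")["voting_power_index"].sum())</code></pre></figure>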
<p>So, without further ado, here are the results for 2020:</p>
<figure style="text-align:center">
<a href="/images/votepower-2020/map.png"><img style="max-height:800px" src="/images/votepower-2020/map.png" /></a>
</figure>
<p>And here’s a table of the results; note that it’s sorted by the 2020 voting power value.</p>
<!--style="width:auto;" style="height:700px;overflow-y:scroll;"
style="height:700px; width:625px; overflow:auto;"-->
<div>
<style type="text/css">
#T_c670a274_0435_11ea_ae88_c869cd9e96darow0_col1 {
background-color: #ffece4;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow0_col2 {
background-color: #fee7db;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow0_col3 {
background-color: #ee3a2c;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow0_col4 {
background-color: #67000d;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow0_col5 {
background-color: #e32f27;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow1_col1 {
background-color: #fee0d2;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow1_col2 {
background-color: #fee3d7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow1_col3 {
background-color: #f6553c;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow1_col4 {
background-color: #ac1117;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow1_col5 {
background-color: #e32f27;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow2_col1 {
background-color: #fedbcc;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow2_col2 {
background-color: #fdcebb;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow2_col3 {
background-color: #fcc1a8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow2_col4 {
background-color: #fc9474;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow2_col5 {
background-color: #d52221;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow3_col1 {
background-color: #fdd4c2;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow3_col2 {
background-color: #fdcab5;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow3_col3 {
background-color: #fcbea5;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow3_col4 {
background-color: #fc997a;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow3_col5 {
background-color: #f03d2d;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow4_col1 {
background-color: #fdc5ae;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow4_col2 {
background-color: #fee8dd;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow4_col3 {
background-color: #67000d;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow4_col4 {
background-color: #fc9c7d;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow4_col5 {
background-color: #fc8161;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow5_col1 {
background-color: #fedfd0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow5_col2 {
background-color: #fee8de;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow5_col3 {
background-color: #fdd0bc;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow5_col4 {
background-color: #fdd0bc;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow5_col5 {
background-color: #f03d2d;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow6_col1 {
background-color: #fdcebb;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow6_col2 {
background-color: #ffece4;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow6_col3 {
background-color: #fee4d8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow6_col4 {
background-color: #fee0d2;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow6_col5 {
background-color: #e32f27;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow7_col1 {
background-color: #fedaca;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow7_col2 {
background-color: #fedccd;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow7_col3 {
background-color: #fee8de;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow7_col4 {
background-color: #fee2d5;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow7_col5 {
background-color: #d52221;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow8_col1 {
background-color: #fee6da;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow8_col2 {
background-color: #ffede5;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow8_col3 {
background-color: #fed9c9;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow8_col4 {
background-color: #fee5d8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow8_col5 {
background-color: #f96044;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow9_col1 {
background-color: #fc9777;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow9_col2 {
background-color: #fca689;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow9_col3 {
background-color: #fee8de;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow9_col4 {
background-color: #fee5d9;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow9_col5 {
background-color: #f44f39;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow10_col1 {
background-color: #ea362a;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow10_col2 {
background-color: #fc8e6e;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow10_col3 {
background-color: #ffece4;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow10_col4 {
background-color: #fee9df;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow10_col5 {
background-color: #d52221;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow11_col1 {
background-color: #fee7dc;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow11_col2 {
background-color: #fee7dc;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow11_col3 {
background-color: #ffeee7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow11_col4 {
background-color: #ffece3;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow11_col5 {
background-color: #c7171c;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow12_col1 {
background-color: #fff3ed;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow12_col2 {
background-color: #fee3d7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow12_col3 {
background-color: #fff0e9;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow12_col4 {
background-color: #ffede5;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow12_col5 {
background-color: #960b13;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow13_col1 {
background-color: #67000d;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow13_col2 {
background-color: #ffeee6;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow13_col3 {
background-color: #ffeee6;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow13_col4 {
background-color: #ffeee7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow13_col5 {
background-color: #f03d2d;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow14_col1 {
background-color: #fc8a6a;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow14_col2 {
background-color: #fdd5c4;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow14_col3 {
background-color: #fff0e8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow14_col4 {
background-color: #ffeee7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow14_col5 {
background-color: #d52221;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow15_col1 {
background-color: #fdc6b0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow15_col2 {
background-color: #fdd5c4;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow15_col3 {
background-color: #ffeee7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow15_col4 {
background-color: #ffeee7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow15_col5 {
background-color: #f03d2d;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow16_col1 {
background-color: #fdd7c6;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow16_col2 {
background-color: #fff0e8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow16_col3 {
background-color: #feeae1;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow16_col4 {
background-color: #ffeee7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow16_col5 {
background-color: #fca082;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow17_col1 {
background-color: #f4503a;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow17_col2 {
background-color: #fee8dd;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow17_col3 {
background-color: #feeae1;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow17_col4 {
background-color: #ffefe8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow17_col5 {
background-color: #fca082;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow18_col1 {
background-color: #fdd1be;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow18_col2 {
background-color: #fca78b;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow18_col3 {
background-color: #fff1ea;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow18_col4 {
background-color: #ffefe8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow18_col5 {
background-color: #b81419;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow19_col1 {
background-color: #fee8dd;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow19_col2 {
background-color: #ffece4;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow19_col3 {
background-color: #fff1ea;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow19_col4 {
background-color: #fff0e8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow19_col5 {
background-color: #d52221;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow20_col1 {
background-color: #fedccd;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow20_col2 {
background-color: #fee9df;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow20_col3 {
background-color: #fff0e9;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow20_col4 {
background-color: #fff0e8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow20_col5 {
background-color: #f44f39;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow21_col1 {
background-color: #fdd4c2;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow21_col2 {
background-color: #fca082;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow21_col3 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow21_col4 {
background-color: #fff1ea;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow21_col5 {
background-color: #960b13;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow22_col1 {
background-color: #fed9c9;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow22_col2 {
background-color: #fcc4ad;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow22_col3 {
background-color: #fff0e8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow22_col4 {
background-color: #fff2eb;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow22_col5 {
background-color: #fc9070;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow23_col1 {
background-color: #ffefe8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow23_col2 {
background-color: #feeae0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow23_col3 {
background-color: #fff3ed;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow23_col4 {
background-color: #fff2eb;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow23_col5 {
background-color: #67000d;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow24_col1 {
background-color: #ffeee6;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow24_col2 {
background-color: #fee3d7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow24_col3 {
background-color: #fff1ea;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow24_col4 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow24_col5 {
background-color: #fc8161;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow25_col1 {
background-color: #fee5d8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow25_col2 {
background-color: #fcaa8d;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow25_col3 {
background-color: #fff3ed;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow25_col4 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow25_col5 {
background-color: #aa1016;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow26_col1 {
background-color: #fcc1a8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow26_col2 {
background-color: #fee2d5;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow26_col3 {
background-color: #fff4ee;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow26_col4 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow26_col5 {
background-color: #67000d;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow27_col1 {
background-color: #ffece3;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow27_col2 {
background-color: #fc997a;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow27_col3 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow27_col4 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow27_col5 {
background-color: #fb7252;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow28_col1 {
background-color: #fdd3c1;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow28_col2 {
background-color: #ffefe8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow28_col3 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow28_col4 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow28_col5 {
background-color: #fb7252;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow29_col1 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow29_col2 {
background-color: #ffefe8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow29_col3 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow29_col4 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow29_col5 {
background-color: #fb7252;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow30_col1 {
background-color: #fb7858;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow30_col2 {
background-color: #fdc5ae;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow30_col3 {
background-color: #fff2eb;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow30_col4 {
background-color: #fff3ed;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow30_col5 {
background-color: #fcaf93;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow31_col1 {
background-color: #fee9df;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow31_col2 {
background-color: #ffeee7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow31_col3 {
background-color: #fff4ee;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow31_col4 {
background-color: #fff4ee;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow31_col5 {
background-color: #aa1016;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow32_col1 {
background-color: #ffede5;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow32_col2 {
background-color: #feeae1;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow32_col3 {
background-color: #fff3ed;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow32_col4 {
background-color: #fff4ee;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow32_col5 {
background-color: #fc8161;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow33_col1 {
background-color: #fee1d4;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow33_col2 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow33_col3 {
background-color: #fff4ee;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow33_col4 {
background-color: #fff4ee;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow33_col5 {
background-color: #b81419;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow34_col1 {
background-color: #fc8666;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow34_col2 {
background-color: #fc7f5f;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow34_col3 {
background-color: #fff4ee;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow34_col4 {
background-color: #fff4ee;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow34_col5 {
background-color: #b81419;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow35_col1 {
background-color: #fedecf;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow35_col2 {
background-color: #fff3ed;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow35_col3 {
background-color: #fff3ed;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow35_col4 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow35_col5 {
background-color: #fb7252;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow36_col1 {
background-color: #fdd1be;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow36_col2 {
background-color: #ffece3;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow36_col3 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow36_col4 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow36_col5 {
background-color: #b81419;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow37_col1 {
background-color: #fff3ed;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow37_col2 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow37_col3 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow37_col4 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow37_col5 {
background-color: #d52221;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow38_col1 {
background-color: #fee3d6;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow38_col2 {
background-color: #ffece4;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow38_col3 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow38_col4 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow38_col5 {
background-color: #c7171c;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow39_col1 {
background-color: #fcb79c;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow39_col2 {
background-color: #ffece3;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow39_col3 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow39_col4 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow39_col5 {
background-color: #c7171c;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow40_col1 {
background-color: #fee4d8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow40_col2 {
background-color: #fcc2aa;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow40_col3 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow40_col4 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow40_col5 {
background-color: #b81419;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow41_col1 {
background-color: #fff0e9;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow41_col2 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow41_col3 {
background-color: #fff4ee;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow41_col4 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow41_col5 {
background-color: #fca082;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow42_col1 {
background-color: #fc997a;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow42_col2 {
background-color: #fb7252;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow42_col3 {
background-color: #fff3ed;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow42_col4 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow42_col5 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow43_col1 {
background-color: #fee0d2;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow43_col2 {
background-color: #fee5d9;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow43_col3 {
background-color: #fff3ed;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow43_col4 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow43_col5 {
background-color: #fedbcc;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow44_col1 {
background-color: #fed8c7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow44_col2 {
background-color: #fc8f6f;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow44_col3 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow44_col4 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow44_col5 {
background-color: #e32f27;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow45_col1 {
background-color: #fff1ea;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow45_col2 {
background-color: #ffece4;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow45_col3 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow45_col4 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow45_col5 {
background-color: #e32f27;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow46_col1 {
background-color: #fedccd;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow46_col2 {
background-color: #67000d;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow46_col3 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow46_col4 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow46_col5 {
background-color: #e32f27;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow47_col1 {
background-color: #fee3d7;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow47_col2 {
background-color: #fff2ec;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow47_col3 {
background-color: #fff4ef;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow47_col4 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow47_col5 {
background-color: #fc8161;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow48_col1 {
background-color: #fee8de;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow48_col2 {
background-color: #e12d26;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow48_col3 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow48_col4 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow48_col5 {
background-color: #f03d2d;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow49_col1 {
background-color: #e32f27;
color: #f1f1f1;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow49_col2 {
background-color: #ffefe8;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow49_col3 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow49_col4 {
background-color: #fff5f0;
color: #000000;
} #T_c670a274_0435_11ea_ae88_c869cd9e96darow49_col5 {
background-color: #f03d2d;
color: #f1f1f1;
}</style><table id="T_c670a274_0435_11ea_ae88_c869cd9e96da"><thead> <tr> <th class="blank level0"></th> <th class="col_heading level0 col0">state_abbr</th> <th class="col_heading level0 col1">capita_spend</th> <th class="col_heading level0 col2">spend_ratio</th> <th class="col_heading level0 col3">voting_power</th> <th class="col_heading level0 col4">voting_power_2020</th> <th class="col_heading level0 col5">rank_change</th> </tr></thead><tbody>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row0" class="row_heading level0 row0">0</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow0_col0" class="data row0 col0">NC</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow0_col1" class="data row0 col1">14.873</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow0_col2" class="data row0 col2">0.744</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow0_col3" class="data row0 col3">4.105</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow0_col4" class="data row0 col4">4.025</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow0_col5" class="data row0 col5">1</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row1" class="row_heading level0 row1">1</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow1_col0" class="data row1 col0">MI</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow1_col1" class="data row1 col1">21.483</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow1_col2" class="data row1 col2">0.869</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow1_col3" class="data row1 col3">3.661</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow1_col4" class="data row1 col4">3.418</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow1_col5" class="data row1 col5">1</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row2" class="row_heading level0 row2">2</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow2_col0" class="data row2 col0">NH</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow2_col1" class="data row2 col1">23.242</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow2_col2" class="data row2 col2">1.349</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow2_col3" class="data row2 col3">1.524</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow2_col4" class="data row2 col4">1.485</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow2_col5" class="data row2 col5">2</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row3" class="row_heading level0 row3">3</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow3_col0" class="data row3 col0">PA</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow3_col1" class="data row3 col1">25.173</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow3_col2" class="data row3 col2">1.436</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow3_col3" class="data row3 col3">1.575</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow3_col4" class="data row3 col4">1.421</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow3_col5" class="data row3 col5">0</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row4" class="row_heading level0 row4">4</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow4_col0" class="data row4 col0">FL</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow4_col1" class="data row4 col1">29.879</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow4_col2" class="data row4 col2">0.707</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow4_col3" class="data row4 col3">6.552</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow4_col4" class="data row4 col4">1.394</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow4_col5" class="data row4 col5">-4</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row5" class="row_heading level0 row5">5</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow5_col0" class="data row5 col0">WI</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow5_col1" class="data row5 col1">22.068</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow5_col2" class="data row5 col2">0.677</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow5_col3" class="data row5 col3">1.193</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow5_col4" class="data row5 col4">0.738</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow5_col5" class="data row5 col5">0</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row6" class="row_heading level0 row6">6</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow6_col0" class="data row6 col0">TX</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow6_col1" class="data row6 col1">26.893</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow6_col2" class="data row6 col2">0.523</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow6_col3" class="data row6 col3">0.68</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow6_col4" class="data row6 col4">0.511</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow6_col5" class="data row6 col5">1</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row7" class="row_heading level0 row7">7</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow7_col0" class="data row7 col0">MN</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow7_col1" class="data row7 col1">23.256</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow7_col2" class="data row7 col2">1.052</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow7_col3" class="data row7 col3">0.514</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow7_col4" class="data row7 col4">0.464</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow7_col5" class="data row7 col5">2</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row8" class="row_heading level0 row8">8</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow8_col0" class="data row8 col0">GA</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow8_col1" class="data row8 col1">18.276</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow8_col2" class="data row8 col2">0.505</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow8_col3" class="data row8 col3">1.001</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow8_col4" class="data row8 col4">0.416</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow8_col5" class="data row8 col5">-2</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row9" class="row_heading level0 row9">9</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow9_col0" class="data row9 col0">CA</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow9_col1" class="data row9 col1">43.097</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow9_col2" class="data row9 col2">2.15</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow9_col3" class="data row9 col3">0.521</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow9_col4" class="data row9 col4">0.392</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow9_col5" class="data row9 col5">-1</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row10" class="row_heading level0 row10">10</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow10_col0" class="data row10 col0">NY</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow10_col1" class="data row10 col1">68.74</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow10_col2" class="data row10 col2">2.593</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow10_col3" class="data row10 col3">0.365</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow10_col4" class="data row10 col4">0.294</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow10_col5" class="data row10 col5">2</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row11" class="row_heading level0 row11">11</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow11_col0" class="data row11 col0">AZ</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow11_col1" class="data row11 col1">17.845</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow11_col2" class="data row11 col2">0.729</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow11_col3" class="data row11 col3">0.29</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow11_col4" class="data row11 col4">0.242</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow11_col5" class="data row11 col5">3</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row12" class="row_heading level0 row12">12</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow12_col0" class="data row12 col0">UT</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow12_col1" class="data row12 col1">11.393</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow12_col2" class="data row12 col2">0.88</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow12_col3" class="data row12 col3">0.209</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow12_col4" class="data row12 col4">0.202</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow12_col5" class="data row12 col5">6</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row13" class="row_heading level0 row13">13</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow13_col0" class="data row13 col0">NV</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow13_col1" class="data row13 col1">101.142</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow13_col2" class="data row13 col2">0.476</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow13_col3" class="data row13 col3">0.306</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow13_col4" class="data row13 col4">0.173</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow13_col5" class="data row13 col5">0</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row14" class="row_heading level0 row14">14</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow14_col0" class="data row14 col0">IL</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow14_col1" class="data row14 col1">46.466</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow14_col2" class="data row14 col2">1.212</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow14_col3" class="data row14 col3">0.241</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow14_col4" class="data row14 col4">0.167</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow14_col5" class="data row14 col5">2</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row15" class="row_heading level0 row15">15</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow15_col0" class="data row15 col0">CO</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow15_col1" class="data row15 col1">29.497</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow15_col2" class="data row15 col2">1.198</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow15_col3" class="data row15 col3">0.272</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow15_col4" class="data row15 col4">0.167</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow15_col5" class="data row15 col5">0</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row16" class="row_heading level0 row16">16</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow16_col0" class="data row16 col0">OH</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow16_col1" class="data row16 col1">24.516</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow16_col2" class="data row16 col2">0.42</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow16_col3" class="data row16 col3">0.431</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow16_col4" class="data row16 col4">0.166</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow16_col5" class="data row16 col5">-6</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row17" class="row_heading level0 row17">17</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow17_col0" class="data row17 col0">VA</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow17_col1" class="data row17 col1">61.918</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow17_col2" class="data row17 col2">0.705</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow17_col3" class="data row17 col3">0.422</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow17_col4" class="data row17 col4">0.163</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow17_col5" class="data row17 col5">-6</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row18" class="row_heading level0 row18">18</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow18_col0" class="data row18 col0">WA</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow18_col1" class="data row18 col1">26.133</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow18_col2" class="data row18 col2">2.129</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow18_col3" class="data row18 col3">0.174</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow18_col4" class="data row18 col4">0.159</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow18_col5" class="data row18 col5">4</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row19" class="row_heading level0 row19">19</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow19_col0" class="data row19 col0">IN</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow19_col1" class="data row19 col1">17.374</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow19_col2" class="data row19 col2">0.531</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow19_col3" class="data row19 col3">0.182</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow19_col4" class="data row19 col4">0.148</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow19_col5" class="data row19 col5">2</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row20" class="row_heading level0 row20">20</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow20_col0" class="data row20 col0">MO</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow20_col1" class="data row20 col1">22.602</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow20_col2" class="data row20 col2">0.656</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow20_col3" class="data row20 col3">0.207</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow20_col4" class="data row20 col4">0.141</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow20_col5" class="data row20 col5">-1</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row21" class="row_heading level0 row21">21</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow21_col0" class="data row21 col0">ME</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow21_col1" class="data row21 col1">25.096</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow21_col2" class="data row21 col2">2.268</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow21_col3" class="data row21 col3">0.124</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow21_col4" class="data row21 col4">0.107</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow21_col5" class="data row21 col5">6</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row22" class="row_heading level0 row22">22</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow22_col0" class="data row22 col0">NJ</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow22_col1" class="data row22 col1">23.654</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow22_col2" class="data row22 col2">1.571</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow22_col3" class="data row22 col3">0.231</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow22_col4" class="data row22 col4">0.1</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow22_col5" class="data row22 col5">-5</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row23" class="row_heading level0 row23">23</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow23_col0" class="data row23 col0">WV</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow23_col1" class="data row23 col1">13.427</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow23_col2" class="data row23 col2">0.619</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow23_col3" class="data row23 col3">0.102</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow23_col4" class="data row23 col4">0.097</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow23_col5" class="data row23 col5">8</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row24" class="row_heading level0 row24">24</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow24_col0" class="data row24 col0">IA</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow24_col1" class="data row24 col1">14.155</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow24_col2" class="data row24 col2">0.866</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow24_col3" class="data row24 col3">0.186</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow24_col4" class="data row24 col4">0.082</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow24_col5" class="data row24 col5">-4</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row25" class="row_heading level0 row25">25</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow25_col0" class="data row25 col0">NM</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow25_col1" class="data row25 col1">19.158</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow25_col2" class="data row25 col2">2.064</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow25_col3" class="data row25 col3">0.105</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow25_col4" class="data row25 col4">0.082</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow25_col5" class="data row25 col5">5</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row26" class="row_heading level0 row26">26</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow26_col0" class="data row26 col0">MT</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow26_col1" class="data row26 col1">31.36</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow26_col2" class="data row26 col2">0.92</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow26_col3" class="data row26 col3">0.087</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow26_col4" class="data row26 col4">0.08</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow26_col5" class="data row26 col5">8</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row27" class="row_heading level0 row27">27</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow27_col0" class="data row27 col0">OR</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow27_col1" class="data row27 col1">15.198</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow27_col2" class="data row27 col2">2.376</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow27_col3" class="data row27 col3">0.132</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow27_col4" class="data row27 col4">0.075</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow27_col5" class="data row27 col5">-3</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row28" class="row_heading level0 row28">28</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow28_col0" class="data row28 col0">KS</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow28_col1" class="data row28 col1">25.679</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow28_col2" class="data row28 col2">0.431</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow28_col3" class="data row28 col3">0.128</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow28_col4" class="data row28 col4">0.072</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow28_col5" class="data row28 col5">-3</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row29" class="row_heading level0 row29">29</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow29_col0" class="data row29 col0">SC</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow29_col1" class="data row29 col1">10.081</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow29_col2" class="data row29 col2">0.43</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow29_col3" class="data row29 col3">0.127</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow29_col4" class="data row29 col4">0.072</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow29_col5" class="data row29 col5">-3</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row30" class="row_heading level0 row30">30</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow30_col0" class="data row30 col0">CT</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow30_col1" class="data row30 col1">51.359</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow30_col2" class="data row30 col2">1.544</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow30_col3" class="data row30 col3">0.164</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow30_col4" class="data row30 col4">0.065</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow30_col5" class="data row30 col5">-7</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row31" class="row_heading level0 row31">31</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow31_col0" class="data row31 col0">TN</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow31_col1" class="data row31 col1">16.678</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow31_col2" class="data row31 col2">0.455</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow31_col3" class="data row31 col3">0.084</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow31_col4" class="data row31 col4">0.055</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow31_col5" class="data row31 col5">5</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row32" class="row_heading level0 row32">32</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow32_col0" class="data row32 col0">KY</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow32_col1" class="data row32 col1">14.579</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow32_col2" class="data row32 col2">0.605</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow32_col3" class="data row32 col3">0.106</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow32_col4" class="data row32 col4">0.05</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow32_col5" class="data row32 col5">-4</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row33" class="row_heading level0 row33">33</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow33_col0" class="data row33 col0">OK</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow33_col1" class="data row33 col1">20.827</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow33_col2" class="data row33 col2">0.322</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow33_col3" class="data row33 col3">0.076</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow33_col4" class="data row33 col4">0.048</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow33_col5" class="data row33 col5">4</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row34" class="row_heading level0 row34">34</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow34_col0" class="data row34 col0">MA</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow34_col1" class="data row34 col1">47.482</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow34_col2" class="data row34 col2">2.886</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow34_col3" class="data row34 col3">0.065</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow34_col4" class="data row34 col4">0.044</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow34_col5" class="data row34 col5">4</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row35" class="row_heading level0 row35">35</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow35_col0" class="data row35 col0">LA</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow35_col1" class="data row35 col1">22.462</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow35_col2" class="data row35 col2">0.288</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow35_col3" class="data row35 col3">0.094</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow35_col4" class="data row35 col4">0.036</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow35_col5" class="data row35 col5">-3</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row36" class="row_heading level0 row36">36</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow36_col0" class="data row36 col0">NE</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow36_col1" class="data row36 col1">26.392</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow36_col2" class="data row36 col2">0.549</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow36_col3" class="data row36 col3">0.044</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow36_col4" class="data row36 col4">0.032</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow36_col5" class="data row36 col5">4</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row37" class="row_heading level0 row37">37</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow37_col0" class="data row37 col0">MS</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow37_col1" class="data row37 col1">11.292</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow37_col2" class="data row37 col2">0.209</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow37_col3" class="data row37 col3">0.055</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow37_col4" class="data row37 col4">0.032</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow37_col5" class="data row37 col5">2</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row38" class="row_heading level0 row38">38</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow38_col0" class="data row38 col0">ND</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow38_col1" class="data row38 col1">20.396</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow38_col2" class="data row38 col2">0.532</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow38_col3" class="data row38 col3">0.042</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow38_col4" class="data row38 col4">0.029</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow38_col5" class="data row38 col5">3</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row39" class="row_heading level0 row39">39</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow39_col0" class="data row39 col0">AR</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow39_col1" class="data row39 col1">34.187</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow39_col2" class="data row39 col2">0.563</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow39_col3" class="data row39 col3">0.042</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow39_col4" class="data row39 col4">0.028</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow39_col5" class="data row39 col5">3</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row40" class="row_heading level0 row40">40</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow40_col0" class="data row40 col0">DE</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow40_col1" class="data row40 col1">19.605</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow40_col2" class="data row40 col2">1.604</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow40_col3" class="data row40 col3">0.037</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow40_col4" class="data row40 col4">0.027</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow40_col5" class="data row40 col5">4</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row41" class="row_heading level0 row41">41</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow41_col0" class="data row41 col0">AL</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow41_col1" class="data row41 col1">12.857</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow41_col2" class="data row41 col2">0.328</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow41_col3" class="data row41 col3">0.087</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow41_col4" class="data row41 col4">0.026</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow41_col5" class="data row41 col5">-6</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row42" class="row_heading level0 row42">42</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow42_col0" class="data row42 col0">MD</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow42_col1" class="data row42 col1">42.382</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow42_col2" class="data row42 col2">3.127</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow42_col3" class="data row42 col3">0.105</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow42_col4" class="data row42 col4">0.026</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow42_col5" class="data row42 col5">-13</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row43" class="row_heading level0 row43">43</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow43_col0" class="data row43 col0">AK</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow43_col1" class="data row43 col1">21.693</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow43_col2" class="data row43 col2">0.793</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow43_col3" class="data row43 col3">0.092</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow43_col4" class="data row43 col4">0.025</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow43_col5" class="data row43 col5">-10</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row44" class="row_heading level0 row44">44</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow44_col0" class="data row44 col0">RI</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow44_col1" class="data row44 col1">24.2</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow44_col2" class="data row44 col2">2.57</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow44_col3" class="data row44 col3">0.033</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow44_col4" class="data row44 col4">0.025</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow44_col5" class="data row44 col5">1</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row45" class="row_heading level0 row45">45</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow45_col0" class="data row45 col0">ID</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow45_col1" class="data row45 col1">12.434</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow45_col2" class="data row45 col2">0.541</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow45_col3" class="data row45 col3">0.029</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow45_col4" class="data row45 col4">0.019</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow45_col5" class="data row45 col5">1</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row46" class="row_heading level0 row46">46</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow46_col0" class="data row46 col0">VT</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow46_col1" class="data row46 col1">22.729</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow46_col2" class="data row46 col2">6.363</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow46_col3" class="data row46 col3">0.021</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow46_col4" class="data row46 col4">0.014</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow46_col5" class="data row46 col5">1</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row47" class="row_heading level0 row47">47</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow47_col0" class="data row47 col0">SD</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow47_col1" class="data row47 col1">20.033</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow47_col2" class="data row47 col2">0.318</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow47_col3" class="data row47 col3">0.039</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow47_col4" class="data row47 col4">0.013</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow47_col5" class="data row47 col5">-4</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row48" class="row_heading level0 row48">48</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow48_col0" class="data row48 col0">HI</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow48_col1" class="data row48 col1">17.087</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow48_col2" class="data row48 col2">4.348</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow48_col3" class="data row48 col3">0.02</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow48_col4" class="data row48 col4">0.01</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow48_col5" class="data row48 col5">0</td>
</tr>
<tr>
<th id="T_c670a274_0435_11ea_ae88_c869cd9e96dalevel0_row49" class="row_heading level0 row49">49</th>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow49_col0" class="data row49 col0">WY</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow49_col1" class="data row49 col1">70.562</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow49_col2" class="data row49 col2">0.443</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow49_col3" class="data row49 col3">0.012</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow49_col4" class="data row49 col4">0.009</td>
<td id="T_c670a274_0435_11ea_ae88_c869cd9e96darow49_col5" class="data row49 col5">0</td>
</tr>
</tbody>
</table>
</div>
<p>So although Florida leads in the previous <code class="language-plaintext highlighter-rouge">voting_power</code> calculation, it ranks four spots lower in the <code class="language-plaintext highlighter-rouge">voting_power_2020</code> numbers. Although there were many close elections in Florida over the past cycle, fewer of those seats are up for reelection in 2020.</p>
<p>North Carolina, on the other hand, has elections at every level of government in 2020. Many of those elections will have close margins, resulting in a higher voting power value. I think this underscores the importance of considering every election in an analysis instead of just thinking about the presidency. An approach like this allows you to make the best use of limited resources by focusing on places where your effort helps more than one campaign. It’s not reflected in this analysis, but North Carolina holds additional appeal for Democrats in 2020 because its maps for congress and the state legislature will have to be redrawn due to an <a href="https://ballotpedia.org/Redistricting_in_North_Carolina">unconstitutional gerrymander</a>, which gives them one more incentive to focus on the state.</p>
<p>Note that I also included per capita spending and the D:R ratio of spending in the table above. I find this data useful because it gives me an idea of the marginal benefit of investing in a state. For example, if a place already has really high per capita spending, or if your party already outspends your opponent 2:1, it probably doesn’t make sense to spend more resources there. These <a href="https://www.opensecrets.org/overview/statetotals.php">numbers</a> are courtesy of the Center for Responsive Politics and cover the 2014-2018 election period. In the future, I might try to incorporate the per capita spending directly into the index, but it adds too much complexity for now.</p>
<p>Here’s the breakdown by office for the top ten states:</p>
<!--style="height:700px; width:300px; overflow:auto;"-->
<div>
<style type="text/css">
#T_cce61ed6_0435_11ea_ae88_c869cd9e96darow0_col0 {
background-color: #67000d;
color: #f1f1f1;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow1_col0 {
background-color: #ffece4;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow2_col0 {
background-color: #fff0e9;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow3_col0 {
background-color: #fff4ee;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow4_col0 {
background-color: #fff4ef;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow5_col0 {
background-color: #fff4ef;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow6_col0 {
background-color: #900a12;
color: #f1f1f1;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow7_col0 {
background-color: #fff4ee;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow8_col0 {
background-color: #fff4ef;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow9_col0 {
background-color: #fff5f0;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow10_col0 {
background-color: #fcbca2;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow11_col0 {
background-color: #fedccd;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow12_col0 {
background-color: #fff4ee;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow13_col0 {
background-color: #fff5f0;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow14_col0 {
background-color: #fff5f0;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow15_col0 {
background-color: #fff5f0;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow16_col0 {
background-color: #fc997a;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow17_col0 {
background-color: #fff2ec;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow18_col0 {
background-color: #fff3ed;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow19_col0 {
background-color: #fff4ef;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow20_col0 {
background-color: #fca98c;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow21_col0 {
background-color: #fff0e8;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow22_col0 {
background-color: #fff0e9;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow23_col0 {
background-color: #fff4ee;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow24_col0 {
background-color: #fdd4c2;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow25_col0 {
background-color: #fff0e9;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow26_col0 {
background-color: #fff5f0;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow27_col0 {
background-color: #fff5f0;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow28_col0 {
background-color: #ffece4;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow29_col0 {
background-color: #ffeee7;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow30_col0 {
background-color: #fff2eb;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow31_col0 {
background-color: #fff3ed;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow32_col0 {
background-color: #fff4ef;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow33_col0 {
background-color: #fee7dc;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow34_col0 {
background-color: #fff2eb;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow35_col0 {
background-color: #fff4ee;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow36_col0 {
background-color: #fff4ef;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow37_col0 {
background-color: #fff5f0;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow38_col0 {
background-color: #ffebe2;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow39_col0 {
background-color: #ffefe8;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow40_col0 {
background-color: #fff5f0;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow41_col0 {
background-color: #fff5f0;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow42_col0 {
background-color: #fff5f0;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow43_col0 {
background-color: #fff0e8;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow44_col0 {
background-color: #fff0e9;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow45_col0 {
background-color: #fff2eb;
color: #000000;
} #T_cce61ed6_0435_11ea_ae88_c869cd9e96darow46_col0 {
background-color: #fff2ec;
color: #000000;
}</style><table id="T_cce61ed6_0435_11ea_ae88_c869cd9e96da"><thead> <tr> <th class="blank"></th> <th class="blank"></th> <th class="blank level0"></th> <th class="col_heading level0 col0">office_voting_power</th> </tr> <tr> <th class="index_name level0">state_abbr</th> <th class="index_name level1">state_voting_power</th> <th class="index_name level2">office</th> <th class="blank"></th> </tr></thead><tbody>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel0_row0" class="row_heading level0 row0" rowspan="6">NC</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel1_row0" class="row_heading level1 row0" rowspan="6">4.025</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row0" class="row_heading level2 row0">governor</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow0_col0" class="data row0 col0">3.637</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row1" class="row_heading level2 row1">president</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow1_col0" class="data row1 col0">0.191</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row2" class="row_heading level2 row2">ushouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow2_col0" class="data row2 col0">0.114</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row3" class="row_heading level2 row3">statehouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow3_col0" class="data row3 col0">0.038</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row4" class="row_heading level2 row4">statesenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow4_col0" class="data row4 col0">0.023</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row5" class="row_heading level2 row5">ussenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow5_col0" class="data row5 col0">0.022</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel0_row6" class="row_heading level0 row6" rowspan="4">MI</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel1_row6" class="row_heading level1 row6" rowspan="4">3.418</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row6" class="row_heading level2 row6">president</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow6_col0" class="data row6 col0">3.334</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row7" class="row_heading level2 row7">ushouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow7_col0" class="data row7 col0">0.035</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row8" class="row_heading level2 row8">statehouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow8_col0" class="data row8 col0">0.03</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row9" class="row_heading level2 row9">ussenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow9_col0" class="data row9 col0">0.019</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel0_row10" class="row_heading level0 row10" rowspan="6">NH</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel1_row10" class="row_heading level1 row10" rowspan="6">1.485</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row10" class="row_heading level2 row10">ussenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow10_col0" class="data row10 col0">0.908</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row11" class="row_heading level2 row11">president</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow11_col0" class="data row11 col0">0.506</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row12" class="row_heading level2 row12">statesenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow12_col0" class="data row12 col0">0.041</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row13" class="row_heading level2 row13">governor</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow13_col0" class="data row13 col0">0.015</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row14" class="row_heading level2 row14">statehouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow14_col0" class="data row14 col0">0.01</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row15" class="row_heading level2 row15">ushouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow15_col0" class="data row15 col0">0.006</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel0_row16" class="row_heading level0 row16" rowspan="4">PA</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel1_row16" class="row_heading level1 row16" rowspan="4">1.421</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row16" class="row_heading level2 row16">president</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow16_col0" class="data row16 col0">1.283</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row17" class="row_heading level2 row17">statehouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow17_col0" class="data row17 col0">0.066</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row18" class="row_heading level2 row18">ushouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow18_col0" class="data row18 col0">0.049</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row19" class="row_heading level2 row19">statesenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow19_col0" class="data row19 col0">0.023</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel0_row20" class="row_heading level0 row20" rowspan="4">FL</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel1_row20" class="row_heading level1 row20" rowspan="4">1.394</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row20" class="row_heading level2 row20">president</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow20_col0" class="data row20 col0">1.124</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row21" class="row_heading level2 row21">statehouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow21_col0" class="data row21 col0">0.123</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row22" class="row_heading level2 row22">ushouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow22_col0" class="data row22 col0">0.112</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row23" class="row_heading level2 row23">statesenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow23_col0" class="data row23 col0">0.035</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel0_row24" class="row_heading level0 row24" rowspan="4">WI</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel1_row24" class="row_heading level1 row24" rowspan="4">0.738</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row24" class="row_heading level2 row24">president</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow24_col0" class="data row24 col0">0.608</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row25" class="row_heading level2 row25">statesenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow25_col0" class="data row25 col0">0.111</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row26" class="row_heading level2 row26">ushouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow26_col0" class="data row26 col0">0.011</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row27" class="row_heading level2 row27">statehouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow27_col0" class="data row27 col0">0.008</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel0_row28" class="row_heading level0 row28" rowspan="5">TX</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel1_row28" class="row_heading level1 row28" rowspan="5">0.511</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row28" class="row_heading level2 row28">president</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow28_col0" class="data row28 col0">0.196</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row29" class="row_heading level2 row29">ushouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow29_col0" class="data row29 col0">0.151</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row30" class="row_heading level2 row30">statehouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow30_col0" class="data row30 col0">0.09</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row31" class="row_heading level2 row31">ussenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow31_col0" class="data row31 col0">0.049</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row32" class="row_heading level2 row32">statesenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow32_col0" class="data row32 col0">0.024</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel0_row33" class="row_heading level0 row33" rowspan="5">MN</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel1_row33" class="row_heading level1 row33" rowspan="5">0.464</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row33" class="row_heading level2 row33">president</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow33_col0" class="data row33 col0">0.306</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row34" class="row_heading level2 row34">ushouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow34_col0" class="data row34 col0">0.086</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row35" class="row_heading level2 row35">statesenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow35_col0" class="data row35 col0">0.04</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row36" class="row_heading level2 row36">statehouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow36_col0" class="data row36 col0">0.021</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row37" class="row_heading level2 row37">ussenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow37_col0" class="data row37 col0">0.01</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel0_row38" class="row_heading level0 row38" rowspan="5">GA</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel1_row38" class="row_heading level1 row38" rowspan="5">0.416</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row38" class="row_heading level2 row38">ushouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow38_col0" class="data row38 col0">0.231</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row39" class="row_heading level2 row39">president</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow39_col0" class="data row39 col0">0.145</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row40" class="row_heading level2 row40">ussenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow40_col0" class="data row40 col0">0.018</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row41" class="row_heading level2 row41">statehouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow41_col0" class="data row41 col0">0.014</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row42" class="row_heading level2 row42">statesenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow42_col0" class="data row42 col0">0.008</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel0_row43" class="row_heading level0 row43" rowspan="4">CA</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel1_row43" class="row_heading level1 row43" rowspan="4">0.392</th>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row43" class="row_heading level2 row43">ushouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow43_col0" class="data row43 col0">0.127</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row44" class="row_heading level2 row44">statesenate</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow44_col0" class="data row44 col0">0.111</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row45" class="row_heading level2 row45">president</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow45_col0" class="data row45 col0">0.085</td>
</tr>
<tr>
<th id="T_cce61ed6_0435_11ea_ae88_c869cd9e96dalevel2_row46" class="row_heading level2 row46">statehouse</th>
<td id="T_cce61ed6_0435_11ea_ae88_c869cd9e96darow46_col0" class="data row46 col0">0.069</td>
</tr>
</tbody>
</table>
</div>
<p>One thing that concerns me about this analysis is that it may overfit to the results of past elections. Instead, I could use aggregated polling data for each election to predict the future margin, then combine this with the <code class="language-plaintext highlighter-rouge">seat_potential_power</code> to make a live-updating index. Maybe I’ll start doing this once 538 or another polling aggregator starts publishing predictions for the 2020 elections.</p>
<h2 id="references">References</h2>
<p>[1] The Center for Responsive Politics, opensecrets.org. <a href="https://www.opensecrets.org/overview/statetotals.php">https://www.opensecrets.org/overview/statetotals.php</a></p>
<p>[2] Where do voters have the most political influence? <a href="https://pstblog.com/2019/03/05/voting-power-comprehensive">https://pstblog.com/2019/03/05/voting-power-comprehensive</a></p>
<p>[3] Source code, voting-power-comprehensive. <a href="https://github.com/psthomas/voting-power-comprehensive">https://github.com/psthomas/voting-power-comprehensive</a></p>
<p><a href="https://pstblog.com/2019/10/10/voting-power-2020">Where will voters have the most political influence in 2020?</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on October 10, 2019.</p>
https://pstblog.com/2019/03/05/voting-power-comprehensive2019-03-05T00:00:00+00:002019-03-05T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>It’s difficult to get a broad mental overview of politics in the United States. There are so many elections covering different districts and each office holder has a different level of influence over policy outcomes. This post is an attempt to help simplify things by bringing all the federal and state level election results together into one place. I then use an approach described below to combine all the results into a single voting power metric for each location. The end result is a map that shows the cumulative political influence of the voters in each place:</p>
<figure id="vis" style="text-align:center">
<a href="/images/votepower-comp/sum-votepower.png"><img style="max-height:800px" src="/images/votepower-comp/sum-votepower.png" /></a>
<figcaption>The cumulative voting power for each geography over the last election cycle. The small white lines represent unique sub-districts generated by legislative district overlaps. The highest district voting power is yellow, west of Miami.</figcaption>
</figure>
<p>Here are the main takeaways from this analysis:</p>
<ul>
<li>We should pay more attention to races for governor. Governors probably collectively wield as much power as the president, but their elections don’t get anywhere near the same level of attention.</li>
<li>Voting power within states is eroded by the fact that 40 percent (!) of the candidates for state legislature run unopposed. This is probably a symptom of overly gerrymandered state districts.</li>
<li>Voters in FL, NC, MI, PA, NH, WI and GA are especially powerful, mainly because they participated in close elections for president and governor.</li>
<li>By this metric, the most powerful location during the past election cycle was a western suburb of Miami.</li>
</ul>
<p>All the code and most of the data for this post are available on GitHub <a href="https://github.com/psthomas/voting-power-comprehensive">here</a>. This analysis builds on my <a href="https://pstblog.com/2018/05/08/voting-power">earlier post</a> covering voting power at the presidential level but it’s more comprehensive because it includes the often-ignored state level elections.</p>
<h1 id="the-power-sharing-model">The Power Sharing Model</h1>
<p>It seems to me that at least two things influence the political power of a voter:</p>
<ul>
<li>Their ability to change election outcomes. If there’s no chance a voter will swing an election, voting is pointless.</li>
<li>The power held by their elected officials. Voters are powerless if their representatives can’t influence policy.</li>
</ul>
<p>So I build off these two concepts to come up with the voting power metric below.</p>
<p>The first step is to allocate potential power to each seat in the government. I start out with an arbitrary <code class="language-plaintext highlighter-rouge">100</code> points of power, and allocate half to the federal government and half to the states. The power at the federal level is then further subdivided between the president (<code class="language-plaintext highlighter-rouge">25</code>) and congress (<code class="language-plaintext highlighter-rouge">25</code>), with the house and senate dividing the congressional power evenly. The other <code class="language-plaintext highlighter-rouge">50</code> points of power are divided between the states according to their fraction of the national population. Each state’s value is then split between the governor and state legislature just like at the federal level. The end result is a potential power value for every seat of the state and federal government (judiciary excluded).</p>
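<p>Before moving on to the second step, here’s a minimal sketch of that allocation in Python. It’s an illustration of the scheme described above rather than the exact code from the repo, and the function names, variable names, and example seat counts are my own:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># A minimal sketch of the potential power allocation described above.
# The names and the example seat counts are illustrative, not the source code.
TOTAL_POWER = 100

federal_power = TOTAL_POWER / 2          # 50 points to the federal level
state_power_total = TOTAL_POWER / 2      # 50 points shared by the states

president_power = federal_power / 2      # 25 points
congress_power = federal_power / 2       # 25 points, split evenly below
ushouse_seat_power = congress_power / 2 / 435    # per-seat potential power
ussenate_seat_power = congress_power / 2 / 100

def state_seat_powers(pop_share, n_house_seats, n_senate_seats):
    """Split one state's share of power between its governor and legislature."""
    state_total = state_power_total * pop_share
    legislature = state_total / 2
    return {
        'governor': state_total / 2,
        'statehouse_seat': legislature / 2 / n_house_seats,
        'statesenate_seat': legislature / 2 / n_senate_seats,
    }

# Example: a state with 3 percent of the national population,
# 100 state house seats and 40 state senate seats.
print(state_seat_powers(0.03, 100, 40))</code></pre></figure>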
<p>The second step is to calculate a realized voting power value for each seat. To do this, I need a way to combine the power value from above with the election margin. The most obvious way is to divide them so that the power of the seat increases as the margin gets closer to zero. Here’s the equation:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">voting_power</span> <span class="o">=</span> <span class="n">seat_potential_power</span><span class="o">/</span><span class="n">percent_absolute_margin</span></code></pre></figure>
<p>So the closer the election, the higher the voting power value. One thing to note here is that the voting power of a seat doesn’t have a ceiling – an election for state legislature could exceed the power of the presidential vote if the margin is close enough. (Actually, there is a ceiling, but it’s just really high. For example, if an election is won by a single vote, <code class="language-plaintext highlighter-rouge">voting_power = seat_potential_power*(n/100)</code>, where n is the total number of voters in the election.)</p>
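<p>As a quick check on that ceiling, here’s the calculation for a hypothetical presidential race (the 25 points of potential power from above) decided by a single vote. The vote total is made up purely for illustration:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># Hypothetical numbers, purely for illustration.
seat_potential_power = 25.0        # the presidential seat from above
total_votes = 130_000_000
margin_votes = 1                   # decided by a single vote

percent_absolute_margin = 100 * margin_votes / total_votes
voting_power = seat_potential_power / percent_absolute_margin

print(voting_power)                                 # ~3.25e7
print(seat_potential_power * total_votes / 100)     # the same ceiling value</code></pre></figure>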
<h1 id="summary-statistics">Summary Statistics</h1>
<p>So after all the theory above, what do these values look like when they’re actually calculated? After a lot of data wrangling, I was able to compile results for almost all the state and federal seats in the past election cycle (a cycle here is defined as the time it takes for every seat to come up for election once). Here’s the distribution of voting power values overall:</p>
<figure style="text-align:center">
<a href="/images/votepower-comp/dist.png"><img style="max-height:800px" src="/images/votepower-comp/dist.png" /></a>
<figcaption>The overall distribution of voting power values.</figcaption>
</figure>
<p>Looking at the overall distribution above, it’s clear the values follow something more extreme than a lognormal distribution. The individual distributions below show that the federal elections outperform the state level elections with the exception of the governor’s races.</p>
<figure style="text-align:center">
<a href="/images/votepower-comp/ind-dist.png"><img style="max-height:800px" src="/images/votepower-comp/ind-dists.png" /></a>
<figcaption>The distribution of voting power values by seat.</figcaption>
</figure>
<p>The table below shows the sum of voting power by office type. Looking at the <code class="language-plaintext highlighter-rouge">sum_power</code> column, voters in the governor, president, US congress, and state legislature races all have the same amount of theoretical power (the small differences are due to missing election data). But when you look at the realized <code class="language-plaintext highlighter-rouge">sum_voting_power</code> column, the high mean absolute margins erode that potential power, especially for the state legislatures.</p>
<div style="margin: 0px auto;">
<table>
<thead>
<tr>
<th></th>
<th>mean_abs_margin</th>
<th>sum_power</th>
<th>mean_dem_margin</th>
<th>sum_voting_power</th>
</tr>
</thead>
<tbody>
<tr>
<th>governor</th>
<td>14.157078</td>
<td>24.946323</td>
<td>-3.274936</td>
<td>10.740176</td>
</tr>
<tr>
<th>president</th>
<td>18.380372</td>
<td>25.000000</td>
<td>-3.674600</td>
<td>9.040410</td>
</tr>
<tr>
<th>ussenate</th>
<td>21.055863</td>
<td>12.500000</td>
<td>-1.019923</td>
<td>3.414887</td>
</tr>
<tr>
<th>ushouse</th>
<td>30.623423</td>
<td>12.385057</td>
<td>7.662104</td>
<td>1.702852</td>
</tr>
<tr>
<th>statesenate</th>
<td>52.964234</td>
<td>11.785348</td>
<td>-6.824567</td>
<td>1.048945</td>
</tr>
<tr>
<th>statehouse</th>
<td>54.282532</td>
<td>11.780629</td>
<td>-4.403811</td>
<td>0.949632</td>
</tr>
</tbody>
</table>
</div>
<p>It’s also interesting to note that votes for governor have more cumulative power than votes for president because the elections are closer. This doesn’t strike me as obviously false, but it’s worth considering whether it reflects reality.</p>
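<p>For reference, the summary table above is just a grouped aggregation over the seat-level results. Here’s a minimal pandas sketch of that step, using a made-up toy dataframe and column names that mirror the table rather than the exact ones from the source code:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

# Toy seat-level data, made up for illustration; the real analysis has
# one row per seat with columns like these.
df = pd.DataFrame({
    'office':       ['governor', 'governor', 'ushouse', 'ushouse'],
    'abs_margin':   [0.4, 12.0, 3.2, 45.0],
    'dem_margin':   [-0.4, 12.0, 3.2, -45.0],
    'power':        [1.6, 0.5, 0.029, 0.029],
    'voting_power': [4.1, 0.04, 0.009, 0.0006],
})

# Aggregate to one row per office type, mirroring the table above.
summary = (
    df.groupby('office')
      .agg(mean_abs_margin=('abs_margin', 'mean'),
           sum_power=('power', 'sum'),
           mean_dem_margin=('dem_margin', 'mean'),
           sum_voting_power=('voting_power', 'sum'))
      .sort_values('sum_voting_power', ascending=False)
)
print(summary)</code></pre></figure>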
<h1 id="results-by-office">Results by Office</h1>
<p>Next, here are detailed maps of the results for each level of government. I find these maps fascinating, mainly because I’ve never seen all of them together in one place before.</p>
<h2 id="president">President</h2>
<p>Most people are probably familiar with the first map, the 2016 election results. The source is the <a href="https://electionlab.mit.edu/data">MIT Election Labs</a> <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/42MVDX">presidential dataset</a>. One interesting thing to note is that the <code class="language-plaintext highlighter-rouge">democratic margin</code> and <code class="language-plaintext highlighter-rouge">voting power</code> maps are pretty much inverses of each other – places with extreme margins have low voting power. This is a pattern that repeats itself for the remaining maps as well. Also note that I take the <code class="language-plaintext highlighter-rouge">log</code> of the voting power values to better display the variation in the plots below.</p>
<figure style="text-align:center">
<a href="/images/votepower-comp/president.png"><img style="max-height:800px" src="/images/votepower-comp/president.png" /></a>
</figure>
<h2 id="us-senate">US Senate</h2>
<p>Next are the senate results by state from the MIT Election Labs <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/PEJ5QU">senate dataset</a>. The democratic margin value is the average of the two senate races, while the voting power value is summed across both elections. The dates of these senate races span <code class="language-plaintext highlighter-rouge">2014-2018</code> because elections are staggered, with 1/3 of senate seats up for election every 2 years.</p>
<figure style="text-align:center">
<a href="/images/votepower-comp/ussenate.png"><img style="max-height:800px" src="/images/votepower-comp/ussenate.png" /></a>
</figure>
<h2 id="us-house">US House</h2>
<p>Next are the house results, <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IG0UN2">also</a> from the MIT Election Labs. Note that some elections (e.g. Minnesota) don’t follow a traditional Democratic/Republican divide for the margin calculation, but I am still able to calculate the voting power value for those districts by calculating the absolute margin between the top two candidates. All of these results are from <code class="language-plaintext highlighter-rouge">2018</code>.</p>
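<p>For the curious, the top-two margin calculation looks roughly like the sketch below, assuming a dataframe of candidate-level returns with <code class="language-plaintext highlighter-rouge">district</code>, <code class="language-plaintext highlighter-rouge">candidate</code>, and <code class="language-plaintext highlighter-rouge">votes</code> columns (illustrative names, not necessarily the ones in the repo):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

def district_abs_margins(results):
    """Absolute margin (in percent) between the top two candidates in each
    district, regardless of party. Unopposed races get a 100 point margin."""
    def margin(group):
        shares = 100 * group['votes'] / group['votes'].sum()
        top_two = shares.sort_values(ascending=False).head(2).tolist()
        return top_two[0] - top_two[1] if len(top_two) == 2 else 100.0
    return results.groupby('district').apply(margin)

# Toy example with one contested and one unopposed district.
results = pd.DataFrame({
    'district':  ['MN-01', 'MN-01', 'MN-01', 'FL-24'],
    'candidate': ['A', 'B', 'C', 'D'],
    'votes':     [150_000, 149_000, 20_000, 200_000],
})
print(district_abs_margins(results))</code></pre></figure>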
<figure style="text-align:center">
<a href="/images/votepower-comp/ushouse-margin.png"><img style="max-height:800px" src="/images/votepower-comp/ushouse-margin.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/votepower-comp/ushouse-votepower.png"><img style="max-height:800px" src="/images/votepower-comp/ushouse-votepower.png" /></a>
</figure>
<h2 id="governors">Governors</h2>
<p>The data for the governors races are from <a href="https://uselectionatlas.org/">David Leip’s Election Atlas</a>, covering the most recent result for each state. It’s interesting to note the difference in margins between this map and the presidential map above. Being able to tailor a candidate to each state’s race for governor seems to lead to closer margins than having a single candidate for president.</p>
<figure style="text-align:center">
<a href="/images/votepower-comp/governors.png"><img style="max-height:800px" src="/images/votepower-comp/governors.png" /></a>
</figure>
<h2 id="state-senate">State Senate</h2>
<p>Finally, the results for the state legislatures. The data are from Carl Klarner’s <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DRSACA">state legislative dataset</a>, which covers results through the 2016 elections. The legislative district boundaries are from the <a href="https://www.census.gov/geo/maps-data/data/tiger-geodatabases.html#tab_2018">2018 Census Tiger Geodatabase</a>. There are some missing results, mainly from NE, LA, OK and AR. Nebraska is missing from both the senate and house results because its legislature is unicameral and its elections are technically nonpartisan.</p>
<p>These maps are really interesting, mainly because I’ve never seen them before. I think the main point of interest here is how extreme the margins are, which results in small voting power values. One major reason for this is that roughly 40 percent of the candidates for state office run unopposed, so their elections have margins of 100 percent. This could be a <a href="https://twitter.com/bcburden/status/1069773072587206656">symptom of gerrymandering</a>, where <a href="https://en.wikipedia.org/wiki/Gerrymandering">cracked</a> districts would have close margins and packed districts would have blowout margins. But in some cases it could also just be a symptom of the state’s geography.</p>
<figure style="text-align:center">
<a href="/images/votepower-comp/statesenate-margin.png"><img style="max-height:800px" src="/images/votepower-comp/statesenate-margin.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/votepower-comp/statesenate-votepower.png"><img style="max-height:800px" src="/images/votepower-comp/statesenate-votepower.png" /></a>
</figure>
<h2 id="state-house">State House</h2>
<p>Here are the state house results, with Carl Klarner’s dataset from above as the source. Overall, these results are very similar to the state senate results above, just more detailed because there are more seats.</p>
<figure style="text-align:center">
<a href="/images/votepower-comp/statehouse-margin.png"><img style="max-height:800px" src="/images/votepower-comp/statehouse-margin.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/votepower-comp/statehouse-votepower.png"><img style="max-height:800px" src="/images/votepower-comp/statehouse-votepower.png" /></a>
</figure>
<h1 id="combined-results">Combined Results</h1>
<p>It’s a little overwhelming to see all these results at once. Below I try to simplify things by combining them, first by summing the results by state and then by overlaying and merging all the districts together. Here is the first map, which is just a sum of the voting power values by state. Florida, North Carolina, and Michigan are the top performers here:</p>
<figure style="text-align:center">
<a href="/images/votepower-comp/statesum-votepower.png"><img style="max-height:800px" src="/images/votepower-comp/statesum-votepower.png" /></a>
</figure>
<p>Here’s a table summary of the above map:</p>
<div>
<table style="margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th></th>
<th>state_abbr</th>
<th>state_fips</th>
<th>voting_power</th>
</tr>
</thead>
<tbody>
<tr>
<th>9</th>
<td>FL</td>
<td>12</td>
<td>6.551916</td>
</tr>
<tr>
<th>27</th>
<td>NC</td>
<td>37</td>
<td>4.105122</td>
</tr>
<tr>
<th>22</th>
<td>MI</td>
<td>26</td>
<td>3.660639</td>
</tr>
<tr>
<th>38</th>
<td>PA</td>
<td>42</td>
<td>1.575222</td>
</tr>
<tr>
<th>30</th>
<td>NH</td>
<td>33</td>
<td>1.523800</td>
</tr>
<tr>
<th>48</th>
<td>WI</td>
<td>55</td>
<td>1.192909</td>
</tr>
<tr>
<th>10</th>
<td>GA</td>
<td>13</td>
<td>1.001493</td>
</tr>
<tr>
<th>43</th>
<td>TX</td>
<td>48</td>
<td>0.680421</td>
</tr>
<tr>
<th>4</th>
<td>CA</td>
<td>6</td>
<td>0.521339</td>
</tr>
<tr>
<th>23</th>
<td>MN</td>
<td>27</td>
<td>0.513667</td>
</tr>
<tr>
<th>35</th>
<td>OH</td>
<td>39</td>
<td>0.431217</td>
</tr>
<tr>
<th>45</th>
<td>VA</td>
<td>51</td>
<td>0.422264</td>
</tr>
<tr>
<th>34</th>
<td>NY</td>
<td>36</td>
<td>0.364836</td>
</tr>
<tr>
<th>33</th>
<td>NV</td>
<td>32</td>
<td>0.306071</td>
</tr>
<tr>
<th>3</th>
<td>AZ</td>
<td>4</td>
<td>0.290053</td>
</tr>
<tr>
<th>5</th>
<td>CO</td>
<td>8</td>
<td>0.272018</td>
</tr>
<tr>
<th>14</th>
<td>IL</td>
<td>17</td>
<td>0.241070</td>
</tr>
<tr>
<th>31</th>
<td>NJ</td>
<td>34</td>
<td>0.231329</td>
</tr>
<tr>
<th>44</th>
<td>UT</td>
<td>49</td>
<td>0.208606</td>
</tr>
<tr>
<th>24</th>
<td>MO</td>
<td>29</td>
<td>0.207401</td>
</tr>
<tr>
<th>12</th>
<td>IA</td>
<td>19</td>
<td>0.185651</td>
</tr>
<tr>
<th>15</th>
<td>IN</td>
<td>18</td>
<td>0.182072</td>
</tr>
<tr>
<th>47</th>
<td>WA</td>
<td>53</td>
<td>0.173504</td>
</tr>
<tr>
<th>6</th>
<td>CT</td>
<td>9</td>
<td>0.163913</td>
</tr>
<tr>
<th>37</th>
<td>OR</td>
<td>41</td>
<td>0.131566</td>
</tr>
<tr>
<th>16</th>
<td>KS</td>
<td>20</td>
<td>0.127854</td>
</tr>
<tr>
<th>40</th>
<td>SC</td>
<td>45</td>
<td>0.127477</td>
</tr>
<tr>
<th>21</th>
<td>ME</td>
<td>23</td>
<td>0.124064</td>
</tr>
<tr>
<th>17</th>
<td>KY</td>
<td>21</td>
<td>0.106102</td>
</tr>
<tr>
<th>20</th>
<td>MD</td>
<td>24</td>
<td>0.105236</td>
</tr>
<tr>
<th>32</th>
<td>NM</td>
<td>35</td>
<td>0.104684</td>
</tr>
<tr>
<th>49</th>
<td>WV</td>
<td>54</td>
<td>0.101614</td>
</tr>
<tr>
<th>18</th>
<td>LA</td>
<td>22</td>
<td>0.094424</td>
</tr>
<tr>
<th>0</th>
<td>AK</td>
<td>2</td>
<td>0.092161</td>
</tr>
<tr>
<th>26</th>
<td>MT</td>
<td>30</td>
<td>0.087219</td>
</tr>
<tr>
<th>1</th>
<td>AL</td>
<td>1</td>
<td>0.086837</td>
</tr>
<tr>
<th>42</th>
<td>TN</td>
<td>47</td>
<td>0.083975</td>
</tr>
<tr>
<th>36</th>
<td>OK</td>
<td>40</td>
<td>0.076436</td>
</tr>
<tr>
<th>19</th>
<td>MA</td>
<td>25</td>
<td>0.065192</td>
</tr>
<tr>
<th>25</th>
<td>MS</td>
<td>28</td>
<td>0.054925</td>
</tr>
<tr>
<th>29</th>
<td>NE</td>
<td>31</td>
<td>0.044191</td>
</tr>
<tr>
<th>28</th>
<td>ND</td>
<td>38</td>
<td>0.042327</td>
</tr>
<tr>
<th>2</th>
<td>AR</td>
<td>5</td>
<td>0.041736</td>
</tr>
<tr>
<th>41</th>
<td>SD</td>
<td>46</td>
<td>0.038921</td>
</tr>
<tr>
<th>8</th>
<td>DE</td>
<td>10</td>
<td>0.036613</td>
</tr>
<tr>
<th>39</th>
<td>RI</td>
<td>44</td>
<td>0.033107</td>
</tr>
<tr>
<th>13</th>
<td>ID</td>
<td>16</td>
<td>0.028847</td>
</tr>
<tr>
<th>46</th>
<td>VT</td>
<td>50</td>
<td>0.021205</td>
</tr>
<tr>
<th>11</th>
<td>HI</td>
<td>15</td>
<td>0.019575</td>
</tr>
<tr>
<th>50</th>
<td>WY</td>
<td>56</td>
<td>0.012467</td>
</tr>
<tr>
<th>7</th>
<td>DC</td>
<td>11</td>
<td>0.001613</td>
</tr>
</tbody>
</table>
</div>
<p>And here are the top offices from the leading states. The <code class="language-plaintext highlighter-rouge">min_abs_margin</code> column is the margin of the closest race for each office type, which is probably responsible for most of the <code class="language-plaintext highlighter-rouge">voting_power</code> value. Generally, the governor or presidential races seem to be driving the scores, but not always (e.g. NH):</p>
<div>
<table style="margin-left: auto; margin-right:auto;">
<thead>
<tr>
<th>state_abbr</th>
<th>state_power</th>
<th>office</th>
<th>power</th>
<th>min_abs_margin</th>
<th>voting_power</th>
</tr>
</thead>
<tbody>
<tr>
<th rowspan="6" valign="top">FL</th>
<th>6.551916</th>
<td>governor</td>
<td>1.627555</td>
<td>0.394900</td>
<td>4.121436</td>
</tr>
<tr>
<th></th>
<td>president</td>
<td>1.347584</td>
<td>1.198626</td>
<td>1.124274</td>
</tr>
<tr>
<th></th>
<td>ussenate</td>
<td>0.125000</td>
<td>0.122503</td>
<td>1.036689</td>
</tr>
<tr>
<th></th>
<td>statehouse</td>
<td>0.006796</td>
<td>0.084434</td>
<td>0.123028</td>
</tr>
<tr>
<th></th>
<td>ushouse</td>
<td>0.028736</td>
<td>0.874615</td>
<td>0.111829</td>
</tr>
<tr>
<th></th>
<td>statesenate</td>
<td>0.020388</td>
<td>3.258565</td>
<td>0.034659</td>
</tr>
<tr>
<th rowspan="6" valign="top">NC</th>
<th>4.105122</th>
<td>governor</td>
<td>0.793448</td>
<td>0.218148</td>
<td>3.637197</td>
</tr>
<tr>
<th></th>
<td>president</td>
<td>0.697026</td>
<td>3.655229</td>
<td>0.190693</td>
</tr>
<tr>
<th></th>
<td>ushouse</td>
<td>0.028736</td>
<td>0.320108</td>
<td>0.114202</td>
</tr>
<tr>
<th></th>
<td>ussenate</td>
<td>0.125000</td>
<td>1.564446</td>
<td>0.101845</td>
</tr>
<tr>
<th></th>
<td>statehouse</td>
<td>0.003313</td>
<td>0.383575</td>
<td>0.037828</td>
</tr>
<tr>
<th></th>
<td>statesenate</td>
<td>0.007952</td>
<td>0.890182</td>
<td>0.023357</td>
</tr>
<tr>
<th rowspan="6" valign="top">MI</th>
<th>3.660639</th>
<td>president</td>
<td>0.743494</td>
<td>0.223033</td>
<td>3.333558</td>
</tr>
<tr>
<th></th>
<td>statesenate</td>
<td>0.010072</td>
<td>0.075871</td>
<td>0.153072</td>
</tr>
<tr>
<th></th>
<td>governor</td>
<td>0.763823</td>
<td>9.567130</td>
<td>0.079838</td>
</tr>
<tr>
<th></th>
<td>ushouse</td>
<td>0.028736</td>
<td>3.834388</td>
<td>0.035380</td>
</tr>
<tr>
<th></th>
<td>statehouse</td>
<td>0.003479</td>
<td>0.593936</td>
<td>0.030168</td>
</tr>
<tr>
<th></th>
<td>ussenate</td>
<td>0.125000</td>
<td>6.505602</td>
<td>0.028623</td>
</tr>
<tr>
<th rowspan="6" valign="top">PA</th>
<th>1.575222</th>
<td>president</td>
<td>0.929368</td>
<td>0.724270</td>
<td>1.283180</td>
</tr>
<tr>
<th></th>
<td>ussenate</td>
<td>0.125000</td>
<td>1.432453</td>
<td>0.096975</td>
</tr>
<tr>
<th></th>
<td>statehouse</td>
<td>0.002416</td>
<td>0.068476</td>
<td>0.065791</td>
</tr>
<tr>
<th></th>
<td>governor</td>
<td>0.978632</td>
<td>17.072299</td>
<td>0.057323</td>
</tr>
<tr>
<th></th>
<td>ushouse</td>
<td>0.028736</td>
<td>2.519118</td>
<td>0.049276</td>
</tr>
<tr>
<th></th>
<td>statesenate</td>
<td>0.009807</td>
<td>2.724007</td>
<td>0.022677</td>
</tr>
<tr>
<th rowspan="6" valign="top">NH</th>
<th>1.523800</th>
<td>ussenate</td>
<td>0.125000</td>
<td>0.137592</td>
<td>0.947011</td>
</tr>
<tr>
<th></th>
<td>president</td>
<td>0.185874</td>
<td>0.367596</td>
<td>0.505647</td>
</tr>
<tr>
<th></th>
<td>statesenate</td>
<td>0.002164</td>
<td>0.061277</td>
<td>0.040793</td>
</tr>
<tr>
<th></th>
<td>governor</td>
<td>0.103652</td>
<td>7.044083</td>
<td>0.014715</td>
</tr>
<tr>
<th></th>
<td>statehouse</td>
<td>0.000130</td>
<td>0.137468</td>
<td>0.010124</td>
</tr>
<tr>
<th></th>
<td>ushouse</td>
<td>0.028736</td>
<td>8.551431</td>
<td>0.005511</td>
</tr>
<tr>
<th rowspan="6" valign="top">WI</th>
<th>1.192909</th>
<td>president</td>
<td>0.464684</td>
<td>0.764343</td>
<td>0.607952</td>
</tr>
<tr>
<th></th>
<td>governor</td>
<td>0.444235</td>
<td>1.093290</td>
<td>0.406329</td>
</tr>
<tr>
<th></th>
<td>statesenate</td>
<td>0.006745</td>
<td>0.070027</td>
<td>0.111116</td>
</tr>
<tr>
<th></th>
<td>ussenate</td>
<td>0.125000</td>
<td>3.361977</td>
<td>0.048715</td>
</tr>
<tr>
<th></th>
<td>ushouse</td>
<td>0.028736</td>
<td>11.005491</td>
<td>0.010840</td>
</tr>
<tr>
<th></th>
<td>statehouse</td>
<td>0.002248</td>
<td>2.667798</td>
<td>0.007957</td>
</tr>
<tr>
<th rowspan="6" valign="top">GA</th>
<th>1.001493</th>
<td>governor</td>
<td>0.803830</td>
<td>1.389146</td>
<td>0.578650</td>
</tr>
<tr>
<th></th>
<td>ushouse</td>
<td>0.028736</td>
<td>0.149408</td>
<td>0.231010</td>
</tr>
<tr>
<th></th>
<td>president</td>
<td>0.743494</td>
<td>5.131343</td>
<td>0.144893</td>
</tr>
<tr>
<th></th>
<td>ussenate</td>
<td>0.125000</td>
<td>7.682710</td>
<td>0.025361</td>
</tr>
<tr>
<th></th>
<td>statehouse</td>
<td>0.002238</td>
<td>0.902439</td>
<td>0.013748</td>
</tr>
<tr>
<th></th>
<td>statesenate</td>
<td>0.007192</td>
<td>3.847239</td>
<td>0.007831</td>
</tr>
</tbody>
</table>
</div>
<p>The state sums above provide a pretty good summary, but it is possible to get more detailed results by overlaying all the districts. This is done with the geopandas <a href="http://geopandas.org/set_operations.html">union function</a>, which creates unique geographies for all overlapping and non-overlapping districts. I had to simplify and buffer the district boundaries to get the computation to work, but the end result is still pretty interesting.</p>
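<p>For illustration, here’s a minimal sketch of that overlay step with geopandas. The file names, column names, and simplification tolerance are hypothetical stand-ins, not the exact values used for this post:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import geopandas as gpd

# Hypothetical inputs: one GeoDataFrame per district layer, each carrying
# its own voting power column. Simplify and buffer(0) so the union
# computation succeeds on messy boundaries.
ushouse = gpd.read_file('ushouse_districts.shp')[['geometry', 'ushouse_power']]
statehouse = gpd.read_file('statehouse_districts.shp')[['geometry', 'statehouse_power']]
for gdf in (ushouse, statehouse):
    gdf['geometry'] = gdf.geometry.simplify(0.01).buffer(0)

# The union overlay produces one polygon for every unique combination of
# overlapping districts, so the per-layer values can be summed afterwards.
combined = gpd.overlay(ushouse, statehouse, how='union')
combined['voting_power'] = combined[['ushouse_power', 'statehouse_power']].sum(axis=1)
</code></pre></figure>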
<p>First is the average Democratic margin across all elections for each location:</p>
<figure style="text-align:center">
<a href="/images/votepower-comp/avg-margin.png"><img style="max-height:800px" src="/images/votepower-comp/avg-margin.png" /></a>
</figure>
<p>Next is the cumulative voting power value for each location. The small white lines on the map outline unique districts with their own voting power values. Even though there’s much more detail, this map doesn’t look very different from the map above that shows sums by state. This is because elections that follow state boundaries (e.g. governor/president/senate) drive the voting power values, so the differences between states are greater than the within-state differences in most cases.</p>
<figure style="text-align:center">
<a href="/images/votepower-comp/sum-votepower.png"><img style="max-height:800px" src="/images/votepower-comp/sum-votepower.png" /></a>
</figure>
<p>I was expecting this map to show a lot more within-state variation, but instead it emphasizes the statewide elections. This is because the statewide offices are generally more powerful and the margins of those elections tend to be closer.</p>
<h1 id="model-problems">Model Problems</h1>
<p>Here are a few potential problems I can think of with the above analysis:</p>
<ul>
<li><strong>Sensitivity:</strong> It’s possible that these results are too sensitive to the outcome of a single election. I try to include as many elections as possible to counteract this, but the power values are still driven by the races for governor and president.</li>
<li><strong>Excluded elections:</strong> Judiciary and local government elections are left out of this analysis.</li>
<li><strong>Calculations:</strong> The power distribution and voting power calculations make a lot of assumptions which might not be true. It’s possible that a different method of combining or normalizing the inputs would lead to better results.</li>
<li><strong>Past results aren’t indicative of the future:</strong> The high power values might just be driven by random variation in election margins and not say anything intrinsic about a place.</li>
<li><strong>Legislative Control:</strong> The threshold for flipping control of a legislative body matters too. You don’t have much power if you participate in close elections but there’s no chance for your party to ever take the majority. My 2020 elections <a href="https://pstblog.com/2020/09/09/elections-meta">model</a> fixes this problem by taking tipping point thresholds into account, so that might be worth a look too.</li>
</ul>
<p>Even with these concerns, I think the analysis above provides some useful insights. All the code for this project is <a href="https://github.com/psthomas/voting-power-comprehensive">available</a> on GitHub, so I’d welcome any comments or contributions to make it better. There are probably other interesting analyses I could do with this dataset, so stay tuned for future posts on this subject.</p>
<h1 id="references">References</h1>
<p>[1] MIT Election Labs Data. <a href="https://electionlab.mit.edu/data">All Data</a>, <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/42MVDX">President</a>, <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/PEJ5QU">US Senate</a>, <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IG0UN2">US House</a>.</p>
<p>[2] David Leip’s Election Atlas, Governor Results. <a href="https://uselectionatlas.org/">https://uselectionatlas.org/</a>.</p>
<p>[3] State Legislative Election Returns, 1967-2016: Restructured For Use. Carl Klarner. <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DRSACA">https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DRSACA</a>.</p>
<p>[4] US Census Bureau Tiger Geodatabase, 2018 Legislative Areas National Database. <a href="https://www.census.gov/geo/maps-data/data/tiger-geodatabases.html#tab_2018">https://www.census.gov/geo/maps-data/data/tiger-geodatabases.html#tab_2018</a>.</p>
<p>[5] Code and data for this post: voting-power-comprehensive repo. <a href="https://github.com/psthomas/voting-power-comprehensive">https://github.com/psthomas/voting-power-comprehensive</a>.</p>
<p><a href="https://pstblog.com/2019/03/05/voting-power-comprehensive">Where do voters have the most political influence?</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on March 05, 2019.</p>
https://pstblog.com/2018/11/26/convergence2018-11-26T00:00:00+00:002018-11-26T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>The lack of <a href="https://en.wikipedia.org/wiki/Convergence_(economics)">catch-up growth</a> in low income countries is a persistent puzzle in the field of development economics. According to <a href="https://en.wikipedia.org/wiki/Solow%E2%80%93Swan_model">basic theory</a>, low income countries should grow more quickly because they have better opportunities for capital investments and technology transfer. There are a number of reasons the real world can diverge from basic theory (e.g. bad institutions, poor provision of public goods, frictions in technology transfer), but I recently came across a <a href="https://www.cgdev.org/blog/everything-you-know-about-cross-country-convergence-now-wrong">blogpost</a> that suggests that this isn’t really a puzzle anymore. Low and middle income economies do seem to be growing more quickly relative to the advanced economies:</p>
<figure style="text-align:center">
<a href="/images/convergence/patel-sandefur-subramanian-beta_by_series-NEW.png"><img style="max-height:800px" src="/images/convergence/patel-sandefur-subramanian-beta_by_series-NEW.png" /></a>
</figure>
<p>Each point in the graphic above represents the slope of a cross-country regression of each country’s subsequent growth rate on its GDP in year X. So a positive beta means that rich countries grew more quickly, while a negative beta means that poor countries grew more quickly over the remaining time period. For the sake of clarity, the following graphic shows an example regression, where the slope of the line through each dataset is the beta value:</p>
<figure style="text-align:center">
<a href="/images/convergence/output_23_3.png"><img src="/images/convergence/output_23_3.png" /></a>
</figure>
<p>So the slope goes from being roughly flat in 1970 to negative in 2010. All the code and data for this post are available on GitHub <a href="https://github.com/psthomas/convergence">here</a>.</p>
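<p>To make the calculation concrete, here’s a minimal sketch of how a single beta can be computed. This is not the authors’ exact code, and the column names (<code class="language-plaintext highlighter-rouge">country</code>, <code class="language-plaintext highlighter-rouge">year</code>, <code class="language-plaintext highlighter-rouge">gdp_per_capita</code>) are hypothetical:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

def beta_for_year(df, start_year, end_year=2015):
    """Slope of subsequent annualized growth on log initial GDP per capita,
    across countries. A negative slope means poorer countries grew faster.
    df is a long-format pandas DataFrame: country, year, gdp_per_capita."""
    wide = df.pivot(index='country', columns='year', values='gdp_per_capita')
    sub = wide[[start_year, end_year]].dropna()
    years = end_year - start_year
    growth = (sub[end_year] / sub[start_year]) ** (1.0 / years) - 1.0
    slope, intercept = np.polyfit(np.log(sub[start_year]), growth, 1)
    return slope

# One beta per starting year traces out the curve in the plot above, e.g.:
# betas = {y: beta_for_year(gdp_df, y) for y in range(1960, 2011)}
</code></pre></figure>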
<h2 id="the-great-recession">The Great Recession</h2>
<p>One question I had when looking at the original graphic is how the great recession influenced these results. If the advanced economies were acting abnormally over the end period used for the calculations, it might lead us to the wrong conclusion about convergence. So I decided to re-create <a href="https://www.dropbox.com/sh/eu5sob3hs56oymg/AAALaoQOCPt--3u1fQaHMihua?dl=0">their code</a> in Python and make the original beta plot:</p>
<figure id="vis" style="text-align:center">
<a href="/images/convergence/output_14_3.png"><img src="/images/convergence/output_14_3.png" /></a>
</figure>
<p>Note that this looks a little different because I extended the axis through 2020. I’m not sure why the authors ended the axis at 2000 in their version – perhaps the betas aren’t reliable beyond that point? Next, I created a plot that excludes the post-2006 data:</p>
<figure style="text-align:center">
<a href="/images/convergence/output_19_3.png"><img src="/images/convergence/output_19_3.png" /></a>
</figure>
<p>So a similar pattern is present, but the betas aren’t convincingly negative anymore. One might argue that excluding post-2006 data is unfair because it leaves out years of high growth for low and middle income countries. This is true, and I could try to construct a non-recession counterfactual for the rich countries to help deal with this, but I won’t for now.</p>
<p>One could also argue that the recession and subsequent lower growth years were due to natural structural features of our economy so a policy response wouldn’t have helped. I disagree with this argument for a few reasons discussed more <a href="https://www.nytimes.com/2018/09/30/opinion/the-economic-future-isnt-what-it-used-to-be-wonkish.html">here</a> and <a href="https://voxeu.org/article/europes-fiscal-policy-doom-loop">here</a>. Mainly, I find it implausible that structural factors would manifest themselves immediately during a financial crisis rather than steadily over time. For example, here are the projected vs. actual GDP numbers for the US and Eurozone for the past few years:</p>
<figure style="text-align:center">
<a href="/images/convergence/usprojections.png"><img src="/images/convergence/usprojections.png" /></a>
<figcaption>US projected vs. actual GDP, <a href="https://www.cbpp.org/research/full-employment/real-time-estimates-of-potential-gdp-should-the-fed-really-be-hitting-the">source</a>.</figcaption>
</figure>
<figure style="text-align:center">
<a href="/images/convergence/euprojections.png"><img src="/images/convergence/euprojections.png" /></a>
<figcaption>Eurozone projected vs. actual GDP, <a href="https://voxeu.org/article/europes-fiscal-policy-doom-loop">source</a>.</figcaption>
</figure>
<p>So the financial crisis seems to have had an immediate and lasting effect, and we keep underperforming the projections even after the effects of the recession have been taken into account. Many economists seem to think that the post-financial-crisis stimulus was <a href="http://www.igmchicago.org/surveys/economic-stimulus-revisited">net beneficial</a>, and <a href="http://www.igmchicago.org/surveys/infrastructure">more</a> could have been done by spending on things like infrastructure, which needs to be <a href="http://larrysummers.com/2016/09/12/building-the-case-for-greater-infrastructure-investment/">done eventually anyway</a>. So the GDP performance and beta plots above could look significantly different if we had a better policy response to the crisis.</p>
<p>But if we start excluding data on the basis of policy incompetence, there’s plenty of that to go around, so we might be left without any data to study in the end. So I guess even bad policy responses need to be considered part of the economy eventually. Maybe western countries just aren’t able to enact reforms in some situations because of the political consequences, so the poor performance above should be considered the new normal for the time being.</p>
<h2 id="recession-or-acceleration">Recession or Acceleration?</h2>
<p>So looking at the full dataset, negative beta values could be caused by a few things:</p>
<ol>
<li>Slower growth in rich countries.</li>
<li>Higher growth in poor countries.</li>
<li>A combination of both.</li>
</ol>
<p>So what’s the main driver? Below, I bin the earlier scatterplot by initial GDP to show that #3 (a combination of both) seems to be the answer, although the main driver seems to be higher growth in low and middle income countries.</p>
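<p>As a rough sketch of that binning step (the column names <code class="language-plaintext highlighter-rouge">initial_gdp</code> and <code class="language-plaintext highlighter-rouge">annualized_growth</code> are hypothetical), one might group countries into quantiles of initial GDP per capita and average their subsequent growth:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np
import pandas as pd

def growth_by_gdp_bin(df, bins=5):
    """Average subsequent growth within each quantile bin of initial GDP.
    Expects one row per country with hypothetical columns initial_gdp
    and annualized_growth."""
    df = df.copy()
    df['gdp_bin'] = pd.qcut(np.log(df['initial_gdp']), q=bins)
    return df.groupby('gdp_bin')['annualized_growth'].mean()
</code></pre></figure>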
<figure style="text-align:center">
<a href="/images/convergence/output_24_3.png"><img src="/images/convergence/output_24_3.png" /></a>
</figure>
<p>I actually think this plot is more informative than the beta plot above because I’m more interested in absolute rather than relative levels of growth. This is because the <a href="https://ourworldindata.org/life-expectancy#life-expectancy-and-gdp">largest marginal improvements</a> to health and wellbeing probably come at early stages of growth due to improvements in things like infrastructure and public health systems, regardless of the GDP relative to other countries. So it’s great that low and middle income countries are growing more quickly!</p>
<h2 id="the-road-ahead">The Road Ahead</h2>
<p>Finally, I thought it would be interesting to include a plot of per capita GDP over time for a few key countries:</p>
<figure style="text-align:center">
<a href="/images/convergence/output_10_3.png"><img src="/images/convergence/output_10_3.png" /></a>
</figure>
<p>This plot highlights the differences between economies and the large gap lower income countries need to make up. Dietrich Vollrath made this point in a <a href="https://growthecon.com/blog/Convergence/">post</a> where he estimated it could take close to 190 years for economies to converge at the beta rates estimated above. That’s a long time to sustain the current levels of growth.</p>
<p>There are reasons to be concerned that a few economies, especially in Sub-Saharan Africa, will have a tough time maintaining that growth. Dani Rodrik and others have made this point in their work on <a href="https://rodrik.typepad.com/dani_rodriks_weblog/2015/02/premature-deindustrialization-in-the-developing-world.html">premature deindustrialization</a>, which suggests a traditional development model via structural transformation might not be possible anymore. So these countries probably need to <a href="https://www.youtube.com/watch?v=xsAjHzAGZDU">invent new ways</a> of growing their economies that aren’t so dependent on manufacturing, or <a href="https://jrc.princeton.edu/news/industrial-policies-and-production-networks">actively promote</a> certain upstream industries through policy. But even with the challenges ahead, the recent growth in lower income countries is encouraging.</p>
<h2 id="references">References</h2>
<p>[1] Convergence (economics). Wikipedia. <a href="https://en.wikipedia.org/wiki/Convergence_(economics)">https://en.wikipedia.org/wiki/Convergence_(economics)</a></p>
<p>[2] Solow-Swan Model. Wikipedia. <a href="https://en.wikipedia.org/wiki/Solow%E2%80%93Swan_model">https://en.wikipedia.org/wiki/Solow%E2%80%93Swan_model</a></p>
<p>[3] Everything You Know about Cross-Country Convergence Is (Now) Wrong. Dev Patel, Justin Sandefur and Arvind Subramanian. <a href="https://www.cgdev.org/blog/everything-you-know-about-cross-country-convergence-now-wrong">https://www.cgdev.org/blog/everything-you-know-about-cross-country-convergence-now-wrong</a></p>
<p>[4] The Economic Future Isn’t What It Used to Be. Paul Krugman. <a href="https://www.nytimes.com/2018/09/30/opinion/the-economic-future-isnt-what-it-used-to-be-wonkish.html">https://www.nytimes.com/2018/09/30/opinion/the-economic-future-isnt-what-it-used-to-be-wonkish.html</a></p>
<p>[5] Real-Time Estimates of Potential GDP: Should the Fed Really Be Hitting the Brakes? <a href="https://www.cbpp.org/research/full-employment/real-time-estimates-of-potential-gdp-should-the-fed-really-be-hitting-the">https://www.cbpp.org/research/full-employment/real-time-estimates-of-potential-gdp-should-the-fed-really-be-hitting-the</a></p>
<p>[6] Self-fulfilling pessimism: The fiscal policy doom loop. Antonio Fatás. <a href="https://voxeu.org/article/europes-fiscal-policy-doom-loop">https://voxeu.org/article/europes-fiscal-policy-doom-loop</a></p>
<p>[7] Economic Stimulus Revisited. IGM Economic Experts Panel. <a href="http://www.igmchicago.org/surveys/economic-stimulus-revisited">http://www.igmchicago.org/surveys/economic-stimulus-revisited</a></p>
<p>[8] Infrastructure. IGM Economic Experts Panel. <a href="http://www.igmchicago.org/surveys/infrastructure">http://www.igmchicago.org/surveys/infrastructure</a></p>
<p>[9] Building the case for greater infrastructure investment. Larry Summers. <a href="http://larrysummers.com/2016/09/12/building-the-case-for-greater-infrastructure-investment/">http://larrysummers.com/2016/09/12/building-the-case-for-greater-infrastructure-investment/</a></p>
<p>[10] New evidence on convergence. Dietrich Vollrath. <a href="https://growthecon.com/blog/Convergence/">https://growthecon.com/blog/Convergence/</a></p>
<p>[11] Premature deindustrialization in the developing world. Dani Rodrik. <a href="https://rodrik.typepad.com/dani_rodriks_weblog/2015/02/premature-deindustrialization-in-the-developing-world.html">https://rodrik.typepad.com/dani_rodriks_weblog/2015/02/premature-deindustrialization-in-the-developing-world.html</a></p>
<p>[12] Are emerging economies deindustrializing too quickly? <a href="https://ourworldindata.org/growth-and-structural-transformation-are-emerging-economies-industrializing-too-quickly">https://ourworldindata.org/growth-and-structural-transformation-are-emerging-economies-industrializing-too-quickly</a></p>
<p>[13] Premature Deindustrialization. Tyler Cowen. <a href="https://www.youtube.com/watch?v=xsAjHzAGZDU">https://www.youtube.com/watch?v=xsAjHzAGZDU</a></p>
<p>[14] Industrial policies and production networks. Ernest Liu. <a href="https://jrc.princeton.edu/news/industrial-policies-and-production-networks">https://jrc.princeton.edu/news/industrial-policies-and-production-networks</a></p>
<p><a href="https://pstblog.com/2018/11/26/convergence">Economic Convergence and the Great Recession</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on November 26, 2018.</p>
https://pstblog.com/2018/08/15/network-model2018-08-15T00:00:00+00:002018-08-15T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>In a <a href="https://pstblog.com/2017/07/28/mental-model">previous post</a>, I built a simple model of the world where the interventions didn’t influence each other. But in some systems (e.g. economic, political, ecological), the high number of connections means that the indirect effects of our actions might be more important than the direct effects. In these contexts, it might make more sense to use a network of nodes and edges to represent interventions rather than a traditional model. I recently came across an <a href="http://effective-altruism.com/ea/1h6/causal_networks_model_i_introduction_user_guide/">interesting post</a> on this topic, so I decided to build a few visualizations to communicate the ideas.</p>
<h2 id="a-simple-network">A Simple Network</h2>
<p>Your goal in the network below is to get the highest score by clicking a single node. The node size is proportional to the direct impact, while the edges represent secondary effects. When you mouse over a node, the outgoing connections show up as solid lines while the incoming connections are dashed.</p>
<iframe id="network" style="width: 100%; height:600px; border: none; position: relative; scrolling:no;">
</iframe>
<p>Hopefully you found the best option, which might be a little surprising given its small direct effect. In this case, the strong connections with the rest of the nodes in the network result in a high indirect score.</p>
<h2 id="a-complicated-network">A Complicated Network</h2>
<p>Next, here’s a visualization I made based on the network described in the original <a href="http://effective-altruism.com/ea/1h6/causal_networks_model_i_introduction_user_guide/">forum post</a>, which evaluates potential focus areas for the effective altruism movement. When you click a node with a dark outline below, it’s the equivalent of giving one million dollars to that cause area. The effect of the money then cascades through the network based on the differentials and elasticities defined by the authors. In the end, red nodes represent beneficial changes and blue nodes harmful ones. The final impact, in <a href="https://en.wikipedia.org/wiki/Quality-adjusted_life_year">quality adjusted life years</a>, is reported at the top.</p>
<iframe id="ea-network" style="width: 100%; height:700px; border: none; position: relative; scrolling:no;">
</iframe>
<p>I’m not sure I agree with every aspect of this model, but I think it provides a good example of how a complicated network operates. In this case, although spending money on outreach for movement growth is the most connected node (Node 8), the total impact is smaller than improving policies related to <a href="https://en.wikipedia.org/wiki/Global_catastrophic_risk">global catastrophic risks</a> (Node 6). Another interesting takeaway is that interventions can be harmful in one domain and beneficial in another. For example, donating money to <a href="https://www.givedirectly.org/">GiveDirectly</a> improves the lives of the beneficiaries through increased consumption, but this consumption might also lead to a slight increase in carbon emissions that could partially offset the benefits.</p>
<p>The main challenge with this model seems to be choosing accurate values for the elasticities and differentials, which is difficult because they need to be derived from experimental data or expert judgement. One potential fix would be to use probability distributions and <a href="https://en.wikipedia.org/wiki/Monte_Carlo_method">Monte Carlo</a> methods rather than point estimates. Although this won’t really solve the underlying problem, it would at least convey some level of uncertainty.</p>
<p>Another challenge is that it’s hard to represent nonlinear relationships using differentials and elasticities. For example, my first inclination is to add a fluctuating growth rate to the model to see how it affects the long term GDP. But this is difficult because representing a <a href="https://en.wikipedia.org/wiki/Compound_interest#Periodic_compounding">compound interest</a> formula like <code class="language-plaintext highlighter-rouge">P' = P(1+r)^t</code> isn’t possible using elasticities and differentials (as far as I can tell). So maybe using a <a href="https://en.wikipedia.org/wiki/System_dynamics">system dynamics</a> model that can handle non-linear relationships would be a better approach. There seem to be a number of options for these models in Python [<a href="https://github.com/JamesPHoughton/pysd">1</a>, <a href="https://github.com/jdherman/stockflow">2</a>], as well as a pretty cool <a href="https://insightmaker.com/">in-browser option</a> that’s <a href="https://github.com/scottfr/insightmaker">open source</a>.</p>
<h2 id="places-to-intervene-in-a-system">Places to Intervene in a System</h2>
<p>Finally, here’s a quote from an interesting <a href="http://donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/">article</a> discussing the best ways to influence a system:</p>
<blockquote>
<p>Places to Intervene in a System (in increasing order of effectiveness):<br />
9: Constants, parameters, numbers (subsidies, taxes, standards).<br />
8: Regulating negative feedback loops.<br />
7: Driving positive feedback loops.<br />
6: Material flows and nodes of material intersection.<br />
5: Information flows.<br />
4: The rules of the system (incentives, punishments, constraints).<br />
3: The distribution of power over the rules of the system.<br />
2: The goals of the system.<br />
1: The mindset or paradigm out of which the system — its goals, power structure, rules, its culture — arises.</p>
</blockquote>
<p>Many of the best options in the network above score well on this list. For example, global catastrophic risk (GCR) strategy is so effective because it increases concern about GCRs within academia and government. So it’s influential through a combination of step #2 (changing the goals of the system) and step #1 (changing the mindset that the system arises from).</p>
<p>What this list doesn’t consider is that it’s often more difficult to intervene at the higher levels of a system – influencing goals, mindsets and power distributions seems hard. But the model does account for this because areas that are harder to influence will have smaller incoming differentials and elasticities.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I’m not sure how useful these models are for practical decision making but I think they’re a good reminder to consider indirect effects when trying to influence a complex system. I also think they’re useful for getting people to state their assumptions about the world in a way that’s open to critique. In the future, I hope to make my own model incorporating areas like scientific research, other types of policy advocacy, and economic policy, so stay tuned.</p>
<h2 id="appendix-a-how-it-works">Appendix A: How it Works</h2>
<p>The network above is considered a <a href="http://mathworld.wolfram.com/CyclicGraph.html">cyclic graph</a>, which means there can be feedback loops between the nodes. The only requirement is that there aren’t any <em>positive</em> feedback loops, which would prevent the infinite series from converging. The details of the math are covered more in the <a href="http://effective-altruism.com/ea/1h9/test/">technical guide</a>, but this is the basic equation to find a solution, where <code class="language-plaintext highlighter-rouge">I</code> is an identity matrix, <code class="language-plaintext highlighter-rouge">M</code> is the matrix of effects, and <code class="language-plaintext highlighter-rouge">V</code> is the vector of inputs:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="nx">Results</span> <span class="o">=</span> <span class="nx">V</span> <span class="o">*</span> <span class="p">(</span><span class="nx">I</span> <span class="o">-</span> <span class="nx">M</span><span class="p">)</span><span class="o">^-</span><span class="mi">1</span></code></pre></figure>
<p>In the code, I use <a href="http://mathjs.org/docs/datatypes/matrices.html">math.js</a> to do the matrix algebra, using a function similar to this:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="kd">function</span> <span class="nx">solve</span><span class="p">(</span><span class="nx">arr</span><span class="p">,</span> <span class="nx">matrix</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">len</span> <span class="o">=</span> <span class="nx">arr</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">ident</span> <span class="o">=</span> <span class="nx">math</span><span class="p">.</span><span class="nx">identity</span><span class="p">(</span><span class="nx">len</span><span class="p">,</span> <span class="nx">len</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">res</span> <span class="o">=</span> <span class="nx">math</span><span class="p">.</span><span class="nx">multiply</span><span class="p">(</span><span class="nx">arr</span><span class="p">,</span> <span class="nx">math</span><span class="p">.</span><span class="nx">inv</span><span class="p">(</span><span class="nx">math</span><span class="p">.</span><span class="nx">subtract</span><span class="p">(</span><span class="nx">ident</span><span class="p">,</span> <span class="nx">matrix</span><span class="p">))).</span><span class="nx">toArray</span><span class="p">();</span>
<span class="k">return</span> <span class="nx">res</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<h2 id="references">References</h2>
<p>[1] <em>Causal Networks Model I: Introduction & User Guide.</em> Denise Melchin. <a href="http://effective-altruism.com/ea/1h6/causal_networks_model_i_introduction_user_guide/">http://effective-altruism.com/ea/1h6/causal_networks_model_i_introduction_user_guide/</a></p>
<p>[2] <em>Causal Networks Model II: Technical Guide.</em> Alex Barry. <a href="http://effective-altruism.com/ea/1h9/test/">http://effective-altruism.com/ea/1h9/test/</a></p>
<p>[3] <em>Causality in complex interventions.</em> Dean Rickles. <a href="https://link.springer.com/article/10.1007%2Fs11019-008-9140-4">https://link.springer.com/article/10.1007%2Fs11019-008-9140-4</a></p>
<p>[4] <em>math.js.</em> <a href="http://mathjs.org/">http://mathjs.org/</a></p>
<p>[5] <em>Leverage Points: Places to Intervene in a System.</em> Donella Meadows. <a href="http://donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/">http://donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/</a></p>
<script src="https://d3js.org/d3.v4.min.js"></script>
<script>
d3.request("https://raw.githubusercontent.com/psthomas/mental-model/master/causal-network/network.html")
.get(function(a) {
document.getElementById("network").srcdoc = a.response;
});
d3.request("https://raw.githubusercontent.com/psthomas/mental-model/master/causal-network/ea-network.html")
.get(function(a) {
document.getElementById("ea-network").srcdoc = a.response;
});
</script>
<p><a href="https://pstblog.com/2018/08/15/network-model">Network Models for Thinking About the World</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on August 15, 2018.</p>
https://pstblog.com/2018/06/27/meta-returns2018-06-27T00:00:00+00:002018-06-27T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>If you’re trying to improve the world, should you avoid uncertainty or embrace it? Is it better to spend money on a temporary health intervention or fund research to eventually find a cure? I tried to answer these questions in a <a href="https://pstblog.com/2017/12/02/risk-return">previous post</a> by looking at standalone data from a variety of sources. Some of the sources shared similar enough units that they could be combined, so I try to do so below.</p>
<figure style="text-align:center;">
<a href="/images/metareturns/output_18_0.png">
<img style="max-width:600px;" src="/images/metareturns/lognormal-returns.png" /></a>
</figure>
<!--<iframe id="vis" src="/vis/meta-returns.html"-->
<!-- style="width: 100%; height:500px; border: none; position: relative; scrolling:no;">-->
<!--</iframe>-->
<p>All the code and data for this project are available on GitHub <a href="https://github.com/psthomas/risk-return">here</a>.</p>
<h2 id="data-wrangling">Data Wrangling</h2>
<p>Because both the GiveWell and Future of Humanity Institute (FHI) data share the same units, I can combine them to get a sense of the scale. Also, if I lazily assume that I can estimate the standard deviation of the Disease Control Priorities (DCP2) and National Institute for Health and Care Excellence (NICE) estimates by dividing the <a href="https://stats.stackexchange.com/questions/69575/relationship-between-the-range-and-the-standard-deviation">range of the estimates by two</a>, those can be included as well. Note also that the NICE estimates are median results while GiveWell and FHI are mean results, so this mismatch might need to be adjusted for. There are more in-depth descriptions of the sources for these data in my <a href="https://pstblog.com/2017/12/02/risk-return">previous post</a>.</p>
<p>All the other conversions are pretty straightforward, but the <a href="http://healtheconomics.tuftsmedicalcenter.org/ghcearegistry/">Global Health Cost Effectiveness Analysis Registry</a> (GHCEA) data required more wrangling. I back-calculated the standard deviations from the confidence interval widths using the figures defined on Wolfram <a href="https://mathworld.wolfram.com/ConfidenceInterval.html">here</a>, assuming the confidence intervals follow a normal distribution. In addition to the calculations above, I filter out any GHCEA studies that were rated below 4.5 by reviewers on a 1-7 quality scale. The end result is 653 interventions with standard deviations and cost effectiveness estimates.</p>
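<p>Here’s a minimal sketch of those two conversions, assuming the ranges can be treated as roughly two standard deviations wide and the GHCEA intervals are symmetric 95% confidence intervals around a normal estimate:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">from scipy import stats

def stdev_from_range(low, high):
    """Rough standard deviation from the range of estimates (range / 2)."""
    return (high - low) / 2.0

def stdev_from_ci(lower, upper, level=0.95):
    """Standard deviation implied by a symmetric normal confidence interval."""
    z = stats.norm.ppf(0.5 + level / 2.0)  # roughly 1.96 for a 95% interval
    return (upper - lower) / (2.0 * z)
</code></pre></figure>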
<p>One final thing to consider: the GiveWell numbers are estimates of the <a href="https://en.wikipedia.org/wiki/Marginal_value">marginal</a> impact of donating to existing charities, while many of the other sources measure impact against a counterfactual using an <a href="https://en.wikipedia.org/wiki/Incremental_cost-effectiveness_ratio">Incremental Cost Effectiveness Ratio</a> without a clear avenue for donors to make that change happen. Instead, many of these interventions probably need to be implemented at the hospital, insurer, or government policy level rather than through a charity (although a charity could lobby for these changes). So the GiveWell numbers might be more rigorous because they’re estimates of the current scaled-up impact of an intervention, while some of the other estimates might be too optimistic because they come from experiments implemented under ideal circumstances at a certain point in time.</p>
<h2 id="putting-it-all-together">Putting It All Together</h2>
<p>Below is a table of the combined 761 estimates and some histograms to get an idea of their distributions (mostly lognormal). Next, I fit some curves, run the portfolio optimization, and visualize the results.</p>
<div>
<table>
<thead>
<tr style="text-align: right;">
<th></th>
<th>intervention</th>
<th>cost_effectiveness</th>
<th>stdev</th>
<th>source</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Research - Diarrhoeal diseases</td>
<td>3549.783221</td>
<td>80857.231707</td>
<td>FHI</td>
</tr>
<tr>
<th>1</th>
<td>Research - Meningititis</td>
<td>1732.527683</td>
<td>41075.262076</td>
<td>FHI</td>
</tr>
<tr>
<th>2</th>
<td>Research - Leishmaniasis</td>
<td>1703.723172</td>
<td>8363.477589</td>
<td>FHI</td>
</tr>
<tr>
<th>3</th>
<td>Routine measles-containing vaccine followed by...</td>
<td>1000.000000</td>
<td>728.877849</td>
<td>GHCEA</td>
</tr>
<tr>
<th>4</th>
<td>Aspirin alone (325 mg initial dose & subsequen...</td>
<td>1000.000000</td>
<td>1645.853207</td>
<td>GHCEA</td>
</tr>
<tr>
<th>5</th>
<td>Preventive treatment of malaria in pregnancy w...</td>
<td>1000.000000</td>
<td>13.391457</td>
<td>GHCEA</td>
</tr>
<tr>
<th>6</th>
<td>Preventive treatment of malaria in pregnancy w...</td>
<td>1000.000000</td>
<td>334.566881</td>
<td>GHCEA</td>
</tr>
<tr>
<th>7</th>
<td>New tuberculosis vaccine (40% efficacy)</td>
<td>1000.000000</td>
<td>944.841656</td>
<td>GHCEA</td>
</tr>
<tr>
<th>8</th>
<td>Research - Leprosy</td>
<td>824.521890</td>
<td>71250.250787</td>
<td>FHI</td>
</tr>
<tr>
<th>9</th>
<td>Research - Trypanosomiasis</td>
<td>802.785665</td>
<td>110366.393439</td>
<td>FHI</td>
</tr>
<tr>
<th>10</th>
<td>Research - Malaria</td>
<td>801.655755</td>
<td>7066.975304</td>
<td>FHI</td>
</tr>
<tr>
<th>11</th>
<td>Research - Multiple salmonella infections</td>
<td>753.616047</td>
<td>16466.171159</td>
<td>FHI</td>
</tr>
<tr>
<th>12</th>
<td>Research - Typhoid and paratyphoid fever</td>
<td>709.092672</td>
<td>30713.464549</td>
<td>FHI</td>
</tr>
<tr>
<th>13</th>
<td>Research - Chagas disease</td>
<td>534.967344</td>
<td>6210.949437</td>
<td>FHI</td>
</tr>
<tr>
<th>14</th>
<td>Syphilis screening before third trimester + tr...</td>
<td>500.000000</td>
<td>303.979086</td>
<td>GHCEA</td>
</tr>
<tr>
<th>15</th>
<td>Brief smoking cessation advice + Bupropion</td>
<td>333.333333</td>
<td>0.058923</td>
<td>GHCEA</td>
</tr>
<tr>
<th>16</th>
<td>Syphilis screening before third trimester + tr...</td>
<td>333.333333</td>
<td>204.085798</td>
<td>GHCEA</td>
</tr>
<tr>
<th>17</th>
<td>Intermittent preventive treatment in infants (...</td>
<td>333.333333</td>
<td>255.746614</td>
<td>GHCEA</td>
</tr>
<tr>
<th>18</th>
<td>Syphilis screening before third trimester + tr...</td>
<td>333.333333</td>
<td>173.702335</td>
<td>GHCEA</td>
</tr>
<tr>
<th>19</th>
<td>Intermittent preventive treatment in infants (...</td>
<td>333.333333</td>
<td>330.235919</td>
<td>GHCEA</td>
</tr>
<tr>
<th>20</th>
<td>Research - HIV</td>
<td>303.678832</td>
<td>2132.474086</td>
<td>FHI</td>
</tr>
<tr>
<th>21</th>
<td>Research - Trichuriasis</td>
<td>251.336051</td>
<td>13978.148011</td>
<td>FHI</td>
</tr>
<tr>
<th>22</th>
<td>New tuberculosis vaccine (40% efficacy)</td>
<td>250.000000</td>
<td>132.695577</td>
<td>GHCEA</td>
</tr>
<tr>
<th>23</th>
<td>Syphilis screening before third trimester + tr...</td>
<td>250.000000</td>
<td>127.553624</td>
<td>GHCEA</td>
</tr>
<tr>
<th>24</th>
<td>Syphilis screening before third trimester + tr...</td>
<td>250.000000</td>
<td>151.989543</td>
<td>GHCEA</td>
</tr>
</tbody>
</table>
</div>
<p>A histogram of all the results together:</p>
<figure style="text-align:center;">
<a href="/images/metareturns/hist.png"><img src="/images/metareturns/hist.png" /></a>
</figure>
<p>And here are histograms of the individual sources:</p>
<figure style="text-align:center;">
<a href="/images/metareturns/facet-hists.png"><img src="/images/metareturns/facet-hists.png" /></a>
</figure>
<h2 id="fitting-some-curves">Fitting Some Curves</h2>
<p>So how do I determine if there is a <strong>return to risk taking</strong>? One approach would be to run a linear regression through the data and see if it has a positive slope. This is what I do first below, but there’s a problem with this approach. To see why, imagine calculating the cost effectiveness of every possible action, including bogus things like lighting $1000 on fire. You’d end up with a lot of useless interventions that would mess up the slope of the linear regression.</p>
<p>So my second approach is to just see if the frontier that encloses the top end of the estimates has a positive slope. In Modern Portfolio Theory, this frontier is called the <a href="https://en.wikipedia.org/wiki/Efficient_frontier">efficient frontier</a>, which I’ve written about <a href="https://github.com/psthomas/efficient-frontier">before</a>. I didn’t have enough data to test out this theory in the past, but the combination of all these sources makes it possible to do so now.</p>
<figure style="text-align:center">
<a href="/images/metareturns/frontier.jpg"><img src="/images/metareturns/frontier.jpg" /></a>
<figcaption>An example of an efficient frontier.</figcaption>
</figure>
<p>Below, I fit a linear regression and a power law to the results. The power law doesn’t have an r-squared value because this <a href="http://blog.minitab.com/blog/adventures-in-statistics-2/why-is-there-no-r-squared-for-nonlinear-regression">isn’t really</a> a valid measure of goodness of fit for nonlinear curves. The first plot uses standard axes to get a sense of the scale:</p>
<figure style="text-align:center;">
<a href="/images/metareturns/linear-returns.png"><img src="/images/metareturns/linear-returns.png" /></a>
</figure>
<p>Here’s the same plot with log-log axes to get a better view of the data:</p>
<figure style="text-align:center;">
<a href="/images/metareturns/lognormal-returns.png"><img src="/images/metareturns/lognormal-returns.png" /></a>
</figure>
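<p>For reference, a minimal sketch of these two fits might look something like this. The actual code is in the linked repo; the column names are illustrative, and the power law is fit here as a line in log-log space, which is just one simple way to do it:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

def fit_curves(stdevs, effects):
    """Fit a straight line and a power law to cost effectiveness vs. risk."""
    # Linear regression: effects = slope * stdevs + intercept
    slope, intercept = np.polyfit(stdevs, effects, 1)
    # Power law: effects = c * stdevs**k, fit as a line in log-log space
    k, log_c = np.polyfit(np.log(stdevs), np.log(effects), 1)
    return (slope, intercept), (np.exp(log_c), k)

# e.g. fit_curves(df['stdev'].values, df['cost_effectiveness'].values)
</code></pre></figure>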
<h2 id="the-efficient-frontier">The Efficient Frontier</h2>
<p>Finally, I use a modified version of an algorithm <a href="https://blog.quantopian.com/markowitz-portfolio-optimization-2/">described on Quantopian</a> to generate an efficient frontier. Each point along the curve represents a portfolio of interventions with the highest expected impact for the level of risk. The covariance matrix I used as input is all zeros except for the variances, although this could be changed if you have some reason to think intervention outcomes are correlated in some way.</p>
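<p>As a simplified illustration of what generating one point on that frontier involves, here’s a hedged sketch using scipy rather than the exact Quantopian-based code. It assumes uncorrelated interventions, i.e. a diagonal covariance matrix:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np
from scipy.optimize import minimize

def frontier_point(means, stdevs, target):
    """Minimum-risk portfolio with expected impact of at least `target`.
    Covariance is assumed diagonal, so interventions are uncorrelated."""
    n = len(means)
    variances = np.asarray(stdevs) ** 2

    def portfolio_variance(w):
        return np.dot(w ** 2, variances)

    constraints = [
        {'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},
        {'type': 'ineq', 'fun': lambda w: np.dot(w, means) - target},
    ]
    bounds = [(0.0, 1.0)] * n
    result = minimize(portfolio_variance, np.full(n, 1.0 / n),
                      bounds=bounds, constraints=constraints, method='SLSQP')
    return result.x, np.dot(result.x, means), np.sqrt(portfolio_variance(result.x))

# Sweeping the target impact level traces out the frontier, e.g.:
# for t in np.linspace(means.min(), means.max(), 50):
#     weights, impact, risk = frontier_point(means, stdevs, t)
</code></pre></figure>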
<p>Note that the plot below is interactive with tooltips and scroll-to-zoom enabled.</p>
<iframe id="vis" src="/vis/meta-returns.html" style="width: 100%; height:550px; border: none; position: relative; scrolling:no;">
</iframe>
<h2 id="conclusion">Conclusion</h2>
<ul>
<li>It seems like there are returns to risk taking for both the individual and combined estimates. This is useful to know because it means a large error bound on a cost effectiveness estimate shouldn’t be disqualifying on its own.</li>
<li>Plots like these could be useful for identifying promising interventions, especially when many independent estimates point in the same direction. This seems to be the case for many malaria, HIV, and smoking cessation interventions in the plot above.</li>
<li>This framework could also provide a useful sanity check for future estimates. For example, if an estimate is far above the existing frontier, it might be worth reviewing it for an incorrect calculation or poor assumption. But it’s important to be careful when doing this because these estimates only cover a small fraction of the possible actions one could take in the world.</li>
<li>The intervention with the highest expected impact (and highest uncertainty) is research into diarrheal disease. This suggests that research can be very beneficial even if it’s more uncertain. This relationship might be even more clear if we were to add estimates from more esoteric forms of basic research, although some forms of research might not be amenable to this type of analysis.</li>
</ul>
<h2 id="references">References</h2>
<p>[1] <em>Are there returns to risk taking in science, philanthropy, or public policy?</em> <a href="https://pstblog.com/2017/12/02/risk-return">https://pstblog.com/2017/12/02/risk-return</a></p>
<p>[2] <em>Efficient Frontier.</em> Wikipedia. <a href="https://en.wikipedia.org/wiki/Efficient_frontier">https://en.wikipedia.org/wiki/Efficient_frontier</a></p>
<p>[3] <em>psthomas: efficient-frontier</em>. GitHub. <a href="https://github.com/psthomas/efficient-frontier">https://github.com/psthomas/efficient-frontier</a></p>
<p>[4] <em>Global Health Cost Effectiveness Analysis Registry.</em> Tufts University. <a href="http://healtheconomics.tuftsmedicalcenter.org/ghcearegistry/">http://healtheconomics.tuftsmedicalcenter.org/ghcearegistry/</a></p>
<p>[5] <em>Confidence Interval</em>. Wolfram. <a href="https://mathworld.wolfram.com/ConfidenceInterval.html">https://mathworld.wolfram.com/ConfidenceInterval.html</a></p>
<p>[6] <em>Relationship between the range and the standard deviation.</em> Stack Exchange.
<a href="https://stats.stackexchange.com/questions/69575/relationship-between-the-range-and-the-standard-deviation">https://stats.stackexchange.com/questions/69575/relationship-between-the-range-and-the-standard-deviation</a></p>
<p>[7] <em>The Efficient Frontier: Markowitz portfolio optimization in Python.</em> Quantopian.<br />
<a href="https://blog.quantopian.com/markowitz-portfolio-optimization-2/">https://blog.quantopian.com/markowitz-portfolio-optimization-2/</a></p>
<p>[8] <em>Why Is There No R-Squared for Nonlinear Regression?</em> <a href="http://blog.minitab.com/blog/adventures-in-statistics-2/why-is-there-no-r-squared-for-nonlinear-regression">http://blog.minitab.com/blog/adventures-in-statistics-2/why-is-there-no-r-squared-for-nonlinear-regression</a></p>
<p><a href="https://pstblog.com/2018/06/27/meta-returns">The Social Returns to Risk Taking</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on June 27, 2018.</p>
https://pstblog.com/2018/05/08/voting-power2018-05-08T00:00:00+00:002018-05-08T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>One interesting feature of the electoral college is that some states have more electoral votes per person than others. This, combined with the fact that we have swing states, means the importance of a vote varies considerably by location.</p>
<figure style="text-align:center">
<a href="/images/votepower/vpi2016.png"><img src="/images/votepower/vpi2016.png" /></a>
<figcaption>The Voting Power Index by county for the 2016 Presidential Election.</figcaption>
</figure>
<p>Andrew Gelman has done <a href="http://andrewgelman.com/2016/11/07/chance-vote-will-decide-election/">some</a> <a href="https://pkremp.github.io/pr_decisive_vote.html">work</a> on this subject by calculating the probability a voter will swing a presidential election. Below, I ask a different but related question: “Given an election turned out the way it did, how valuable was an additional vote in each state?” This alternate metric, called the Voting Power Index (VPI), is discussed more at the DailyKos <a href="https://www.dailykos.com/stories/2016/12/19/1612252/-Voter-Power-Index-Just-How-Much-Does-the-Electoral-College-Distort-the-Value-of-Your-Vote">here</a>. Rather than rely on predicted probabilities of electoral outcomes, this metric simply divides the state’s electoral votes by the realized vote margin:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">state_vpi = state_electoral_votes/(dem_voters - rep_voters)</code></li>
</ul>
<p>I decided to take this metric a step further and calculate a county VPI, which is the fraction of the state’s voting power that resides with the voting age population of each county:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">county_vpi = state_vpi*(county_vap/state_vap)</code></li>
</ul>
<p>These numbers can then provide some insight on the ongoing <a href="https://twitter.com/Nate_Cohn/status/972608738631868416">persuasion vs. turnout debate</a> because the county voting power can be further disaggregated by voting status:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">persuasion_vpi = county_vpi*(voting_vap/county_vap)</code></li>
<li><code class="language-plaintext highlighter-rouge">turnout_vpi = county_vpi*(nonvoting_vap/county_vap)</code></li>
</ul>
<p>This <code class="language-plaintext highlighter-rouge">turnout_vpi</code> value could hopefully act as an adjustment on the numbers from <a href="https://www.nytimes.com/2018/03/10/opinion/sunday/obama-trump-voters-democrats.html">The Missing Obama Millions</a> article, which didn’t take the electoral college into account. To be honest, I’m mainly using this metric because it’s straightforward to calculate and easy to understand. I haven’t fully considered all the implications, but it seems to produce results that are similar to other analyses [5]. I’m definietly open to critique from others in order to “kick the tires” of this metric.</p>
<p>These data were compiled for a <a href="/2017/06/05/national-election-vis">previous visualization</a>, and are available along with all the code <a href="https://github.com/psthomas/voting-power">here</a>.</p>
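<p>Put together, the calculations above might look something like this in pandas. The column names are hypothetical stand-ins for the merged election and census data, and I use the absolute vote margin here so the index stays positive:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

def add_vpi_columns(df):
    """Add state, county, persuasion, and turnout VPI columns.
    Expects hypothetical columns: electoral_votes, dem_votes, rep_votes,
    state_vap, county_vap, and voting_vap (voters among the county VAP)."""
    df = df.copy()
    margin = (df['dem_votes'] - df['rep_votes']).abs()
    df['state_vpi'] = df['electoral_votes'] / margin
    df['county_vpi'] = df['state_vpi'] * (df['county_vap'] / df['state_vap'])
    voting_share = df['voting_vap'] / df['county_vap']
    df['persuasion_vpi'] = df['county_vpi'] * voting_share
    df['turnout_vpi'] = df['county_vpi'] * (1.0 - voting_share)
    return df
</code></pre></figure>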
<h2 id="visualizing-the-county-data">Visualizing the County Data</h2>
<p>These plots show that the VPI follows a lognormal or power law distribution, with some years like 2012 having fewer outlying values. A power law distribution wouldn’t be too surprising because the values are derived from county population values, which follow something close to a power law themselves.</p>
<figure>
<a href="/images/votepower/output_10_1.png"><img src="/images/votepower/output_10_1.png" /></a>
<figcaption>Histograms of the VPI by year, log adjusted.</figcaption>
</figure>
<figure>
<a href="/images/votepower/output_12_3.png"><img src="/images/votepower/output_12_3.png" /></a>
<figcaption>Maps of the VPI by county, 2004-2016.</figcaption>
</figure>
<h2 id="a-closer-look-at-2016">A Closer Look at 2016</h2>
<p>Below, I select out just the 2016 values for a more in-depth look. One important point to make is that the VPI only shows counties that were important given the way the election turned out. It’s entirely possible that a different set of counties would show up if campaigns focused their resources elsewhere, depending on how powerful you think campaigns are. So this 2016 data is more useful for providing a picture of what happened, rather than saying what should have been done instead. The averages I look at later can provide more of a general picture of places that tend to be important.</p>
<h3 id="top-2016-counties">Top 2016 Counties</h3>
<p>The following table shows the top counties sorted by VPI in 2016:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">county2016_df</span> <span class="o">=</span> <span class="n">county_df</span><span class="p">[</span><span class="n">county_df</span><span class="p">[</span><span class="s">'year'</span><span class="p">]</span> <span class="o">==</span> <span class="mi">2016</span><span class="p">]</span>
<span class="n">cols_2016</span> <span class="o">=</span> <span class="p">[</span><span class="s">'county_name'</span><span class="p">,</span><span class="s">'state'</span><span class="p">,</span> <span class="s">'year'</span><span class="p">,</span> <span class="s">'dem_margin'</span><span class="p">,</span>
<span class="s">'turnout'</span><span class="p">,</span> <span class="s">'turnout_vpi'</span><span class="p">,</span> <span class="s">'persuasion_vpi'</span><span class="p">,</span> <span class="s">'county_vpi'</span><span class="p">,]</span>
<span class="n">county2016_df</span><span class="p">[</span><span class="n">cols_2016</span><span class="p">].</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="s">'county_vpi'</span><span class="p">,</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">).</span><span class="n">head</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span></code></pre></figure>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>county_name</th>
<th>state</th>
<th>year</th>
<th>dem_margin</th>
<th>turnout</th>
<th>turnout_vpi</th>
<th>persuasion_vpi</th>
<th>county_vpi</th>
</tr>
</thead>
<tbody>
<tr>
<th>4847</th>
<td>Hillsborough County</td>
<td>NH</td>
<td>2016</td>
<td>-0.0020</td>
<td>0.6545</td>
<td>763.202291</td>
<td>1446.068221</td>
<td>2209.270512</td>
</tr>
<tr>
<th>4849</th>
<td>Rockingham County</td>
<td>NH</td>
<td>2016</td>
<td>-0.0583</td>
<td>0.7383</td>
<td>435.852891</td>
<td>1229.390599</td>
<td>1665.243490</td>
</tr>
<tr>
<th>4390</th>
<td>Wayne County</td>
<td>MI</td>
<td>2016</td>
<td>0.3734</td>
<td>0.5837</td>
<td>539.384492</td>
<td>756.163891</td>
<td>1295.548383</td>
</tr>
<tr>
<th>4371</th>
<td>Oakland County</td>
<td>MI</td>
<td>2016</td>
<td>0.0811</td>
<td>0.6801</td>
<td>303.963423</td>
<td>646.094828</td>
<td>950.058250</td>
</tr>
<tr>
<th>4848</th>
<td>Merrimack County</td>
<td>NH</td>
<td>2016</td>
<td>0.0308</td>
<td>0.6848</td>
<td>259.213077</td>
<td>563.095587</td>
<td>822.308664</td>
</tr>
<tr>
<th>4826</th>
<td>Clark County</td>
<td>NV</td>
<td>2016</td>
<td>0.1096</td>
<td>0.4527</td>
<td>443.659050</td>
<td>366.924796</td>
<td>810.583846</td>
</tr>
<tr>
<th>4850</th>
<td>Strafford County</td>
<td>NH</td>
<td>2016</td>
<td>0.0856</td>
<td>0.6590</td>
<td>241.359028</td>
<td>466.455912</td>
<td>707.814940</td>
</tr>
<tr>
<th>4358</th>
<td>Macomb County</td>
<td>MI</td>
<td>2016</td>
<td>-0.1153</td>
<td>0.6148</td>
<td>255.384842</td>
<td>407.628058</td>
<td>663.012901</td>
</tr>
<tr>
<th>4846</th>
<td>Grafton County</td>
<td>NH</td>
<td>2016</td>
<td>0.1896</td>
<td>0.6738</td>
<td>166.484551</td>
<td>343.872287</td>
<td>510.356838</td>
</tr>
<tr>
<th>4349</th>
<td>Kent County</td>
<td>MI</td>
<td>2016</td>
<td>-0.0308</td>
<td>0.6372</td>
<td>170.557276</td>
<td>299.596590</td>
<td>470.153866</td>
</tr>
<tr>
<th>4844</th>
<td>Cheshire County</td>
<td>NH</td>
<td>2016</td>
<td>0.1262</td>
<td>0.6652</td>
<td>142.008996</td>
<td>282.158481</td>
<td>424.167478</td>
</tr>
<tr>
<th>3183</th>
<td>Maricopa County</td>
<td>AZ</td>
<td>2016</td>
<td>-0.0289</td>
<td>0.4787</td>
<td>191.532380</td>
<td>175.913839</td>
<td>367.446219</td>
</tr>
<tr>
<th>6165</th>
<td>Milwaukee County</td>
<td>WI</td>
<td>2016</td>
<td>0.3701</td>
<td>0.6069</td>
<td>144.012269</td>
<td>222.310555</td>
<td>366.322824</td>
</tr>
<tr>
<th>4842</th>
<td>Belknap County</td>
<td>NH</td>
<td>2016</td>
<td>-0.1678</td>
<td>0.7013</td>
<td>100.996831</td>
<td>237.125381</td>
<td>338.122212</td>
</tr>
<tr>
<th>4333</th>
<td>Genesee County</td>
<td>MI</td>
<td>2016</td>
<td>0.0946</td>
<td>0.6238</td>
<td>115.104725</td>
<td>190.826300</td>
<td>305.931025</td>
</tr>
<tr>
<th>4389</th>
<td>Washtenaw County</td>
<td>MI</td>
<td>2016</td>
<td>0.4128</td>
<td>0.6460</td>
<td>100.443951</td>
<td>183.323358</td>
<td>283.767309</td>
</tr>
<tr>
<th>5372</th>
<td>Philadelphia County</td>
<td>PA</td>
<td>2016</td>
<td>0.6698</td>
<td>0.5792</td>
<td>115.593496</td>
<td>159.124457</td>
<td>274.717953</td>
</tr>
<tr>
<th>4843</th>
<td>Carroll County</td>
<td>NH</td>
<td>2016</td>
<td>-0.0566</td>
<td>0.7359</td>
<td>71.663218</td>
<td>199.688143</td>
<td>271.351361</td>
</tr>
<tr>
<th>4418</th>
<td>Hennepin County</td>
<td>MN</td>
<td>2016</td>
<td>0.3493</td>
<td>0.7074</td>
<td>74.660455</td>
<td>180.535170</td>
<td>255.195626</td>
</tr>
</tbody>
</table>
</div>
<p>Here’s an interactive map showing VPI by county in 2016:</p>
<div style="width:100%;"><div style="position:relative;width:100%;height:0;padding-bottom:60%;"><iframe src="/vis/vpi_folium-2016.html" style="position:absolute;width:100%;height:100%;left:0;top:0;border:none !important;" allowfullscreen="" webkitallowfullscreen="" mozallowfullscreen=""></iframe></div></div>
<h3 id="which-counties-were-crucial-in-states-democrats-lost">Which counties were crucial in states Democrats lost?</h3>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1">#Important counties for Democrats, in states they lost:
</span><span class="n">county2016_df</span><span class="p">[(</span><span class="n">county2016_df</span><span class="p">[</span><span class="s">'state_dem_margin'</span><span class="p">]</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)]</span> \
<span class="p">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="s">'county_vpi'</span><span class="p">,</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">)[</span><span class="n">cols_2016</span><span class="p">].</span><span class="n">head</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span></code></pre></figure>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>county_name</th>
<th>state</th>
<th>year</th>
<th>dem_margin</th>
<th>turnout</th>
<th>turnout_vpi</th>
<th>persuasion_vpi</th>
<th>county_vpi</th>
</tr>
</thead>
<tbody>
<tr>
<th>4390</th>
<td>Wayne County</td>
<td>MI</td>
<td>2016</td>
<td>0.3734</td>
<td>0.5837</td>
<td>539.384492</td>
<td>756.163891</td>
<td>1295.548383</td>
</tr>
<tr>
<th>4371</th>
<td>Oakland County</td>
<td>MI</td>
<td>2016</td>
<td>0.0811</td>
<td>0.6801</td>
<td>303.963423</td>
<td>646.094828</td>
<td>950.058250</td>
</tr>
<tr>
<th>4358</th>
<td>Macomb County</td>
<td>MI</td>
<td>2016</td>
<td>-0.1153</td>
<td>0.6148</td>
<td>255.384842</td>
<td>407.628058</td>
<td>663.012901</td>
</tr>
<tr>
<th>4349</th>
<td>Kent County</td>
<td>MI</td>
<td>2016</td>
<td>-0.0308</td>
<td>0.6372</td>
<td>170.557276</td>
<td>299.596590</td>
<td>470.153866</td>
</tr>
<tr>
<th>3183</th>
<td>Maricopa County</td>
<td>AZ</td>
<td>2016</td>
<td>-0.0289</td>
<td>0.4787</td>
<td>191.532380</td>
<td>175.913839</td>
<td>367.446219</td>
</tr>
<tr>
<th>6165</th>
<td>Milwaukee County</td>
<td>WI</td>
<td>2016</td>
<td>0.3701</td>
<td>0.6069</td>
<td>144.012269</td>
<td>222.310555</td>
<td>366.322824</td>
</tr>
<tr>
<th>4333</th>
<td>Genesee County</td>
<td>MI</td>
<td>2016</td>
<td>0.0946</td>
<td>0.6238</td>
<td>115.104725</td>
<td>190.826300</td>
<td>305.931025</td>
</tr>
<tr>
<th>4389</th>
<td>Washtenaw County</td>
<td>MI</td>
<td>2016</td>
<td>0.4128</td>
<td>0.6460</td>
<td>100.443951</td>
<td>183.323358</td>
<td>283.767309</td>
</tr>
<tr>
<th>5372</th>
<td>Philadelphia County</td>
<td>PA</td>
<td>2016</td>
<td>0.6698</td>
<td>0.5792</td>
<td>115.593496</td>
<td>159.124457</td>
<td>274.717953</td>
</tr>
<tr>
<th>4341</th>
<td>Ingham County</td>
<td>MI</td>
<td>2016</td>
<td>0.2687</td>
<td>0.5697</td>
<td>96.279322</td>
<td>127.483898</td>
<td>223.763221</td>
</tr>
<tr>
<th>5323</th>
<td>Allegheny County</td>
<td>PA</td>
<td>2016</td>
<td>0.1645</td>
<td>0.6601</td>
<td>75.873263</td>
<td>147.367799</td>
<td>223.241062</td>
</tr>
<tr>
<th>6137</th>
<td>Dane County</td>
<td>WI</td>
<td>2016</td>
<td>0.4717</td>
<td>0.7387</td>
<td>55.366688</td>
<td>156.548600</td>
<td>211.915288</td>
</tr>
<tr>
<th>4378</th>
<td>Ottawa County</td>
<td>MI</td>
<td>2016</td>
<td>-0.3047</td>
<td>0.6671</td>
<td>69.241319</td>
<td>138.756781</td>
<td>207.998100</td>
</tr>
<tr>
<th>4347</th>
<td>Kalamazoo County</td>
<td>MI</td>
<td>2016</td>
<td>0.1276</td>
<td>0.6177</td>
<td>75.981134</td>
<td>122.779735</td>
<td>198.760869</td>
</tr>
<tr>
<th>3441</th>
<td>Miami-Dade County</td>
<td>FL</td>
<td>2016</td>
<td>0.2960</td>
<td>0.4532</td>
<td>92.740626</td>
<td>76.856905</td>
<td>169.597531</td>
</tr>
</tbody>
</table>
</div>
<h3 id="where-was-more-turnout-especially-important-for-democrats">Where was more turnout especially important for Democrats?</h3>
<p>The <code class="language-plaintext highlighter-rouge">partisanturnout_vpi</code> multiplies the Democratic margin for each county by the fraction of the county’s voting power that resided with non-voters, in states that Democrats lost.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1">#Adjusting turnout for party margin
</span><span class="n">county_df</span><span class="p">[</span><span class="s">'partisanturnout_vpi'</span><span class="p">]</span> <span class="o">=</span> <span class="n">county_df</span><span class="p">[</span><span class="s">'dem_margin'</span><span class="p">]</span><span class="o">*</span><span class="n">county_df</span><span class="p">[</span><span class="s">'turnout_vpi'</span><span class="p">]</span>
<span class="n">county2016_df</span><span class="p">[(</span><span class="n">county2016_df</span><span class="p">[</span><span class="s">'state_dem_margin'</span><span class="p">]</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)]</span> \
<span class="p">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="s">'partisanturnout_vpi'</span><span class="p">,</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> \
<span class="p">.</span><span class="n">head</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span></code></pre></figure>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>county_name</th>
<th>state</th>
<th>year</th>
<th>dem_margin</th>
<th>turnout</th>
<th>turnout_vpi</th>
<th>persuasion_vpi</th>
<th>county_vpi</th>
<th>partisanturnout_vpi</th>
</tr>
</thead>
<tbody>
<tr>
<th>4390</th>
<td>Wayne County</td>
<td>MI</td>
<td>2016</td>
<td>0.3734</td>
<td>0.5837</td>
<td>539.384492</td>
<td>756.163891</td>
<td>1295.548383</td>
<td>201.406169</td>
</tr>
<tr>
<th>5372</th>
<td>Philadelphia County</td>
<td>PA</td>
<td>2016</td>
<td>0.6698</td>
<td>0.5792</td>
<td>115.593496</td>
<td>159.124457</td>
<td>274.717953</td>
<td>77.424524</td>
</tr>
<tr>
<th>6165</th>
<td>Milwaukee County</td>
<td>WI</td>
<td>2016</td>
<td>0.3701</td>
<td>0.6069</td>
<td>144.012269</td>
<td>222.310555</td>
<td>366.322824</td>
<td>53.298941</td>
</tr>
<tr>
<th>4389</th>
<td>Washtenaw County</td>
<td>MI</td>
<td>2016</td>
<td>0.4128</td>
<td>0.6460</td>
<td>100.443951</td>
<td>183.323358</td>
<td>283.767309</td>
<td>41.463263</td>
</tr>
<tr>
<th>3441</th>
<td>Miami-Dade County</td>
<td>FL</td>
<td>2016</td>
<td>0.2960</td>
<td>0.4532</td>
<td>92.740626</td>
<td>76.856905</td>
<td>169.597531</td>
<td>27.451225</td>
</tr>
<tr>
<th>6137</th>
<td>Dane County</td>
<td>WI</td>
<td>2016</td>
<td>0.4717</td>
<td>0.7387</td>
<td>55.366688</td>
<td>156.548600</td>
<td>211.915288</td>
<td>26.116467</td>
</tr>
<tr>
<th>4341</th>
<td>Ingham County</td>
<td>MI</td>
<td>2016</td>
<td>0.2687</td>
<td>0.5697</td>
<td>96.279322</td>
<td>127.483898</td>
<td>223.763221</td>
<td>25.870254</td>
</tr>
<tr>
<th>4371</th>
<td>Oakland County</td>
<td>MI</td>
<td>2016</td>
<td>0.0811</td>
<td>0.6801</td>
<td>303.963423</td>
<td>646.094828</td>
<td>950.058250</td>
<td>24.651434</td>
</tr>
<tr>
<th>3404</th>
<td>Broward County</td>
<td>FL</td>
<td>2016</td>
<td>0.3514</td>
<td>0.5507</td>
<td>53.225032</td>
<td>65.232522</td>
<td>118.457554</td>
<td>18.703276</td>
</tr>
<tr>
<th>5323</th>
<td>Allegheny County</td>
<td>PA</td>
<td>2016</td>
<td>0.1645</td>
<td>0.6601</td>
<td>75.873263</td>
<td>147.367799</td>
<td>223.241062</td>
<td>12.481152</td>
</tr>
<tr>
<th>4333</th>
<td>Genesee County</td>
<td>MI</td>
<td>2016</td>
<td>0.0946</td>
<td>0.6238</td>
<td>115.104725</td>
<td>190.826300</td>
<td>305.931025</td>
<td>10.888907</td>
</tr>
<tr>
<th>5367</th>
<td>Montgomery County</td>
<td>PA</td>
<td>2016</td>
<td>0.2128</td>
<td>0.6807</td>
<td>46.131019</td>
<td>98.363149</td>
<td>144.494168</td>
<td>9.816681</td>
</tr>
<tr>
<th>4347</th>
<td>Kalamazoo County</td>
<td>MI</td>
<td>2016</td>
<td>0.1276</td>
<td>0.6177</td>
<td>75.981134</td>
<td>122.779735</td>
<td>198.760869</td>
<td>9.695193</td>
</tr>
</tbody>
</table>
</div>
<h1 id="county-averages-2004-2016">County Averages, 2004-2016</h1>
<p>The above 2016 analysis is interesting, but if we want values that are more generalizable to future elections it makes sense to look at averages. If you average over too few elections you risk overfitting to a specific point in time, but averaging over too many years makes the results irrelevant to the present. I was only able to compile data from 2004 forward anyway, so those are the years I went with.</p>
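<p>As a rough sketch of how these averages could be computed (not necessarily the post’s actual code), assuming the per-county, per-year <code class="language-plaintext highlighter-rouge">county_df</code> used elsewhere in this analysis:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># Rough sketch of the 2004-2016 averaging; assumes `county_df` has one row
# per county per presidential year with the VPI columns shown above.
recent = county_df[county_df['year'].between(2004, 2016)]

avg_df = recent.groupby(['county_name', 'state']).agg(
    dem_margin=('dem_margin', 'mean'),
    turnout=('turnout', 'mean'),
    turnout_vpi=('turnout_vpi', 'mean'),
    persuasion_vpi=('persuasion_vpi', 'mean'),
    county_vpi=('county_vpi', 'mean'),
    partisanturnout_vpi=('partisanturnout_vpi', 'mean'),
    state_dem_margin=('state_dem_margin', 'mean'),
    state_dem_margin_std=('state_dem_margin', 'std'),  # spread used as a cutoff below
).reset_index()

avg_df.sort_values(by='county_vpi', ascending=False).head(20)</code></pre></figure>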
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>county_name</th>
<th>state</th>
<th>dem_margin</th>
<th>turnout</th>
<th>turnout_vpi</th>
<th>persuasion_vpi</th>
<th>county_vpi</th>
</tr>
</thead>
<tbody>
<tr>
<th>1901</th>
<td>Bernalillo County</td>
<td>NM</td>
<td>0.149150</td>
<td>0.574300</td>
<td>322.191780</td>
<td>458.901725</td>
<td>781.093505</td>
</tr>
<tr>
<th>1875</th>
<td>Hillsborough County</td>
<td>NH</td>
<td>0.004500</td>
<td>0.673875</td>
<td>261.531237</td>
<td>509.284778</td>
<td>770.816015</td>
</tr>
<tr>
<th>1466</th>
<td>St. Louis County County</td>
<td>MO</td>
<td>0.147975</td>
<td>0.713775</td>
<td>146.267445</td>
<td>475.652972</td>
<td>621.920417</td>
</tr>
<tr>
<th>1877</th>
<td>Rockingham County</td>
<td>NH</td>
<td>-0.033725</td>
<td>0.736700</td>
<td>154.517507</td>
<td>427.941955</td>
<td>582.459462</td>
</tr>
<tr>
<th>1935</th>
<td>Clark County</td>
<td>NV</td>
<td>0.123775</td>
<td>0.491850</td>
<td>274.776186</td>
<td>247.379861</td>
<td>522.156047</td>
</tr>
<tr>
<th>1418</th>
<td>Jackson County</td>
<td>MO</td>
<td>0.198300</td>
<td>0.625425</td>
<td>135.126185</td>
<td>287.882299</td>
<td>423.008484</td>
</tr>
<tr>
<th>1282</th>
<td>Wayne County</td>
<td>MI</td>
<td>0.432750</td>
<td>0.613300</td>
<td>148.977818</td>
<td>212.273072</td>
<td>361.250891</td>
</tr>
<tr>
<th>1876</th>
<td>Merrimack County</td>
<td>NH</td>
<td>0.086950</td>
<td>0.690625</td>
<td>91.522184</td>
<td>198.694873</td>
<td>290.217058</td>
</tr>
<tr>
<th>3004</th>
<td>Milwaukee County</td>
<td>WI</td>
<td>0.333100</td>
<td>0.681100</td>
<td>89.948122</td>
<td>193.970429</td>
<td>283.918551</td>
</tr>
<tr>
<th>1263</th>
<td>Oakland County</td>
<td>MI</td>
<td>0.077700</td>
<td>0.722425</td>
<td>82.209028</td>
<td>179.036279</td>
<td>261.245307</td>
</tr>
<tr>
<th>1878</th>
<td>Strafford County</td>
<td>NH</td>
<td>0.138775</td>
<td>0.653925</td>
<td>85.666172</td>
<td>161.759279</td>
<td>247.425451</td>
</tr>
<tr>
<th>1485</th>
<td>St. Louis City County</td>
<td>MO</td>
<td>0.648300</td>
<td>0.562800</td>
<td>87.590113</td>
<td>134.671075</td>
<td>222.261188</td>
</tr>
<tr>
<th>1908</th>
<td>Dona Ana County</td>
<td>NM</td>
<td>0.134450</td>
<td>0.487375</td>
<td>106.915843</td>
<td>111.046813</td>
<td>217.962656</td>
</tr>
<tr>
<th>1462</th>
<td>St. Charles County</td>
<td>MO</td>
<td>-0.186850</td>
<td>0.691450</td>
<td>51.804907</td>
<td>160.086089</td>
<td>211.890996</td>
</tr>
<tr>
<th>149</th>
<td>Maricopa County</td>
<td>AZ</td>
<td>-0.096975</td>
<td>0.504300</td>
<td>102.560581</td>
<td>101.472468</td>
<td>204.033050</td>
</tr>
<tr>
<th>1250</th>
<td>Macomb County</td>
<td>MI</td>
<td>-0.001075</td>
<td>0.646625</td>
<td>69.591618</td>
<td>112.881508</td>
<td>182.473126</td>
</tr>
</tbody>
</table>
</div>
<p>Here’s a map of the averages:</p>
<div style="width:100%;"><div style="position:relative;width:100%;height:0;padding-bottom:60%;"><iframe src="/vis/vpi_folium-2004-2016.html" style="position:absolute;width:100%;height:100%;left:0;top:0;border:none !important;" allowfullscreen="" webkitallowfullscreen="" mozallowfullscreen=""></iframe></div></div>
<h3 id="which-counties-are-important-for-democrats-on-average">Which counties are important for Democrats, on average?</h3>
<p>Next, I repeat much of the same analysis as for 2016, now using these average values. The following table shows counties that are crucial for Democrats in states they’re often within one standard deviation of losing:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1">#Using stdev as cutoff:
</span><span class="n">avg_df</span><span class="p">[(</span><span class="n">avg_df</span><span class="p">[</span><span class="s">'state_dem_margin'</span><span class="p">]</span> <span class="o">-</span> <span class="n">avg_df</span><span class="p">[</span><span class="s">'state_dem_margin_std'</span><span class="p">])</span> <span class="o"><</span> <span class="mi">0</span><span class="p">]</span> \
<span class="p">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="s">'county_vpi'</span><span class="p">,</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> \
<span class="p">[</span><span class="n">avg_cols</span><span class="p">].</span><span class="n">head</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span></code></pre></figure>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>county_name</th>
<th>state</th>
<th>dem_margin</th>
<th>turnout</th>
<th>turnout_vpi</th>
<th>persuasion_vpi</th>
<th>county_vpi</th>
</tr>
</thead>
<tbody>
<tr>
<th>1875</th>
<td>Hillsborough County</td>
<td>NH</td>
<td>0.004500</td>
<td>0.673875</td>
<td>261.531237</td>
<td>509.284778</td>
<td>770.816015</td>
</tr>
<tr>
<th>1466</th>
<td>St. Louis County County</td>
<td>MO</td>
<td>0.147975</td>
<td>0.713775</td>
<td>146.267445</td>
<td>475.652972</td>
<td>621.920417</td>
</tr>
<tr>
<th>1877</th>
<td>Rockingham County</td>
<td>NH</td>
<td>-0.033725</td>
<td>0.736700</td>
<td>154.517507</td>
<td>427.941955</td>
<td>582.459462</td>
</tr>
<tr>
<th>1935</th>
<td>Clark County</td>
<td>NV</td>
<td>0.123775</td>
<td>0.491850</td>
<td>274.776186</td>
<td>247.379861</td>
<td>522.156047</td>
</tr>
<tr>
<th>1418</th>
<td>Jackson County</td>
<td>MO</td>
<td>0.198300</td>
<td>0.625425</td>
<td>135.126185</td>
<td>287.882299</td>
<td>423.008484</td>
</tr>
<tr>
<th>1282</th>
<td>Wayne County</td>
<td>MI</td>
<td>0.432750</td>
<td>0.613300</td>
<td>148.977818</td>
<td>212.273072</td>
<td>361.250891</td>
</tr>
<tr>
<th>1876</th>
<td>Merrimack County</td>
<td>NH</td>
<td>0.086950</td>
<td>0.690625</td>
<td>91.522184</td>
<td>198.694873</td>
<td>290.217058</td>
</tr>
<tr>
<th>3004</th>
<td>Milwaukee County</td>
<td>WI</td>
<td>0.333100</td>
<td>0.681100</td>
<td>89.948122</td>
<td>193.970429</td>
<td>283.918551</td>
</tr>
<tr>
<th>1263</th>
<td>Oakland County</td>
<td>MI</td>
<td>0.077700</td>
<td>0.722425</td>
<td>82.209028</td>
<td>179.036279</td>
<td>261.245307</td>
</tr>
<tr>
<th>1878</th>
<td>Strafford County</td>
<td>NH</td>
<td>0.138775</td>
<td>0.653925</td>
<td>85.666172</td>
<td>161.759279</td>
<td>247.425451</td>
</tr>
<tr>
<th>1485</th>
<td>St. Louis City County</td>
<td>MO</td>
<td>0.648300</td>
<td>0.562800</td>
<td>87.590113</td>
<td>134.671075</td>
<td>222.261188</td>
</tr>
<tr>
<th>1462</th>
<td>St. Charles County</td>
<td>MO</td>
<td>-0.186850</td>
<td>0.691450</td>
<td>51.804907</td>
<td>160.086089</td>
<td>211.890996</td>
</tr>
<tr>
<th>149</th>
<td>Maricopa County</td>
<td>AZ</td>
<td>-0.096975</td>
<td>0.504300</td>
<td>102.560581</td>
<td>101.472468</td>
<td>204.033050</td>
</tr>
<tr>
<th>1250</th>
<td>Macomb County</td>
<td>MI</td>
<td>-0.001075</td>
<td>0.646625</td>
<td>69.591618</td>
<td>112.881508</td>
<td>182.473126</td>
</tr>
<tr>
<th>1874</th>
<td>Grafton County</td>
<td>NH</td>
<td>0.207425</td>
<td>0.695250</td>
<td>56.945682</td>
<td>121.479588</td>
<td>178.425270</td>
</tr>
<tr>
<th>1409</th>
<td>Greene County</td>
<td>MO</td>
<td>-0.230925</td>
<td>0.603550</td>
<td>58.499603</td>
<td>115.185390</td>
<td>173.684993</td>
</tr>
<tr>
<th>2976</th>
<td>Dane County</td>
<td>WI</td>
<td>0.426900</td>
<td>0.766800</td>
<td>37.693819</td>
<td>118.256630</td>
<td>155.950449</td>
</tr>
<tr>
<th>1683</th>
<td>Mecklenburg County</td>
<td>NC</td>
<td>0.199550</td>
<td>0.623500</td>
<td>46.769953</td>
<td>105.776832</td>
<td>152.546785</td>
</tr>
</tbody>
</table>
</div>
<h3 id="where-is-turnout-important-for-democrats-on-average">Where is turnout important for Democrats, on average?</h3>
<p>These are counties that have a high <code class="language-plaintext highlighter-rouge">partisanturnout_vpi</code>, which is calculated by multiplying the Democratic margin in a county by the <code class="language-plaintext highlighter-rouge">turnout_vpi</code>, then filtering to states that Democrats are, on average, within one standard deviation of losing.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1">#Using standard deviation as cutoff:
</span><span class="n">avg_df</span><span class="p">[(</span><span class="n">avg_df</span><span class="p">[</span><span class="s">'state_dem_margin'</span><span class="p">]</span> <span class="o">-</span> <span class="n">avg_df</span><span class="p">[</span><span class="s">'state_dem_margin_std'</span><span class="p">])</span> <span class="o"><</span> <span class="mi">0</span><span class="p">]</span> \
<span class="p">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="s">'partisanturnout_vpi'</span><span class="p">,</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> \
<span class="p">[</span><span class="n">temp_cols</span><span class="p">].</span><span class="n">head</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span></code></pre></figure>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>county_name</th>
<th>state</th>
<th>dem_margin</th>
<th>turnout</th>
<th>turnout_vpi</th>
<th>persuasion_vpi</th>
<th>county_vpi</th>
<th>partisanturnout_vpi</th>
</tr>
</thead>
<tbody>
<tr>
<th>1485</th>
<td>St. Louis City County</td>
<td>MO</td>
<td>0.648300</td>
<td>0.562800</td>
<td>87.590113</td>
<td>134.671075</td>
<td>222.261188</td>
<td>59.512773</td>
</tr>
<tr>
<th>1282</th>
<td>Wayne County</td>
<td>MI</td>
<td>0.432750</td>
<td>0.613300</td>
<td>148.977818</td>
<td>212.273072</td>
<td>361.250891</td>
<td>56.343820</td>
</tr>
<tr>
<th>1418</th>
<td>Jackson County</td>
<td>MO</td>
<td>0.198300</td>
<td>0.625425</td>
<td>135.126185</td>
<td>287.882299</td>
<td>423.008484</td>
<td>33.846646</td>
</tr>
<tr>
<th>1466</th>
<td>St. Louis County County</td>
<td>MO</td>
<td>0.147975</td>
<td>0.713775</td>
<td>146.267445</td>
<td>475.652972</td>
<td>621.920417</td>
<td>28.579127</td>
</tr>
<tr>
<th>2264</th>
<td>Philadelphia County</td>
<td>PA</td>
<td>0.665325</td>
<td>0.609925</td>
<td>42.629915</td>
<td>61.745824</td>
<td>104.375739</td>
<td>28.265886</td>
</tr>
<tr>
<th>3004</th>
<td>Milwaukee County</td>
<td>WI</td>
<td>0.333100</td>
<td>0.681100</td>
<td>89.948122</td>
<td>193.970429</td>
<td>283.918551</td>
<td>26.973063</td>
</tr>
<tr>
<th>1935</th>
<td>Clark County</td>
<td>NV</td>
<td>0.123775</td>
<td>0.491850</td>
<td>274.776186</td>
<td>247.379861</td>
<td>522.156047</td>
<td>26.547873</td>
</tr>
<tr>
<th>333</th>
<td>Miami-Dade County</td>
<td>FL</td>
<td>0.189300</td>
<td>0.530975</td>
<td>70.069633</td>
<td>62.784885</td>
<td>132.854518</td>
<td>16.912560</td>
</tr>
<tr>
<th>2976</th>
<td>Dane County</td>
<td>WI</td>
<td>0.426900</td>
<td>0.766800</td>
<td>37.693819</td>
<td>118.256630</td>
<td>155.950449</td>
<td>14.620923</td>
</tr>
<tr>
<th>296</th>
<td>Broward County</td>
<td>FL</td>
<td>0.335900</td>
<td>0.597550</td>
<td>41.629796</td>
<td>54.008480</td>
<td>95.638276</td>
<td>14.402219</td>
</tr>
<tr>
<th>814</th>
<td>Marion County</td>
<td>IN</td>
<td>0.188175</td>
<td>0.543075</td>
<td>46.917241</td>
<td>71.846952</td>
<td>118.764192</td>
<td>12.542131</td>
</tr>
<tr>
<th>1683</th>
<td>Mecklenburg County</td>
<td>NC</td>
<td>0.199550</td>
<td>0.623500</td>
<td>46.769953</td>
<td>105.776832</td>
<td>152.546785</td>
<td>11.133419</td>
</tr>
</tbody>
</table>
</div>
<h2 id="how-well-does-the-vpi-predict-the-future">How well does the VPI predict the future?</h2>
<p>One purpose of this analysis is to use the VPI to predict the locations that will be important in future elections, so below I run a few regressions to estimate its predictive power. Note that I use the state-level VPI for these regressions because the county-level observations aren’t independent (they’re all derived from the state-level VPI). First is a regression showing how the previous election predicts the next one:</p>
<figure>
<a href="/images/votepower/output_25_1.png"><img src="/images/votepower/output_25_1.png" /></a>
</figure>
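<p>For reference, here’s a rough sketch of how such a regression could be set up. This is not the post’s actual code; it assumes a hypothetical long-format <code class="language-plaintext highlighter-rouge">state_year_df</code> with columns <code class="language-plaintext highlighter-rouge">state</code>, <code class="language-plaintext highlighter-rouge">year</code>, and <code class="language-plaintext highlighter-rouge">state_vpi</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># Rough sketch: regress each state's VPI in one election on its VPI in the
# previous election. Column and variable names here are illustrative.
from scipy import stats

wide = state_year_df.pivot(index='state', columns='year', values='state_vpi').dropna()

for prev, nxt in [(2004, 2008), (2008, 2012), (2012, 2016)]:
    fit = stats.linregress(wide[prev], wide[nxt])
    print(f"{prev} predicting {nxt}: R^2 = {fit.rvalue**2:.2f}")</code></pre></figure>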
<p>So, looking at the R^2 values, the ability of the previous election to predict the next one varies but overall seems pretty unstable. Next, I look at how the average of the last three elections predicts the 2016 election:</p>
<figure>
<a href="/images/votepower/output_26_1.png"><img src="/images/votepower/output_26_1.png" /></a>
</figure>
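<p>And the corresponding sketch for the three-election average, continuing from the snippet above:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># Continuing the sketch above: use the mean of the three prior elections
# as the predictor for 2016.
avg_prev = wide[[2004, 2008, 2012]].mean(axis=1)
fit = stats.linregress(avg_prev, wide[2016])
print(f"2004-2012 average predicting 2016: R^2 = {fit.rvalue**2:.2f}")</code></pre></figure>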
<p>This seems like it might do a slightly better job, but it’s still pretty inconsistent at the high end of VPIs, which is where you’d want to use it to allocate resources. So overall the VPI seems like it might be more useful as a metric for describing the results of past elections rather than predicting the future.</p>
<h2 id="turnout-vs-persuasion">Turnout vs. Persuasion</h2>
<p>Next, I sum the persuasion and turnout values for each election, and then calculate their ratio. This shows that persuasion wins out in every election, but that these ratios can vary considerably. The average persuasion-to-turnout ratio is <code class="language-plaintext highlighter-rouge">1.6:1</code>, but the ratio was only <code class="language-plaintext highlighter-rouge">1.3:1</code> in 2012, a year with low turnout and few close states.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">yr_df</span> <span class="o">=</span> <span class="n">county_df</span><span class="p">.</span><span class="n">groupby</span><span class="p">([</span><span class="s">'year'</span><span class="p">])</span> \
<span class="p">.</span><span class="n">agg</span><span class="p">({</span><span class="s">'persuasion_vpi'</span><span class="p">:</span> <span class="s">'sum'</span><span class="p">,</span>
<span class="s">'turnout_vpi'</span><span class="p">:</span> <span class="s">'sum'</span><span class="p">,</span> <span class="s">'turnout_advantage'</span><span class="p">:</span><span class="s">'sum'</span><span class="p">})</span>
<span class="n">yr_df</span><span class="p">[</span><span class="s">'ratio'</span><span class="p">]</span> <span class="o">=</span> <span class="n">yr_df</span><span class="p">[</span><span class="s">'persuasion_vpi'</span><span class="p">]</span> <span class="o">/</span> <span class="n">yr_df</span><span class="p">[</span><span class="s">'turnout_vpi'</span><span class="p">]</span>
<span class="n">yr_df</span></code></pre></figure>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>turnout_advantage</th>
<th>turnout_vpi</th>
<th>persuasion_vpi</th>
<th>ratio</th>
</tr>
</thead>
<tbody>
<tr>
<th>2004</th>
<td>-7565.454805</td>
<td>11808.359012</td>
<td>19373.813817</td>
<td>1.640686</td>
</tr>
<tr>
<th>2008</th>
<td>-10075.903226</td>
<td>11373.802537</td>
<td>21449.705764</td>
<td>1.885887</td>
</tr>
<tr>
<th>2012</th>
<td>-1775.685570</td>
<td>5122.170093</td>
<td>6897.855664</td>
<td>1.346667</td>
</tr>
<tr>
<th>2016</th>
<td>-7459.669223</td>
<td>11810.178369</td>
<td>19269.847592</td>
<td>1.631631</td>
</tr>
</tbody>
</table>
</div>
<p>Averages for all elections:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">yr_df</span><span class="p">.</span><span class="n">mean</span><span class="p">()</span>
<span class="n">turnout_advantage</span> <span class="o">-</span><span class="mf">6719.178206</span>
<span class="n">turnout_vpi</span> <span class="mf">10028.627503</span>
<span class="n">persuasion_vpi</span> <span class="mf">16747.805709</span>
<span class="n">ratio</span> <span class="mf">1.626218</span></code></pre></figure>
<p>An interesting point Nate Cohn <a href="https://twitter.com/Nate_Cohn/status/972608738631868416">made</a> is that when you persuade someone, it’s usually the case that you’re switching their vote from an opponent. On the face of it, this would make persuasion doubly effective:</p>
<figure>
<a href="/images/votepower/cohnpersuasion.png"><img src="/images/votepower/cohnpersuasion.png" /></a>
</figure>
<p>But when you turn out a strong Democratic voter, they’re likely to vote for every Democrat on the ballot, while persuading a Republican to vote for a particular candidate might only lead to a <a href="https://en.wikipedia.org/wiki/Split-ticket_voting">split ticket</a>. So the net political effect of increased turnout might be larger when you consider every candidate on the ballot and their political power. In addition, if voting is habit-forming, the benefits of increased turnout could extend to future elections [7].</p>
<p>These results are interesting, but they aren’t actionable until we have data on the cost-effectiveness of each approach, which probably depends on things like the population density and the price of ad buys in different locations. Rather than thinking in binary terms, I think it makes sense to ask which strategy is better depending on the location. For more on this discussion, see this <a href="https://twitter.com/davidshor/status/1055918122371350530">Twitter thread</a>:</p>
<figure>
<a href="/images/votepower/cost-effectiveness.png"><img src="/images/votepower/cost-effectiveness.png" /></a>
</figure>
<h2 id="conclusion">Conclusion</h2>
<ul>
<li>Voting power is lognormally or power law distributed, so some counties/voters are much more powerful than others.</li>
<li>Certain locations in NH, NM, MO, NV, MI, WI, PA, and FL regularly top the list.</li>
<li>At least according to this metric, persuasion has more power than turnout (an average 1.6:1 ratio), but the cost of each approach needs to be considered before drawing any conclusions.</li>
<li>The VPI doesn’t do a very good job of predicting future elections, so it’s probably more useful as a way to describe results from the past.</li>
</ul>
<h2 id="next-steps">Next steps</h2>
<p>This analysis only considers the power of a location in presidential elections, but Senate, House, and state-level elections are crucial too. It wouldn’t be very hard to include results from Senate elections since those are statewide, but House districts don’t match county lines and have ever-changing boundaries. As a result, it would be hard to get Voting Age Population data and distribute the voting power among the counties.</p>
<p>One thing I could do is create a <a href="https://github.com/UDST/synthpop">synthetic population</a> of the US using census data. I could then distribute voting power to every individual based on their location in presidential, congressional, and state elections, then aggregate it by any boundary I choose. I’d need accurate district boundaries, results for every election, and a model of how power is shared across branches of government to make this work, though.</p>
<h2 id="references">References</h2>
<p>[1] What is the chance that your vote will decide the election? Ask Stan! <a href="http://andrewgelman.com/2016/11/07/chance-vote-will-decide-election/">http://andrewgelman.com/2016/11/07/chance-vote-will-decide-election/</a></p>
<p>[2] What is the Chance That Your Vote Will Decide the Election? <a href="https://pkremp.github.io/pr_decisive_vote.html">https://pkremp.github.io/pr_decisive_vote.html</a></p>
<p>[3] Nate Cohn. <a href="https://twitter.com/Nate_Cohn/status/972608738631868416">https://twitter.com/Nate_Cohn/status/972608738631868416</a></p>
<p>[4] The Missing Obama Millions. <a href="https://www.nytimes.com/2018/03/10/opinion/sunday/obama-trump-voters-democrats.html">https://www.nytimes.com/2018/03/10/opinion/sunday/obama-trump-voters-democrats.html</a></p>
<p>[5] Voter Power Index: Just How Much Does the Electoral College Distort the Value of Your Vote? <a href="https://www.dailykos.com/stories/2016/12/19/1612252/-Voter-Power-Index-Just-How-Much-Does-the-Electoral-College-Distort-the-Value-of-Your-Vote">https://www.dailykos.com/stories/2016/12/19/1612252/-Voter-Power-Index-Just-How-Much-Does-the-Electoral-College-Distort-the-Value-of-Your-Vote</a></p>
<p>[6] Synthpop: Synthetic populations from census data. <a href="https://github.com/UDST/synthpop">https://github.com/UDST/synthpop</a></p>
<p>[7] Is Voting Habit Forming? New Evidence from Experiments and Regression Discontinuities. <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/ajps.12210">https://onlinelibrary.wiley.com/doi/abs/10.1111/ajps.12210</a></p>
<p><a href="https://pstblog.com/2018/05/08/voting-power">Voting Power, Turnout, and Persuasion in Presidential Elections</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on May 08, 2018.</p>
https://pstblog.com/2018/02/08/google-analytics2018-02-08T00:00:00+00:002018-02-08T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>When I started this blog, I automatically included <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/">Google Analytics</a> because it was convenient and useful. I’ve recently been thinking more about privacy, so I’ve been looking for alternatives. I ended up with the following solution, which still uses Google Analytics but sends less user information (and loads faster).</p>
<h2 id="how-analyticsjs-works">How Analytics.js Works</h2>
<p>The most common way to use GA is to include a script like this somewhere on the page:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span><span class="nx">s</span><span class="p">,</span><span class="nx">o</span><span class="p">,</span><span class="nx">g</span><span class="p">,</span><span class="nx">r</span><span class="p">,</span><span class="nx">a</span><span class="p">,</span><span class="nx">m</span><span class="p">){</span><span class="nx">i</span><span class="p">[</span><span class="dl">'</span><span class="s1">GoogleAnalyticsObject</span><span class="dl">'</span><span class="p">]</span><span class="o">=</span><span class="nx">r</span><span class="p">;</span><span class="nx">i</span><span class="p">[</span><span class="nx">r</span><span class="p">]</span><span class="o">=</span><span class="nx">i</span><span class="p">[</span><span class="nx">r</span><span class="p">]</span><span class="o">||</span><span class="kd">function</span><span class="p">(){</span>
<span class="p">(</span><span class="nx">i</span><span class="p">[</span><span class="nx">r</span><span class="p">].</span><span class="nx">q</span><span class="o">=</span><span class="nx">i</span><span class="p">[</span><span class="nx">r</span><span class="p">].</span><span class="nx">q</span><span class="o">||</span><span class="p">[]).</span><span class="nx">push</span><span class="p">(</span><span class="nx">arguments</span><span class="p">)},</span><span class="nx">i</span><span class="p">[</span><span class="nx">r</span><span class="p">].</span><span class="nx">l</span><span class="o">=</span><span class="mi">1</span><span class="o">*</span><span class="k">new</span> <span class="nb">Date</span><span class="p">();</span><span class="nx">a</span><span class="o">=</span><span class="nx">s</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="nx">o</span><span class="p">),</span>
<span class="nx">m</span><span class="o">=</span><span class="nx">s</span><span class="p">.</span><span class="nx">getElementsByTagName</span><span class="p">(</span><span class="nx">o</span><span class="p">)[</span><span class="mi">0</span><span class="p">];</span><span class="nx">a</span><span class="p">.</span><span class="k">async</span><span class="o">=</span><span class="mi">1</span><span class="p">;</span><span class="nx">a</span><span class="p">.</span><span class="nx">src</span><span class="o">=</span><span class="nx">g</span><span class="p">;</span><span class="nx">m</span><span class="p">.</span><span class="nx">parentNode</span><span class="p">.</span><span class="nx">insertBefore</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="nx">m</span><span class="p">)</span>
<span class="p">})(</span><span class="nb">window</span><span class="p">,</span><span class="nb">document</span><span class="p">,</span><span class="dl">'</span><span class="s1">script</span><span class="dl">'</span><span class="p">,</span><span class="dl">'</span><span class="s1">https://www.google-analytics.com/analytics.js</span><span class="dl">'</span><span class="p">,</span><span class="dl">'</span><span class="s1">ga</span><span class="dl">'</span><span class="p">);</span>
<span class="nx">ga</span><span class="p">(</span><span class="dl">'</span><span class="s1">create</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">UA-XXXXX-Y</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">auto</span><span class="dl">'</span><span class="p">);</span>
<span class="nx">ga</span><span class="p">(</span><span class="dl">'</span><span class="s1">send</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">pageview</span><span class="dl">'</span><span class="p">);</span></code></pre></figure>
<p>This code creates a GA object and a script tag, which then loads <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference">analytics.js</a>. When the script loads, functions are called to collect information from the browser, set some cookies, and then send a pageview. This pageview <code class="language-plaintext highlighter-rouge">GET</code> request is the most important part – it’s where all the data is collected. From a privacy standpoint, another relevant part is the <code class="language-plaintext highlighter-rouge">auto</code> argument, which tells GA to <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/cookies-user-id#automatic_cookie_domain_configuration">automatically set cookies</a> to track users. One of these cookies persists for 2 years, and is meant to track users across multiple sessions.</p>
<p>Google does offer ways to increase privacy with custom settings on the GA object. For example, the following settings <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/cookies-user-id#disabling_cookies">disable cookies</a>, <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference#forceSSL">force SSL</a>, and <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/ip-anonymization">store fewer digits</a> of the IP address:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="nx">ga</span><span class="p">(</span><span class="dl">'</span><span class="s1">create</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">UA-XXXXX-Y</span><span class="dl">'</span><span class="p">,</span> <span class="p">{</span>
<span class="dl">'</span><span class="s1">storage</span><span class="dl">'</span><span class="p">:</span> <span class="dl">'</span><span class="s1">none</span><span class="dl">'</span><span class="p">,</span>
<span class="dl">'</span><span class="s1">storeGac</span><span class="dl">'</span><span class="p">:</span> <span class="kc">false</span><span class="p">});</span>
<span class="nx">ga</span><span class="p">(</span><span class="dl">"</span><span class="s2">set</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">anonymizeIp</span><span class="dl">"</span><span class="p">,</span> <span class="kc">true</span><span class="p">);</span>
<span class="nx">ga</span><span class="p">(</span><span class="dl">'</span><span class="s1">set</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">forceSSL</span><span class="dl">'</span><span class="p">,</span> <span class="kc">true</span><span class="p">);</span>
<span class="nx">ga</span><span class="p">(</span><span class="dl">'</span><span class="s1">send</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">pageview</span><span class="dl">'</span><span class="p">);</span> </code></pre></figure>
<p>The problem with the above approach is that you depend on what Google is willing to build into their API. They still control what happens behind the scenes, and can change it in the future by modifying <code class="language-plaintext highlighter-rouge">analytics.js</code>. In addition, they <a href="https://support.google.com/analytics/answer/2763052?hl=en">only anonymize</a> the last octet of the IP address (e.g. <code class="language-plaintext highlighter-rouge">12.214.31.144</code> -> <code class="language-plaintext highlighter-rouge">12.214.31.0</code>), which <a href="https://computer.howstuffworks.com/internet/basics/question5492.htm">isn’t very anonymous</a> if you’re on a corporate or university network (and still allows tracking at the city level for a large ISP).</p>
<h2 id="the-measurement-protocol">The Measurement Protocol</h2>
<p>An alternative to loading analytics.js is to manually collect whatever data you need, then send it to them using the <a href="https://developers.google.com/analytics/devguides/collection/protocol/v1/reference">measurement protocol</a>. This way you don’t have to load analytics.js, and you have more <a href="https://developers.google.com/analytics/devguides/collection/protocol/v1/parameters">fine tuned</a> control over what data is sent. Here’s an example script that sends a <code class="language-plaintext highlighter-rouge">GET</code> request (an example <code class="language-plaintext highlighter-rouge">POST</code> request is in the appendix):</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="kd">let</span> <span class="nx">ls</span> <span class="o">=</span> <span class="p">[];</span>
<span class="kd">const</span> <span class="nx">tid</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">UA-XXXXX-Y</span><span class="dl">'</span><span class="p">;</span> <span class="c1">//Your Analytics ID</span>
<span class="kd">const</span> <span class="nx">cid</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="mi">100</span><span class="o">+</span><span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">()</span><span class="o">*</span><span class="mi">900</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">fields</span> <span class="o">=</span> <span class="p">[</span><span class="dl">'</span><span class="s1">v</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">tid</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">cid</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">t</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">aip</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">uip</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">dl</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">dt</span><span class="dl">'</span><span class="p">];</span>
<span class="kd">const</span> <span class="nx">values</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="nx">tid</span><span class="p">,</span> <span class="nx">cid</span><span class="p">,</span> <span class="dl">'</span><span class="s1">pageview</span><span class="dl">'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="dl">'</span><span class="s1">0.0.0.0</span><span class="dl">'</span><span class="p">,</span> <span class="nb">window</span><span class="p">.</span><span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span> <span class="nb">document</span><span class="p">.</span><span class="nx">title</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span><span class="o"><</span><span class="nx">fields</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">ls</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nb">String</span><span class="p">(</span><span class="nx">fields</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span> <span class="o">+</span> <span class="dl">'</span><span class="s1">=</span><span class="dl">'</span> <span class="o">+</span> <span class="nb">encodeURIComponent</span><span class="p">(</span><span class="nb">String</span><span class="p">(</span><span class="nx">values</span><span class="p">[</span><span class="nx">i</span><span class="p">])));</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">url</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">https://www.google-analytics.com/r/collect?</span><span class="dl">'</span> <span class="o">+</span> <span class="nx">ls</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="dl">'</span><span class="s1">&</span><span class="dl">'</span><span class="p">);</span>
<span class="c1">// https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch</span>
<span class="nx">fetch</span><span class="p">(</span><span class="nx">url</span><span class="p">);</span></code></pre></figure>
<p>There are more details about how this works in the <a href="https://developers.google.com/analytics/devguides/collection/protocol/v1/reference">reference</a>, but the main point is I only collect the <code class="language-plaintext highlighter-rouge">page URL</code> and <code class="language-plaintext highlighter-rouge">page title</code> for each visit. The <code class="language-plaintext highlighter-rouge">uip</code> field is an override that sets the IP to <code class="language-plaintext highlighter-rouge">0.0.0.0</code>, and the <code class="language-plaintext highlighter-rouge">aip</code> field tells Google to anonymize this IP address.</p>
<p><strong>One caveat</strong>: It took me a while to realize this, but much of the data that GA collects is actually gathered when the user’s browser performs the <code class="language-plaintext highlighter-rouge">GET</code> request, regardless of the data that’s attached to the request. This is how GA gets the user’s IP address and User Agent information, so in the end this approach comes down to trusting that Google actually overrides and anonymizes the IP address in their database. So I’m not completely sold on this approach yet, but it seems like an improvement.</p>
<h2 id="other-options">Other Options</h2>
<ul>
<li>Other third party analytics providers - The problem is, they can still sell your data.</li>
<li><a href="https://www.google.com/webmasters/">Google Webmasters Console</a> - This shows you how often your site appears in searches, how often people clicked through to your site, and inbound links. The downside is it only shows traffic from Google searches.</li>
<li>Self Hosted - Options like <a href="https://matomo.org/">Matomo/Piwik</a> are nice, or you could just set up your own server and perform GET requests to it whenever a page loads. The problem is, these options are more complicated than the static site itself.</li>
<li><a href="https://aws.amazon.com/s3/">Amazon Bucket</a> - Static sites hosted in S3 buckets can be configured to store access logs. You could pull these down using the command line and analyze them locally. Unfortunately, static sites on S3 are a pain to set up.</li>
</ul>
<h2 id="appendix---post-request">Appendix - Post Request</h2>
<p>Note this uses a different URL (<code class="language-plaintext highlighter-rouge">/collect</code>), and sends the joined fields as the body of the <code class="language-plaintext highlighter-rouge">POST</code>.</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="kd">let</span> <span class="nx">ls</span> <span class="o">=</span> <span class="p">[];</span>
<span class="kd">const</span> <span class="nx">tid</span> <span class="o">=</span> <span class="dl">'</span><span class="s1">UA-XXXXX-Y</span><span class="dl">'</span><span class="p">;</span> <span class="c1">//Your Analytics ID</span>
<span class="kd">const</span> <span class="nx">cid</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">floor</span><span class="p">(</span><span class="mi">100</span><span class="o">+</span><span class="nb">Math</span><span class="p">.</span><span class="nx">random</span><span class="p">()</span><span class="o">*</span><span class="mi">900</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">fields</span> <span class="o">=</span> <span class="p">[</span><span class="dl">'</span><span class="s1">v</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">tid</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">cid</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">t</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">aip</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">uip</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">dl</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">dt</span><span class="dl">'</span><span class="p">];</span>
<span class="kd">const</span> <span class="nx">values</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="nx">tid</span><span class="p">,</span> <span class="nx">cid</span><span class="p">,</span> <span class="dl">'</span><span class="s1">pageview</span><span class="dl">'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="dl">'</span><span class="s1">0.0.0.0</span><span class="dl">'</span><span class="p">,</span> <span class="nb">window</span><span class="p">.</span><span class="nx">location</span><span class="p">.</span><span class="nx">href</span><span class="p">,</span> <span class="nb">document</span><span class="p">.</span><span class="nx">title</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span><span class="o"><</span><span class="nx">fields</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">ls</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nb">String</span><span class="p">(</span><span class="nx">fields</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span> <span class="o">+</span> <span class="dl">'</span><span class="s1">=</span><span class="dl">'</span> <span class="o">+</span> <span class="nb">encodeURIComponent</span><span class="p">(</span><span class="nb">String</span><span class="p">(</span><span class="nx">values</span><span class="p">[</span><span class="nx">i</span><span class="p">])));</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">data</span> <span class="o">=</span> <span class="nx">ls</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="dl">'</span><span class="s1">&</span><span class="dl">'</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">request</span> <span class="o">=</span> <span class="p">{</span>
<span class="na">method</span><span class="p">:</span> <span class="dl">'</span><span class="s1">POST</span><span class="dl">'</span><span class="p">,</span>
<span class="na">headers</span><span class="p">:</span> <span class="p">{</span><span class="dl">'</span><span class="s1">Content-Type</span><span class="dl">'</span><span class="p">:</span> <span class="dl">'</span><span class="s1">application/x-www-form-urlencoded</span><span class="dl">'</span><span class="p">},</span>
<span class="na">body</span><span class="p">:</span> <span class="nx">data</span>
<span class="p">};</span>
<span class="c1">// https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch</span>
<span class="nx">fetch</span><span class="p">(</span><span class="dl">'</span><span class="s1">https://www.google-analytics.com/collect</span><span class="dl">'</span><span class="p">,</span> <span class="nx">request</span><span class="p">);</span></code></pre></figure>
<p><a href="https://pstblog.com/2018/02/08/google-analytics">Privacy With Google Analytics</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on February 08, 2018.</p>
https://pstblog.com/2018/01/30/password-manager2018-01-30T00:00:00+00:002018-01-30T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>I was recently doing some research into different password managers, and I came across an interesting <a href="http://blog.jgc.org/2010/12/write-your-passwords-down.html">blog post</a>. Apparently, the author uses an old cryptographic tool called a <a href="https://en.wikipedia.org/wiki/Tabula_recta">tabula recta</a> to generate and remember his passwords:</p>
<blockquote>
<p>Research says you need 80-bits of entropy in your password so it needs to be long, chosen from a wide range of characters and chosen randomly. My scheme gives me 104 bits of entropy. My passwords are generated using a little program I wrote that chooses random characters (using a cryptographically secure random number generator) and then printing them out on a tabula recta. . . I use that sheet as follows. If I’m logging into Amazon I’ll find the intersection of column M and column A (the second and third letters of Amazon) and then read off diagonally 16 characters. . . The security of this system rests on the randomness of the generated characters and the piece of paper.</p>
</blockquote>
<p>After reading this, I realized it might be possible to make this system a little more user friendly by automating some of the steps. I ended up making a simple webpage that generates a tabula based on a master password, which then can be used to generate site-specific passwords. Here are the steps:</p>
<ol>
<li>Enter a strong master password. This password is used as the seed for an in-browser random number generator (<a href="https://github.com/davidbau/seedrandom">seedrandom.js</a>), which will always generate the same unique grid of characters based on your master password (see the sketch after this list).</li>
<li>Select a pattern to follow. Current options are <code class="language-plaintext highlighter-rouge">line</code>, <code class="language-plaintext highlighter-rouge">step</code>, <code class="language-plaintext highlighter-rouge">spiral</code>, or <code class="language-plaintext highlighter-rouge">manual</code> (if you want to create your own pattern).</li>
<li>Click a memorable starting cell for the pattern, such as the grid location [A,N] for <strong>a</strong>mazo<strong>n</strong>.com. If you chose an automated pattern, the password will be generated after you click.</li>
<li>When you need the password in the future, repeat steps 1-3 above.</li>
</ol>
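<p>The real implementation is JavaScript (seedrandom.js plus scrypt), but here is a rough Python sketch of the idea behind steps 1-3. The salt, scrypt parameters, character set, grid size, and the diagonal read-off pattern are all illustrative, not the project’s actual configuration:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import hashlib
import random
import string

def make_grid(master_password, size=26, charset=string.ascii_letters + string.digits):
    """Derive a deterministic grid of pseudo-random characters from a master password."""
    # Stretch the master password into a seed; salt and scrypt parameters here
    # are illustrative only, not the project's real settings.
    seed = hashlib.scrypt(master_password.encode(), salt=b'tabula-demo',
                          n=2**14, r=8, p=1, dklen=32)
    rng = random.Random(seed)  # same password -&gt; same seed -&gt; same grid
    return [[rng.choice(charset) for _ in range(size)] for _ in range(size)]

def read_diagonal(grid, row, col, length=16):
    """Read a password along a simple diagonal starting at (row, col)."""
    size = len(grid)
    return ''.join(grid[(row + i) % size][(col + i) % size] for i in range(length))

grid = make_grid('a strong master password')
# e.g. start at [A, N] for "amazon.com": row 0, column 13
print(read_diagonal(grid, 0, 13))</code></pre></figure>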
<p>Try it out below or try the <a href="https://pstblog.com/projects/tabula.html">full page</a> version. The source code is available on <a href="https://github.com/psthomas/tabula">GitHub</a>, along with a prototype Firefox extension.</p>
<p><strong>Note</strong>: This project is still experimental, so it needs more scrutiny before I’d recommend using it. If you do, print out a copy of the table so you have a backup if I change the code.</p>
<!--https://stackoverflow.com/questions/5867985-->
<div class="outer">
<div class="inner">
<iframe id="tabula" src="/projects/tabula.html" style="width: 98vw; height: 110vh; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div>
<h2 id="how-secure-is-this-system">How secure is this system?</h2>
<h4 id="web-security">Web Security</h4>
<p>This approach does sacrifice security for convenience to some extent – you’re using your computer and a web browser to create passwords rather than a piece of paper. This leaves you vulnerable to script injections or keylogging software. The master password is never sent to a server and the code doesn’t make any network requests, so some risks can be reduced by turning your internet connection off or using a local version of the HTML page. On the other hand, you don’t have to store a physical copy of your table anywhere, so it’s a little more secure in that sense.</p>
<h4 id="cryptography">Cryptography</h4>
<p><code class="language-plaintext highlighter-rouge">Hashed password -> Decrypted password</code></p>
<p>If someone gets a hashed version of one of your passwords and doesn’t know your grid, these passwords would take a very long time to crack. For example, a password built from 16 characters of the advanced character set has <code class="language-plaintext highlighter-rouge">log_2(88^16) ≈ 103</code> bits of entropy.</p>
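<p>That figure is easy to check; the 88-character set size comes from the sentence above:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import math

# 16 characters, each drawn uniformly at random from an 88-character set
bits = 16 * math.log2(88)
print(round(bits, 2))  # 103.35, i.e. roughly 103 bits</code></pre></figure>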
<p><code class="language-plaintext highlighter-rouge">Decrypted Password -> Reconstruction of master password and table</code></p>
<p>If someone obtained one of your site-specific passwords, it’s possible that they could re-create a copy of your grid. They would do this by trying a bunch of different master password inputs, and checking if the output contains a string that matches your site password. I don’t know how to estimate the feasibility of this attack, but choosing a strong master password could help mitigate the risk. In addition, I use <a href="https://en.wikipedia.org/wiki/Scrypt">scrypt</a> to convert the master password into a seed for the random number generator, which should make it more resource-intensive to try this attack.</p>
<p><code class="language-plaintext highlighter-rouge">Copy of Master password or grid -> Reconstruction of site-specific passwords</code></p>
<p>If someone were to steal your master password or table they’d get an ordered grid of characters. They would still need to perform a brute force attack to get a password, but it wouldn’t be very hard to do so by trying common patterns. I estimate it would take anywhere from a year to less than a second depending on whether they’re using a rate-limited web form or trying to break a hashed password from a leaked database (see the appendix).</p>
<p>Once they have one password, they could probably figure out the rest if you used the same generating pattern for all of them. It might be possible to improve this system by having the user add a salt to the password after it’s generated, but that would only add security until someone obtains a raw version of one of your passwords.</p>
<h4 id="usability">Usability</h4>
<p>The interface allows the user to hide the grid, toggle password highlighting, and hide password forms to prevent <a href="https://en.wikipedia.org/wiki/Shoulder_surfing_%28computer_security%29">shoulder surfing</a>. It also includes some automated patterns that users can choose between. While these patterns do give an attacker a template to follow during a brute force attack, I think people would probably choose more obvious patterns if they had to manually enter them. For extra security, one could select the manual option and use a unique, site-specific pattern for each password.</p>
<h2 id="how-does-this-stack-up-against-other-password-managers">How does this stack up against other password managers?</h2>
<p>It seems like my approach would be a little more secure, but less capable, than the cloud-based versions of <a href="https://www.lastpass.com/">LastPass</a> or <a href="https://1password.com/">1password</a>. With those systems, if your master password is stolen and you don’t have two-factor authentication enabled, your entire vault is compromised. With my system, someone still needs to brute force your grid of characters. The downside is that you lose some of the perks of those systems, like storing encrypted documents, group sharing, or emergency credential release.</p>
<p>Using a tabula is probably less secure than local password managers like the (<a href="https://medium.com/@kennwhite/who-moved-my-cheese-1password-6a98a0fc6c56">disappearing</a>) desktop version of 1password, <a href="https://www.passwordstore.org/">Pass</a>, <a href="https://www.enpass.io/">Enpass</a>, <a href="https://keepass.info/">KeePass</a>, <a href="https://github.com/bndw/pick">pick</a>, or <a href="https://www.zetetic.net/codebook/">Codebook</a>. If you only use a local manager on a single device, that’s probably the most secure option (assuming you trust the developers and their code). Even if you back up your local database in the cloud with a service like DropBox, it’s probably still secure if you use strong encryption. The only real vulnerability is if your device is compromised, as the attacker could access all your passwords at once.</p>
<h2 id="summary">Summary</h2>
<p>So, to sum up, this is what my system does:</p>
<ul>
<li>Generates site-specific passwords with as many bits of entropy as you need</li>
<li>Allows users to choose between broad character-sets depending on site requirements</li>
<li>Makes it easy to remember complex passwords without storing them anywhere</li>
<li>Allows access anywhere, on any computer or smartphone you trust</li>
</ul>
<p>What it doesn’t do:</p>
<ul>
<li>Store existing passwords</li>
<li>Create passwords for sites with detailed character requirements</li>
<li>Encrypt documents</li>
<li>Protect against <a href="https://en.wikipedia.org/wiki/Keystroke_logging">keyloggers</a>, or other malware</li>
<li>Protect against phishing attempts</li>
<li>Securely share passwords</li>
<li>Incorporate two-factor authentication</li>
</ul>
<h2 id="next-steps">Next Steps</h2>
<p>I put the code up on <a href="https://github.com/psthomas/tabula">GitHub</a> so anyone can contribute and review. I’m especially interested in improving the security against the potential attacks I outlined above. Also, I want to know more about the <a href="http://davidbau.com/archives/2010/01/30/random_seeds_coded_hints_and_quintillions.html">quality</a> of the <a href="https://github.com/davidbau/seedrandom/blob/released/seedrandom.js#L144">random number</a> generator I’m using because this system depends on having a random grid of characters.</p>
<p>I also made a prototype Firefox extension that’s available in the GitHub repo, which can be loaded as a temporary extension. This might add a little security because extensions are <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Distribution">signed</a> by Mozilla before installation.</p>
<h2 id="appendix">Appendix</h2>
<p>Brute forcing a tabula recta is discussed on stackexchange <a href="https://security.stackexchange.com/questions/13579/using-a-tabula-recta-to-store-passwords/13745#13745">here</a>. Here’s a basic calculation:</p>
<p><code class="language-plaintext highlighter-rouge">(26*26 cells) * (20 common patterns) * (20 potential password lengths, 10-30 chars) * (8 starting directions, including diagonals) = ~2 million options</code>.</p>
<p>So on average this would take about <code class="language-plaintext highlighter-rouge">2 million/2 = 1 million</code> guesses to crack. Using rates from <a href="https://github.com/dropbox/zxcvbn#usage">zxcvbn</a>:</p>
<p>On a rate limited site: <code class="language-plaintext highlighter-rouge">1,000,000 / (100/hour) = 10000 hrs = 416 days</code><br />
On a non-rate limited site: <code class="language-plaintext highlighter-rouge">1,000,000/(10*60*60) = 27.8 hrs</code> <br />
Offline, slow hashing algorithm: <code class="language-plaintext highlighter-rouge">1,000,000/(1e4*60) = 1.7 minutes</code><br />
Offline, fast hashing algorithm: <code class="language-plaintext highlighter-rouge">1,000,000/(1e10/second) = <<1 second</code></p>
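<p>The same arithmetic as a short script, using the zxcvbn guess rates cited above:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">guesses = 1e6  # roughly half of the ~2 million possible patterns

rates_per_second = {
    "rate-limited web form (100/hour)": 100 / 3600,
    "non-rate-limited web form (10/second)": 10,
    "offline, slow hashing (1e4/second)": 1e4,
    "offline, fast hashing (1e10/second)": 1e10,
}

for scenario, rate in rates_per_second.items():
    print(f"{scenario}: {guesses / rate / 3600:.3g} hours")</code></pre></figure>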
<p><a href="https://pstblog.com/2018/01/30/password-manager">A Simple, Stateless Password Manager</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on January 30, 2018.</p>
https://pstblog.com/2017/12/02/risk-return2017-12-02T00:00:00+00:002017-12-02T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>If you’re hoping to do good in the world, it makes sense to ask where your efforts will make the biggest impact. Some have claimed that high-risk, high-return projects are the most promising because those areas are less crowded. For example, here’s a quote from Robert Reich’s <a href="http://bostonreview.net/forum/foundations-philanthropy-democracy">essay</a> on the role of philanthropic foundations in society:</p>
<blockquote>
<p>When it comes to the ongoing work of experimentation, foundations have a structural advantage over market and state institutions: a longer time horizon. Once more, the lack of accountability may be a surprising advantage. . . foundations are not subject to earnings reports, impatient investors or stockholders, or short-term election cycles. Foundations, answerable only to the diverse preferences and ideas of their donors, with a protected endowment permitted to exist in perpetuity, may be uniquely situated to engage in the sort of high-risk, long-run policy innovation and experimentation that is healthy in a democratic society.</p>
</blockquote>
<p>The Open Philanthropy Project outlines a similar approach in <a href="https://www.openphilanthropy.org/blog/hits-based-giving">a post</a> about their giving philosophy:</p>
<blockquote>
<p>One of our core values is our tolerance for philanthropic “risk.” Our overarching goal is to do as much good as we can, and as part of that, we’re open to supporting work that has a high risk of failing to accomplish its goals. We’re even open to supporting work that is more than 90% likely to fail, as long as the overall expected value is high enough.</p>
</blockquote>
<p>It seems <a href="https://blog.givewell.org/2013/05/02/broad-market-efficiency/">intuitive</a> that there are returns to risk taking, but I wondered whether any datasets out there would support this idea. Below, I attempt to answer this question by looking at evidence from science, philanthropy, and public policy. I just finished a <a href="https://pstblog.com/2018/06/27/meta-returns">meta-analysis</a> that combines the interventions from below that have similar units, so take a look at that as well.</p>
<h2 id="definitions">Definitions</h2>
<p>Before I continue, I think it makes sense to define the terms <strong>risk</strong> and <strong>return</strong>. By <strong>return</strong>, I mean the impact of an intervention using units like <a href="https://en.wikipedia.org/wiki/Disability-adjusted_life_year">disability adjusted life years</a> per dollar, benefit to cost ratios, or research citation counts. While some of these estimates are more complicated to construct than others, they all require making judgements about things like the value of a human life, the amount of suffering caused by different conditions, or the benefits from a highly cited paper.</p>
<p>The definition of the term <strong>risk</strong> is tricky to pin down. To some, it’s just a measure of the noisiness of an estimate and is measured using something like the <a href="https://en.wikipedia.org/wiki/Standard_deviation">standard deviation</a>. To others, an intervention is only risky when it could potentially underperform some target or cause harm (e.g. <a href="https://en.wikipedia.org/wiki/Downside_risk">downside risk</a>). People seem to use the uncertainty and downside risk terms interchangeably, and I think both are useful, so I include both in my analysis where possible. When I use the term <code class="language-plaintext highlighter-rouge">risk</code> below, I’m talking about the standard deviation of the outcome, and <code class="language-plaintext highlighter-rouge">downside risk</code> refers to underperformance relative to the mean.</p>
<p>Here is how the values are calculated (a runnable sketch follows the list):</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">risk = standard deviation = np.std(series)</code></li>
<li><code class="language-plaintext highlighter-rouge">downside risk (semideviation) = np.sqrt((np.minimum(0.0, series - t)**2).sum()/series.size)</code>, where <code class="language-plaintext highlighter-rouge">t</code> is the mean intervention outcome</li>
</ul>
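<p>For example, a minimal NumPy version with made-up benefit-cost ratios:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

series = np.array([2.5, -1.0, 4.0, 0.5, 10.0])  # hypothetical intervention outcomes
t = series.mean()                               # the mean intervention outcome

risk = np.std(series)
downside_risk = np.sqrt((np.minimum(0.0, series - t)**2).sum() / series.size)

print(risk, downside_risk)</code></pre></figure>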
<p>Finally, how do you determine if there is a <strong>return to risk taking</strong>? One approach would be to run a linear regression through the data and see if it has a positive slope. This is what I do for most of the sources below, but there’s a problem with this approach. To see why, imagine calculating the cost effectiveness of every possible action, including bogus things like “lighting $1000 on fire”. You’d end up with a lot of useless interventions that would mess up the slope of the linear regression. So another approach would be to just see if the frontier that encloses the top end of the estimates has a positive slope. In Modern Portfolio Theory, this frontier is called the <a href="https://en.wikipedia.org/wiki/Efficient_frontier">efficient frontier</a>:</p>
<figure style="text-align:center">
<a href="/images/metareturns/frontier.jpg"><img src="/images/metareturns/frontier.jpg" /></a>
</figure>
<p>I’ve written about this idea <a href="https://github.com/psthomas/efficient-frontier">before</a> but didn’t have the data to test the concept out until now. I stick with linear regression for this notebook, but I run the data through a portfolio optimization algorithm to create an efficient frontier in my <a href="https://pstblog.com/2018/06/27/meta-returns">meta-analysis</a> blogpost.</p>
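<p>The regression check itself is simple. With made-up (risk, return) pairs, it looks something like this:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np
from scipy import stats

# Hypothetical (risk, return) pairs for a handful of interventions
risk = np.array([0.5, 1.2, 2.0, 3.5, 5.0])
returns = np.array([1.0, 1.5, 3.0, 2.5, 6.0])

fit = stats.linregress(risk, returns)
print(fit.slope, fit.pvalue)  # a positive, significant slope suggests returns to risk taking</code></pre></figure>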
<h2 id="data-sources">Data Sources</h2>
<p>It’s pretty difficult to find datasets that quantify their uncertainty while also using a cross-intervention measure of impact, so it’s taken me a while to stumble across enough data to complete this analysis. The <code class="language-plaintext highlighter-rouge">scrape.py</code> file included in <a href="https://github.com/psthomas/risk-return">the repo</a> for this project outlines how I accessed and cleaned data from each source.</p>
<p>All the code and data for this post are available <a href="https://github.com/psthomas/risk-return">here</a>.</p>
<h2 id="evidence-from-public-policy">Evidence from Public Policy</h2>
<p>First, I look at a dataset from the <a href="http://www.wsipp.wa.gov/BenefitCost">Washington State Institute for Public Policy</a> (WSIPP). The WSIPP evaluates evidence based public policies and completes detailed benefit-cost analyses using monte carlo methods. The end result is a list of benefit-cost ratios along with metrics like the chance that the benefit-cost ratio is positive.</p>
<p>The measure of risk I’m using here (<code class="language-plaintext highlighter-rouge">the chance costs exceed benefits</code>) sets a really low bar. This ignores the upside of an intervention and much of the downside until the benefit cost ratio is below one. It also counts a project with a very low downside the same as one with only a marginally low downside, because they’re just counting up benefit-cost ratios < 1 and <a href="http://www.wsipp.wa.gov/TechnicalDocumentation/WsippBenefitCostTechnicalDocumentation.pdf">dividing by the total number of monte carlo runs</a>. The benefit of this metric is that it is easy to interpret, but I wish they would include a standard deviation as well.</p>
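<p>In other words, the metric is just a count over simulation runs. A toy version (not WSIPP’s actual model) looks like this:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

rng = np.random.default_rng(0)
bc_ratios = rng.normal(loc=2.0, scale=3.0, size=10_000)  # stand-in for one program's monte carlo runs

chance_costs_exceed_benefits = 100 * (bc_ratios < 1).mean()
print(round(chance_costs_exceed_benefits))  # percent of runs where the ratio falls below one</code></pre></figure>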
<div>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>program_name</th>
<th>benefit_cost_ratio</th>
<th>chance_costs_exceed_benefits</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Educator professional development: Use of data...</td>
<td>-174.30</td>
<td>69</td>
</tr>
<tr>
<th>1</th>
<td>Scared Straight</td>
<td>-101.25</td>
<td>98</td>
</tr>
<tr>
<th>2</th>
<td>Behavioral self-control training -BSCT</td>
<td>-80.03</td>
<td>77</td>
</tr>
<tr>
<th>3</th>
<td>Alcohol Literacy Challenge -for college students</td>
<td>-34.25</td>
<td>51</td>
</tr>
<tr>
<th>4</th>
<td>InShape</td>
<td>-29.59</td>
<td>53</td>
</tr>
<tr>
<th>5</th>
<td>Drug Abuse Resistance Education -D.A.R.E.</td>
<td>-7.71</td>
<td>51</td>
</tr>
<tr>
<th>6</th>
<td>Youth advocacy/empowerment programs for tobacc...</td>
<td>-7.13</td>
<td>64</td>
</tr>
<tr>
<th>7</th>
<td>Sex offender registration and community notifi...</td>
<td>-5.14</td>
<td>67</td>
</tr>
<tr>
<th>8</th>
<td>Interventions to prevent excessive gestational...</td>
<td>-5.03</td>
<td>64</td>
</tr>
<tr>
<th>9</th>
<td>Interventions to prevent excessive gestational...</td>
<td>-3.71</td>
<td>53</td>
</tr>
<tr>
<th>10</th>
<td>Police diversion for individuals with mental i...</td>
<td>-2.94</td>
<td>99</td>
</tr>
<tr>
<th>11</th>
<td>Treatment for juveniles convicted of sex offen...</td>
<td>-2.59</td>
<td>82</td>
</tr>
<tr>
<th>12</th>
<td>Project SUCCESS</td>
<td>-1.84</td>
<td>61</td>
</tr>
<tr>
<th>13</th>
<td>"Check-in" behavior interventions</td>
<td>-1.71</td>
<td>54</td>
</tr>
<tr>
<th>14</th>
<td>Opening Doors advising in community college</td>
<td>-1.70</td>
<td>78</td>
</tr>
<tr>
<th>15</th>
<td>Multicomponent environmental interventions to ...</td>
<td>-1.64</td>
<td>73</td>
</tr>
<tr>
<th>16</th>
<td>Inpatient or intensive outpatient drug treatme...</td>
<td>-1.51</td>
<td>66</td>
</tr>
<tr>
<th>17</th>
<td>Domestic violence perpetrator treatment -Dulut...</td>
<td>-1.50</td>
<td>77</td>
</tr>
<tr>
<th>18</th>
<td>Other Family Preservation Services -non-HOMEBU...</td>
<td>-1.40</td>
<td>100</td>
</tr>
<tr>
<th>19</th>
<td>Life skills education</td>
<td>-1.33</td>
<td>65</td>
</tr>
<tr>
<th>20</th>
<td>Intensive supervision -probation</td>
<td>-1.32</td>
<td>100</td>
</tr>
<tr>
<th>21</th>
<td>Even Start</td>
<td>-1.15</td>
<td>69</td>
</tr>
<tr>
<th>22</th>
<td>Family dependency treatment court</td>
<td>-1.11</td>
<td>93</td>
</tr>
<tr>
<th>23</th>
<td>CASASTART</td>
<td>-1.04</td>
<td>77</td>
</tr>
<tr>
<th>24</th>
<td>Cognitive behavioral therapy -CBT for children...</td>
<td>-1.01</td>
<td>92</td>
</tr>
<tr>
<th>25</th>
<td>Cognitive-behavioral coping-skills therapy for...</td>
<td>-0.99</td>
<td>58</td>
</tr>
<tr>
<th>26</th>
<td>Primary care in behavioral health settings -co...</td>
<td>-0.96</td>
<td>75</td>
</tr>
<tr>
<th>27</th>
<td>Community-based correctional facilities -halfw...</td>
<td>-0.71</td>
<td>100</td>
</tr>
<tr>
<th>28</th>
<td>Early Start -New Zealand</td>
<td>-0.49</td>
<td>98</td>
</tr>
<tr>
<th>29</th>
<td>Interventions to reduce unnecessary emergency ...</td>
<td>-0.48</td>
<td>52</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>294</th>
<td>Project EX</td>
<td>41.71</td>
<td>12</td>
</tr>
<tr>
<th>295</th>
<td>Education and Employment Training -EET King Co...</td>
<td>41.84</td>
<td>0</td>
</tr>
<tr>
<th>296</th>
<td>Seeking Safety</td>
<td>42.40</td>
<td>12</td>
</tr>
<tr>
<th>297</th>
<td>Smoking cessation programs for pregnant women:...</td>
<td>47.61</td>
<td>2</td>
</tr>
<tr>
<th>298</th>
<td>Acceptance and Commitment Therapy for adult an...</td>
<td>48.55</td>
<td>15</td>
</tr>
<tr>
<th>299</th>
<td>Cognitive behavioral therapy -CBT for adult de...</td>
<td>49.09</td>
<td>0</td>
</tr>
<tr>
<th>300</th>
<td>Cognitive behavioral therapy -CBT for adult an...</td>
<td>54.01</td>
<td>0</td>
</tr>
<tr>
<th>301</th>
<td>Anti-smoking media campaigns adult effect</td>
<td>57.07</td>
<td>13</td>
</tr>
<tr>
<th>302</th>
<td>Consultant teachers: Online coaching</td>
<td>61.94</td>
<td>8</td>
</tr>
<tr>
<th>303</th>
<td>Summer book programs: Multi-year intervention</td>
<td>63.90</td>
<td>30</td>
</tr>
<tr>
<th>304</th>
<td>Case management in schools</td>
<td>64.07</td>
<td>4</td>
</tr>
<tr>
<th>305</th>
<td>Good Behavior Game</td>
<td>65.47</td>
<td>30</td>
</tr>
<tr>
<th>306</th>
<td>Teacher performance pay programs</td>
<td>65.55</td>
<td>12</td>
</tr>
<tr>
<th>307</th>
<td>Teacher professional development: Induction/me...</td>
<td>70.72</td>
<td>36</td>
</tr>
<tr>
<th>308</th>
<td>More intensive tobacco quitlines -compared to ...</td>
<td>73.51</td>
<td>0</td>
</tr>
<tr>
<th>309</th>
<td>College advising provided by counselors -for h...</td>
<td>74.56</td>
<td>0</td>
</tr>
<tr>
<th>310</th>
<td>School-based tobacco prevention programs</td>
<td>75.10</td>
<td>1</td>
</tr>
<tr>
<th>311</th>
<td>Cognitive behavioral therapy -CBT for adult po...</td>
<td>88.11</td>
<td>0</td>
</tr>
<tr>
<th>312</th>
<td>Model Smoking Prevention Program</td>
<td>89.83</td>
<td>9</td>
</tr>
<tr>
<th>313</th>
<td>Access to tobacco quitlines</td>
<td>95.85</td>
<td>5</td>
</tr>
<tr>
<th>314</th>
<td>Teacher professional development: Use of data ...</td>
<td>122.55</td>
<td>2</td>
</tr>
<tr>
<th>315</th>
<td>Tutoring: By peers</td>
<td>133.59</td>
<td>17</td>
</tr>
<tr>
<th>316</th>
<td>Text message reminders -for high school graduates</td>
<td>135.71</td>
<td>47</td>
</tr>
<tr>
<th>317</th>
<td>Anti-smoking media campaign youth effect</td>
<td>147.33</td>
<td>0</td>
</tr>
<tr>
<th>318</th>
<td>Consultant teachers: Content-Focused Coaching</td>
<td>173.17</td>
<td>6</td>
</tr>
<tr>
<th>319</th>
<td>Summer outreach counseling -for high school gr...</td>
<td>195.39</td>
<td>10</td>
</tr>
<tr>
<th>320</th>
<td>Alcohol Literacy Challenge -for high school st...</td>
<td>259.46</td>
<td>42</td>
</tr>
<tr>
<th>321</th>
<td>Text messaging programs for smoking cessation</td>
<td>363.46</td>
<td>0</td>
</tr>
<tr>
<th>322</th>
<td>Eye Movement Desensitization and Reprocessing ...</td>
<td>598.94</td>
<td>0</td>
</tr>
<tr>
<th>323</th>
<td>Computer-based programs for smoking cessation</td>
<td>794.18</td>
<td>0</td>
</tr>
</tbody>
</table>
</div>
<p>Below is a plot of the intervention rank and the benefit-cost ratio. It’s clear that some interventions outperform others by a few orders of magnitude. Another interesting finding is that the distribution might be two-tailed, with some outlying performers on the bad end as well.</p>
<figure style="text-align:center">
<a href="/images/returns/output_5_0.png"><img src="/images/returns/output_5_0.png" /></a>
</figure>
<p>Next, I plot the chance costs exceed benefits (an imperfect proxy for downside risk) against the benefit-cost ratio. I derive the <code class="language-plaintext highlighter-rouge">chance costs exceed benefits</code> from WSIPP’s <code class="language-plaintext highlighter-rouge">chance benefits exceed costs</code> value, which they calculate by counting results from their monte carlo simulations. This measure doesn’t take into account the scale of good/poor performance, but it’s the best we can get without access to their models.</p>
<p>The end result is that there doesn’t seem to be much of a return to this measure of risk.</p>
<figure style="text-align:center">
<a href="/images/returns/output_7_0.png"><img src="/images/returns/output_7_0.png" /></a>
</figure>
<h2 id="evidence-from-public-health">Evidence from Public Health</h2>
<p>The next dataset I look at is from the <a href="http://www.dcp-3.org/dcp2">Disease Control Priorities Project</a> (DCP2), which comes up with comprehensive estimates of the cost effectiveness of different treatments in developing countries. The original source is a table in the DCP2 report, which Jeff Kaufmann made into a <a href="https://www.jefftk.com/dcp2.csv">CSV</a>. I selected the interventions with <code class="language-plaintext highlighter-rouge">$/DALY</code> units, eliminated any with zero or near-zero spread (because they likely came from the same estimate), and only selected the estimates from sub-Saharan Africa. Finally, I converted <code class="language-plaintext highlighter-rouge">$/DALY</code> units to <code class="language-plaintext highlighter-rouge">DALY/1000USD</code> so a bigger number means a higher impact.</p>
<p>Using the spread isn’t very rigorous and might bias the results towards understudied areas with few estimates (e.g. an intervention with only a single estimate has a spread of 0), but it’s the only measure of uncertainty available here.</p>
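<p>As a sketch of the cleaning steps (the real code is in <code class="language-plaintext highlighter-rouge">scrape.py</code>; the column names and toy rows below are just illustrative, with point estimates back-computed from the results shown in the table):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

# Toy stand-in for the cleaned DCP2 table
df = pd.DataFrame({
    "intervention": ["Insecticide-treated bednets", "BCG vaccine", "Antiretroviral therapy"],
    "region": ["Sub-Saharan Africa"] * 3,
    "unit": ["$/DALY"] * 3,
    "usd_per_daly": [11.0, 68.0, 922.0],   # point estimates in the original $/DALY units
    "spread": [83.3, 37.0, 0.87],          # spread of the converted DALY/1000USD estimates
})

df = df[(df["unit"] == "$/DALY") & (df["region"] == "Sub-Saharan Africa")]
df = df[df["spread"] > 1e-6]                            # drop zero/near-zero spreads
df["cost_effectiveness"] = 1000.0 / df["usd_per_daly"]  # flip the units so bigger is better
print(df[["intervention", "cost_effectiveness", "spread"]])</code></pre></figure>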
<div>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>condition</th>
<th>intervention</th>
<th>cost_effectiveness</th>
<th>spread</th>
</tr>
</thead>
<tbody>
<tr>
<th>29</th>
<td>Malaria</td>
<td>Intermittent preventive treatment in pregnancy...</td>
<td>142.857143</td>
<td>111.111111</td>
</tr>
<tr>
<th>28</th>
<td>Malaria</td>
<td>Insecticidetreated bednets</td>
<td>90.909091</td>
<td>83.333333</td>
</tr>
<tr>
<th>2</th>
<td>Lymphatic filariasis</td>
<td>Annual mass drug administration</td>
<td>66.666667</td>
<td>43.478261</td>
</tr>
<tr>
<th>41</th>
<td>Malaria</td>
<td>Residual household spraying</td>
<td>58.823529</td>
<td>66.666667</td>
</tr>
<tr>
<th>30</th>
<td>Malaria</td>
<td>Intermittent preventive treatment in pregnancy...</td>
<td>52.631579</td>
<td>90.909091</td>
</tr>
<tr>
<th>27</th>
<td>Traffic accidents</td>
<td>Increased speeding penalties, enforcement, med...</td>
<td>47.619048</td>
<td>28.571429</td>
</tr>
<tr>
<th>16</th>
<td>Lymphatic filariasis</td>
<td>Diethyl carbamazine salt</td>
<td>45.454545</td>
<td>23.809524</td>
</tr>
<tr>
<th>39</th>
<td>HIV/AIDS</td>
<td>Peer and education programs for high-risk groups</td>
<td>27.027027</td>
<td>16.129032</td>
</tr>
<tr>
<th>52</th>
<td>HIV/AIDS</td>
<td>Voluntary counseling and testing</td>
<td>21.276596</td>
<td>13.333333</td>
</tr>
<tr>
<th>9</th>
<td>Tuberculosis (endemic)</td>
<td>BCG vaccine</td>
<td>14.705882</td>
<td>37.037037</td>
</tr>
<tr>
<th>6</th>
<td>Stroke (recurrent)</td>
<td>Aspirin and dipyridamole</td>
<td>12.345679</td>
<td>43.478261</td>
</tr>
<tr>
<th>14</th>
<td>HIV/AIDS</td>
<td>Condom promotion and distribution</td>
<td>12.195122</td>
<td>16.666667</td>
</tr>
<tr>
<th>10</th>
<td>HIV/AIDS</td>
<td>Blood and needle safety</td>
<td>11.904762</td>
<td>18.518519</td>
</tr>
<tr>
<th>18</th>
<td>Tuberculosis (epidemic, infectious)</td>
<td>Directly observed short-course chemotherapy</td>
<td>9.803922</td>
<td>5.747126</td>
</tr>
<tr>
<th>43</th>
<td>Emergency medical care</td>
<td>Staffed community ambulance</td>
<td>8.333333</td>
<td>8.403361</td>
</tr>
<tr>
<th>49</th>
<td>HIV/AIDS</td>
<td>Tuberculosis coinfection prevention and treatment</td>
<td>8.264463</td>
<td>34.482759</td>
</tr>
<tr>
<th>11</th>
<td>Lower acute respiratory infections (nonsevere)</td>
<td>Case management at community or facility level</td>
<td>7.751938</td>
<td>6.329114</td>
</tr>
<tr>
<th>45</th>
<td>Problems requiring surgery</td>
<td>Surgical ward or services in district hospital...</td>
<td>7.352941</td>
<td>6.134969</td>
</tr>
<tr>
<th>15</th>
<td>Diarrheal disease</td>
<td>Construction and promotion of basic sanitation...</td>
<td>7.092199</td>
<td>3.861004</td>
</tr>
<tr>
<th>0</th>
<td>Congestive heart failure</td>
<td>ACE inhibitor and beta-blocker, with diuretics</td>
<td>6.666667</td>
<td>4.048583</td>
</tr>
<tr>
<th>48</th>
<td>HIV/AIDS</td>
<td>Treatment of opportunistic infections</td>
<td>6.410256</td>
<td>3.257329</td>
</tr>
<tr>
<th>51</th>
<td>Lymphatic filariasis</td>
<td>Vector control</td>
<td>6.250000</td>
<td>4.273504</td>
</tr>
<tr>
<th>38</th>
<td>HIV/AIDS</td>
<td>Mother-to-child transmission prevention</td>
<td>5.208333</td>
<td>2.702703</td>
</tr>
<tr>
<th>32</th>
<td>Tuberculosis (epidemic, latent)</td>
<td>Isoniazid treatment</td>
<td>5.076142</td>
<td>3.300330</td>
</tr>
<tr>
<th>37</th>
<td>Tuberculosis (epidemic)</td>
<td>Management of drug resistance</td>
<td>4.830918</td>
<td>90.909091</td>
</tr>
<tr>
<th>17</th>
<td>Tuberculosis (endemic, infectious or noninfect...</td>
<td>Directly observed short-course chemotherapy</td>
<td>3.322259</td>
<td>2.141328</td>
</tr>
<tr>
<th>36</th>
<td>Tuberculosis (endemic)</td>
<td>Management of drug resistance</td>
<td>3.144654</td>
<td>4.524887</td>
</tr>
<tr>
<th>24</th>
<td>Neonatal mortality</td>
<td>Family, community, or clinical neonatal package</td>
<td>2.898551</td>
<td>76.923077</td>
</tr>
<tr>
<th>1</th>
<td>Alcohol abuse</td>
<td>Advertising ban and reduced access to beverage...</td>
<td>2.475248</td>
<td>13.513514</td>
</tr>
<tr>
<th>23</th>
<td>Alcohol abuse</td>
<td>Excise tax, advertising ban, with brief advice</td>
<td>1.584786</td>
<td>16.666667</td>
</tr>
<tr>
<th>7</th>
<td>Ischemic heart disease</td>
<td>Aspirin, betablocker, and optional ACE inhibitor</td>
<td>1.453488</td>
<td>2.105263</td>
</tr>
<tr>
<th>20</th>
<td>Panic disorder</td>
<td>Drugs with optional psychosocial treatment</td>
<td>1.362398</td>
<td>1.428571</td>
</tr>
<tr>
<th>33</th>
<td>Coronary artery disease</td>
<td>Legislation substituting 2% of trans fat with ...</td>
<td>1.193317</td>
<td>0.781861</td>
</tr>
<tr>
<th>5</th>
<td>HIV/AIDS</td>
<td>Antiretroviral therapy</td>
<td>1.084599</td>
<td>0.874126</td>
</tr>
<tr>
<th>8</th>
<td>Parkinson's disease</td>
<td>Ayurvedic treatment and levodopa or carbidopa</td>
<td>0.883392</td>
<td>1.315789</td>
</tr>
<tr>
<th>22</th>
<td>Alcohol abuse</td>
<td>Excise tax</td>
<td>0.726216</td>
<td>3.921569</td>
</tr>
<tr>
<th>19</th>
<td>Depression</td>
<td>Drugs with optional episodic or maintenance ps...</td>
<td>0.588582</td>
<td>0.479846</td>
</tr>
<tr>
<th>25</th>
<td>Stroke (ischemic)</td>
<td>Heparin and recombinant tissue plasminogen act...</td>
<td>0.505817</td>
<td>0.715820</td>
</tr>
<tr>
<th>44</th>
<td>Ischemic heart disease</td>
<td>Statin, with aspirin and betablocker with ACE ...</td>
<td>0.493097</td>
<td>3.039514</td>
</tr>
<tr>
<th>40</th>
<td>Stroke and ischemic and hypertensive heart dis...</td>
<td>Polypill by absolute risk approach</td>
<td>0.469925</td>
<td>0.369004</td>
</tr>
<tr>
<th>21</th>
<td>Traffic accidents</td>
<td>Enforcement of seatbelt laws, promotion of chi...</td>
<td>0.408330</td>
<td>0.344828</td>
</tr>
<tr>
<th>50</th>
<td>Dengue</td>
<td>Vector control</td>
<td>0.389712</td>
<td>0.871840</td>
</tr>
<tr>
<th>13</th>
<td>Diarrheal disease</td>
<td>Cholera or rotavirus immunization</td>
<td>0.368732</td>
<td>2.183406</td>
</tr>
<tr>
<th>42</th>
<td>Epilepsy (refractory)</td>
<td>Second-line treatment with phenobarbital and l...</td>
<td>0.330360</td>
<td>15.151515</td>
</tr>
<tr>
<th>34</th>
<td>Bipolar disorder</td>
<td>Lithium, valproate, with optional psy-chosocia...</td>
<td>0.321234</td>
<td>0.813008</td>
</tr>
<tr>
<th>26</th>
<td>Diarrheal disease</td>
<td>Improved water and sanitation at current cover...</td>
<td>0.238949</td>
<td>0.226449</td>
</tr>
<tr>
<th>35</th>
<td>Bipolar disorder</td>
<td>Lithium, valproate, with optional psychosocial...</td>
<td>0.226398</td>
<td>0.604595</td>
</tr>
<tr>
<th>12</th>
<td>Lower acute respiratory infections (severe and...</td>
<td>Case management at hospital level</td>
<td>0.220751</td>
<td>0.309789</td>
</tr>
<tr>
<th>46</th>
<td>Trachoma</td>
<td>Tetracycline or azithromycin</td>
<td>0.159515</td>
<td>0.198689</td>
</tr>
<tr>
<th>3</th>
<td>Schizophrenia</td>
<td>Antipsychotic drugs with optional psychosocial...</td>
<td>0.101688</td>
<td>0.067912</td>
</tr>
<tr>
<th>4</th>
<td>Schizophrenia</td>
<td>Antipsychotic drugs with optional psychosocial...</td>
<td>0.083893</td>
<td>0.063975</td>
</tr>
<tr>
<th>31</th>
<td>Tuberculosis (endemic, latent)</td>
<td>Isoniazid treatment</td>
<td>0.075999</td>
<td>0.134825</td>
</tr>
<tr>
<th>47</th>
<td>HIV/AIDS</td>
<td>Treatment of Kaposi's sarcoma</td>
<td>0.019066</td>
<td>0.028602</td>
</tr>
</tbody>
</table>
</div>
<p>These estimates follow a similar pattern to the WSIPP data, with the top interventions a few orders of magnitude better than the worst.</p>
<figure style="text-align:center">
<a href="/images/returns/output_11_0.png"><img src="/images/returns/output_11_0.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/returns/output_12_0.png"><img src="/images/returns/output_12_0.png" /></a>
</figure>
<p>So it seems there might be returns to risk taking when using the spread as the (somewhat imperfect) measure of risk.</p>
<h3 id="national-health-service">National Health Service</h3>
<p>The second dataset I found is from <a href="https://academic.oup.com/jpubhealth/article/34/1/37/1554654">a meta-analysis</a> looking at the cost effectiveness of public health interventions within the English National Health Service (NHS) [15]. This dataset is similar to the DCP2 data above because the only measure of uncertainty is the spread of the estimates. Overall, there seems to be a similar but weaker pattern here:</p>
<table>
<thead>
<tr>
<th></th>
<th>guidance_topic</th>
<th>comparator</th>
<th>median</th>
<th>num_estimates</th>
<th>spread</th>
</tr>
</thead>
<tbody>
<tr>
<th>53</th>
<td>Smoking cessation—general population: client c...</td>
<td>Background quit rate; no intervention or usua...</td>
<td>20.000000</td>
<td>8.0</td>
<td>2.288330</td>
</tr>
<tr>
<th>4</th>
<td>Exercise prescriptions</td>
<td>Advice</td>
<td>12.987013</td>
<td>4.0</td>
<td>7.194245</td>
</tr>
<tr>
<th>56</th>
<td>Smoking cessation—general population: recruitm...</td>
<td>Background quit rate; no intervention or advice</td>
<td>3.846154</td>
<td>15.0</td>
<td>0.074499</td>
</tr>
<tr>
<th>62</th>
<td>Smoking cessation —general population: dentist...</td>
<td>Usual care</td>
<td>3.311258</td>
<td>3.0</td>
<td>10.989011</td>
</tr>
<tr>
<th>51</th>
<td>Smoking cessation—general population: incentiv...</td>
<td>Intervention no NRT</td>
<td>2.793296</td>
<td>2.0</td>
<td>1.597444</td>
</tr>
<tr>
<th>2</th>
<td>BA (5 min plus self-help)</td>
<td>Background quit rate</td>
<td>2.702703</td>
<td>8.0</td>
<td>1.801802</td>
</tr>
<tr>
<th>54</th>
<td>Smoking cessation—general population: proactiv...</td>
<td>Usual care or intervention but no telephone c...</td>
<td>2.341920</td>
<td>9.0</td>
<td>0.683527</td>
</tr>
<tr>
<th>58</th>
<td>Smoking cessation—general population: identify...</td>
<td>No intervention</td>
<td>1.984127</td>
<td>4.0</td>
<td>0.243902</td>
</tr>
<tr>
<th>61</th>
<td>Smoking cessation—general population: pharmaci...</td>
<td>Usual care</td>
<td>1.831502</td>
<td>2.0</td>
<td>4.608295</td>
</tr>
<tr>
<th>0</th>
<td>BA only (5 min)</td>
<td>Background quit rate</td>
<td>1.366120</td>
<td>8.0</td>
<td>0.909091</td>
</tr>
<tr>
<th>46</th>
<td>PA counselling</td>
<td>No intervention</td>
<td>1.157407</td>
<td>2.0</td>
<td>1.353180</td>
</tr>
<tr>
<th>64</th>
<td>Smoking cessation—disadvantaged groups: client...</td>
<td>No intervention</td>
<td>0.639386</td>
<td>3.0</td>
<td>0.166889</td>
</tr>
<tr>
<th>1</th>
<td>BA [5 min plus nicotine replacement therapy (N...</td>
<td>Background quit rate</td>
<td>0.473934</td>
<td>8.0</td>
<td>0.315557</td>
</tr>
<tr>
<th>70</th>
<td>Smoking cessation—disadvantaged groups: NHS SSS</td>
<td>No intervention</td>
<td>0.372301</td>
<td>2.0</td>
<td>3.311258</td>
</tr>
<tr>
<th>71</th>
<td>Smoking cessation—disadvantaged groups: pharma...</td>
<td>No intervention</td>
<td>0.317360</td>
<td>2.0</td>
<td>0.235738</td>
</tr>
<tr>
<th>90</th>
<td>Screening and BA during GP consultation</td>
<td>No intervention</td>
<td>0.303030</td>
<td>3.0</td>
<td>0.151515</td>
</tr>
<tr>
<th>19</th>
<td>Life-skills training</td>
<td>Normal education</td>
<td>0.286369</td>
<td>3.0</td>
<td>0.180180</td>
</tr>
<tr>
<th>74</th>
<td>Statins—disadvantaged groups: invitation for s...</td>
<td>Usual care or no intervention</td>
<td>0.230097</td>
<td>2.0</td>
<td>1.445087</td>
</tr>
<tr>
<th>72</th>
<td>Statins—general population: pharmacist based</td>
<td>Usual care or no intervention</td>
<td>0.204415</td>
<td>4.0</td>
<td>0.151837</td>
</tr>
<tr>
<th>86</th>
<td>Individual stress management</td>
<td>No intervention</td>
<td>0.200080</td>
<td>3.0</td>
<td>0.086498</td>
</tr>
<tr>
<th>87</th>
<td>Curricular</td>
<td>No intervention or standard education</td>
<td>0.138889</td>
<td>4.0</td>
<td>0.093721</td>
</tr>
<tr>
<th>30</th>
<td>Urban trail</td>
<td>No intervention</td>
<td>0.095740</td>
<td>4.0</td>
<td>0.044425</td>
</tr>
<tr>
<th>13</th>
<td>Brief counselling</td>
<td>Didactic messages</td>
<td>0.082008</td>
<td>2.0</td>
<td>4.385965</td>
</tr>
<tr>
<th>11</th>
<td>Accelerated partner therapy—doxycycline</td>
<td>Patient referral</td>
<td>0.071301</td>
<td>2.0</td>
<td>0.106952</td>
</tr>
<tr>
<th>15</th>
<td>Information motivation and behaviour skills</td>
<td>Didactic information</td>
<td>0.070706</td>
<td>2.0</td>
<td>0.129634</td>
</tr>
<tr>
<th>12</th>
<td>Accelerated partner therapy—azithromycin</td>
<td>Patient referral</td>
<td>0.051480</td>
<td>2.0</td>
<td>0.077220</td>
</tr>
<tr>
<th>76</th>
<td>Advice about PA</td>
<td>Usual care</td>
<td>0.027855</td>
<td>2.0</td>
<td>0.051546</td>
</tr>
<tr>
<th>16</th>
<td>Enhanced counselling</td>
<td>Didactic messages</td>
<td>0.021927</td>
<td>2.0</td>
<td>0.083243</td>
</tr>
</tbody>
</table>
<figure style="text-align:center">
<a href="/images/returns/output_35_0.png"><img src="/images/returns/output_35_0.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/returns/output_36_0.png"><img src="/images/returns/output_36_0.png" /></a>
</figure>
<h3 id="global-health-cost-effectiveness-analysis-registry">Global Health Cost Effectiveness Analysis Registry</h3>
<p>Finally, I look at a massive dataset from the <a href="http://healtheconomics.tuftsmedicalcenter.org/ghcearegistry/">Global Health Cost Effectiveness Analysis Registry</a> (GHCEA). First, I look at the full distribution of cost effectiveness estimates. It’s pretty clear they’re lognormally distributed:</p>
<figure style="text-align:center">
<a href="/images/returns/fullhist.png"><img src="/images/returns/fullhist.png" /></a>
</figure>
<p>Next, I look at just the estimates that have confidence intervals, which ends up being about 900 out of the 5000 original estimates. The confidence interval column was too erratic to parse automatically, so I went through by hand and extracted the upper and lower bounds. I also filtered out studies with an overall quality rating below 4.5 (on a 1-7 scale). The end result is 653 estimates that have confidence intervals. Here’s a histogram of the filtered results – note that there’s a hole in the distribution where some of the estimates were filtered out:</p>
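<p>The filtering step itself is straightforward. A rough sketch, with an assumed file name and assumed column names (the registry export doesn’t necessarily use these):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import pandas as pd

df = pd.read_csv("ghcea_registry.csv")   # hypothetical export of the registry

has_ci = df["ci_lower"].notna() & df["ci_upper"].notna()
good_quality = df["quality_rating"] >= 4.5

filtered = df[has_ci & good_quality].copy()
filtered["spread"] = filtered["ci_upper"] - filtered["ci_lower"]
print(len(filtered))</code></pre></figure>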
<figure style="text-align:center">
<a href="/images/returns/filterhist.png"><img src="/images/returns/filterhist.png" /></a>
</figure>
<p>Finally, here’s a scatterplot. It’s not the prettiest relationship in the world, but it does seem to have an upward slope:</p>
<figure style="text-align:center">
<a href="/images/returns/filterscatter.png"><img src="/images/returns/filterscatter.png" /></a>
</figure>
<h2 id="evidence-from-philanthropy">Evidence from Philanthropy</h2>
<p>GiveWell is an organization that does in-depth charity evaluations, often using cost effectiveness estimates in their decision process. They’ve recently changed their approach to <a href="https://www.givewell.org/how-we-work/our-criteria/cost-effectiveness/cost-effectiveness-models">explicitly accommodate</a> different philosophical positions, but the <a href="https://docs.google.com/spreadsheets/d/1KiWfiAGX_QZhRbC9xkzf3I8IqsXC5kkr-nwY_feVlcM/edit#gid=2064365103">older models</a> had their staff estimate different parameters for direct input.</p>
<p>Dan Wahl had the <a href="https://danwahl.github.io/stochastic-altruism">good idea</a> to run a monte carlo simulation by sampling from these staff parameters, which results in a set of estimates you can use to calculate the standard deviation and downside risk for an intervention. I downloaded <a href="https://github.com/danwahl/stochastic-altruism">his code</a> and put the combined outputs into <code class="language-plaintext highlighter-rouge">gw_data.csv</code> (see scrape.py), which I include below.</p>
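<p>Conceptually, the simulation samples from the staff inputs and propagates them through the cost-effectiveness formula. Here’s a toy sketch of that idea (not Dan’s actual code or GiveWell’s actual model, and the parameters are made up):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical staff estimates for two model parameters
staff_cost_per_person = np.array([5.0, 7.5, 6.0, 9.0])
staff_effect_size = np.array([0.10, 0.25, 0.15, 0.20])

# Sample a staff value for each parameter independently on every run
cost = rng.choice(staff_cost_per_person, size=n)
effect = rng.choice(staff_effect_size, size=n)
cost_effectiveness = 1000 * effect / cost     # impact per $1000, in arbitrary units

mean = cost_effectiveness.mean()
std = cost_effectiveness.std()
downside_risk = np.sqrt((np.minimum(0.0, cost_effectiveness - mean)**2).mean())
print(mean, std, downside_risk)</code></pre></figure>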
<div>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>std</th>
<th>downside_risk</th>
</tr>
</thead>
<tbody>
<tr>
<th>iodine</th>
<td>41.418910</td>
<td>41.364215</td>
<td>1.438643</td>
</tr>
<tr>
<th>dtw</th>
<td>12.039437</td>
<td>11.307934</td>
<td>3.733827</td>
</tr>
<tr>
<th>sci</th>
<td>9.064981</td>
<td>8.657475</td>
<td>4.589170</td>
</tr>
<tr>
<th>ss</th>
<td>4.894470</td>
<td>4.598517</td>
<td>6.279625</td>
</tr>
<tr>
<th>lead</th>
<td>4.519663</td>
<td>14.533890</td>
<td>9.365194</td>
</tr>
<tr>
<th>bednets</th>
<td>3.572679</td>
<td>2.964297</td>
<td>6.958995</td>
</tr>
<tr>
<th>smc</th>
<td>3.222606</td>
<td>2.162316</td>
<td>7.082046</td>
</tr>
<tr>
<th>cash</th>
<td>1.060328</td>
<td>0.380961</td>
<td>8.921943</td>
</tr>
</tbody>
</table>
</div>
<p>The cost effectiveness rankings here follow a similar pattern to the other datasets, although it’s a little less pronounced:</p>
<figure style="text-align:center">
<a href="/images/returns/output_17_0.png"><img src="/images/returns/output_17_0.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/returns/output_18_0.png"><img src="/images/returns/output_18_0.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/returns/output_19_0.png"><img src="/images/returns/output_19_0.png" /></a>
</figure>
<p>So if you prefer to use the standard deviation as a measure, there do seem to be returns to risk taking – higher impact estimates tend to be noisier. But if the downside risk metric makes more sense to you, the highest impact estimates underperform the mean less often, so there’s not a return to this measure of risk.</p>
<h2 id="evidence-from-scientific-research">Evidence from Scientific Research</h2>
<p>I have two sources of data on the impact of scientific research. The first is from the Future of Humanity Institute’s (FHI) <a href="http://www.fhi.ox.ac.uk/research-into-neglected-diseases/">research</a> looking at the long term impact of neglected tropical disease research. The second is data I collected from Google Scholar on the variation in citation counts vs. mean citation counts for individual researchers.</p>
<p>I also found a few related papers in the existing “Science of Science” literature, and summarize those at the end.</p>
<h3 id="fhi-estimates">FHI Estimates</h3>
<p>These numbers differ from the GiveWell numbers above because they are estimates of the value of scientific research, and aren’t derived from randomized controlled trials of existing treatments. This means we should be much more <a href="https://en.wikipedia.org/wiki/Uncertainty_quantification#Sources_of_uncertainty">uncertain about this model</a> and the inputs.</p>
<div>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>group</th>
<th>disease</th>
<th>mu</th>
<th>sigma</th>
<th>median</th>
<th>mean</th>
<th>stdev</th>
<th>downside_risk</th>
</tr>
</thead>
<tbody>
<tr>
<th>3</th>
<td>Diarrhoeal disease</td>
<td>Diarrhoeal diseases</td>
<td>-1.466692</td>
<td>4.391203</td>
<td>0.230687</td>
<td>3549.783221</td>
<td>89983.188002</td>
<td>633.718520</td>
</tr>
<tr>
<th>14</th>
<td>Meningitis</td>
<td>Meningitis</td>
<td>-2.503745</td>
<td>4.463425</td>
<td>0.081778</td>
<td>1732.527683</td>
<td>42673.055643</td>
<td>642.757711</td>
</tr>
<tr>
<th>11</th>
<td>Parasitic and vector diseases</td>
<td>Leishmaniasis</td>
<td>-3.706662</td>
<td>4.721702</td>
<td>0.024559</td>
<td>1703.723172</td>
<td>19374.995032</td>
<td>645.685782</td>
</tr>
<tr>
<th>17</th>
<td>Leprosy</td>
<td>Leprosy</td>
<td>-5.014960</td>
<td>4.843504</td>
<td>0.006638</td>
<td>824.521890</td>
<td>10050.747664</td>
<td>651.870754</td>
</tr>
<tr>
<th>13</th>
<td>Parasitic and vector diseases</td>
<td>Trypanosomiasis</td>
<td>-5.296044</td>
<td>4.895739</td>
<td>0.005011</td>
<td>802.785665</td>
<td>9413.161293</td>
<td>652.616249</td>
</tr>
<tr>
<th>1</th>
<td>Malaria</td>
<td>Malaria</td>
<td>-3.161076</td>
<td>4.437962</td>
<td>0.042380</td>
<td>801.655755</td>
<td>2457.881370</td>
<td>646.632072</td>
</tr>
<tr>
<th>16</th>
<td>Meningitis</td>
<td>Multiple salmonella infections</td>
<td>-1.895189</td>
<td>4.127971</td>
<td>0.150290</td>
<td>753.616047</td>
<td>8048.560866</td>
<td>640.784014</td>
</tr>
<tr>
<th>15</th>
<td>Meningitis</td>
<td>Typhoid and paratyphoid fever</td>
<td>-2.798470</td>
<td>4.327229</td>
<td>0.060903</td>
<td>709.092672</td>
<td>76791.444201</td>
<td>645.379213</td>
</tr>
<tr>
<th>12</th>
<td>Parasitic and vector diseases</td>
<td>Chagas disease</td>
<td>-4.955053</td>
<td>4.740730</td>
<td>0.007048</td>
<td>534.967344</td>
<td>14100.186942</td>
<td>652.133687</td>
</tr>
<tr>
<th>0</th>
<td>HIV</td>
<td>HIV</td>
<td>-3.783888</td>
<td>4.358867</td>
<td>0.022734</td>
<td>303.678832</td>
<td>2298.067782</td>
<td>650.700399</td>
</tr>
<tr>
<th>6</th>
<td>Helminths</td>
<td>Trichuriasis</td>
<td>-1.937768</td>
<td>3.863822</td>
<td>0.144025</td>
<td>251.336051</td>
<td>1374.641439</td>
<td>644.123568</td>
</tr>
<tr>
<th>5</th>
<td>Helminths</td>
<td>Ascariasis</td>
<td>-1.894481</td>
<td>3.776170</td>
<td>0.150396</td>
<td>187.775769</td>
<td>5743.164462</td>
<td>645.026396</td>
</tr>
<tr>
<th>4</th>
<td>Helminths</td>
<td>Hookworm</td>
<td>-2.221467</td>
<td>3.745220</td>
<td>0.108450</td>
<td>120.526560</td>
<td>2534.843173</td>
<td>648.701035</td>
</tr>
<tr>
<th>2</th>
<td>TB</td>
<td>TB</td>
<td>-3.587428</td>
<td>4.086304</td>
<td>0.027669</td>
<td>116.922570</td>
<td>1920.355439</td>
<td>651.878971</td>
</tr>
<tr>
<th>7</th>
<td>Parasitic and vector diseases</td>
<td>Lymphatic filariasis</td>
<td>-2.843354</td>
<td>3.724580</td>
<td>0.058230</td>
<td>59.913072</td>
<td>1582.603743</td>
<td>651.768843</td>
</tr>
<tr>
<th>8</th>
<td>Parasitic and vector diseases</td>
<td>Schistosomiasis</td>
<td>-3.354314</td>
<td>3.742827</td>
<td>0.034933</td>
<td>38.477140</td>
<td>414.677808</td>
<td>653.955657</td>
</tr>
<tr>
<th>9</th>
<td>Parasitic and vector diseases</td>
<td>Onchocerciasis</td>
<td>-4.002002</td>
<td>3.718195</td>
<td>0.018279</td>
<td>18.365718</td>
<td>429.236723</td>
<td>655.572266</td>
</tr>
<tr>
<th>18</th>
<td>Trachoma</td>
<td>Trachoma</td>
<td>-3.984676</td>
<td>3.693719</td>
<td>0.018598</td>
<td>17.066256</td>
<td>340.242497</td>
<td>655.681808</td>
</tr>
<tr>
<th>10</th>
<td>Parasitic and vector diseases</td>
<td>Dengue</td>
<td>-5.767322</td>
<td>3.092770</td>
<td>0.003128</td>
<td>0.373548</td>
<td>14.685043</td>
<td>659.062757</td>
</tr>
</tbody>
</table>
</div>
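<p>The <code class="language-plaintext highlighter-rouge">mu</code> and <code class="language-plaintext highlighter-rouge">sigma</code> columns look like the parameters of a lognormal distribution, and the median and mean columns are consistent with that reading. Checking the first row:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

mu, sigma = -1.466692, 4.391203     # diarrhoeal disease row above
print(np.exp(mu))                   # median ~0.23
print(np.exp(mu + sigma**2 / 2))    # mean ~3550; the heavy right tail pulls it far above the median</code></pre></figure>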
<p>Again, these numbers follow the patterns of earlier estimates with some research topics substantially outperforming others:</p>
<figure style="text-align:center">
<a href="/images/returns/output_25_0.png"><img src="/images/returns/output_25_0.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/returns/output_26_0.png"><img src="/images/returns/output_26_0.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/returns/output_27_0.png"><img src="/images/returns/output_27_0.png" /></a>
</figure>
<p>So while there is a strong positive relationship between uncertainty and impact, there is a weaker negative relationship between downside risk and impact.</p>
<h3 id="research-citation-counts">Research Citation Counts</h3>
<p>Next, I thought it would be interesting to see if these patterns appear in researcher citation counts. I found a list of ecology researchers along with links to their Google Scholar profiles on <a href="https://github.com/weecology/bibliometrics/blob/master/Google_ecology.csv">GitHub</a>. I treated this list as a population of researchers (I’m not sure if it really is), then randomly selected 100 non-students and downloaded their list of publications and citation counts. I then calculated the mean citation count, standard deviation, and downside risk for each researcher.</p>
<p>The assumption here is that citation count is proportional to real-world impact. Another thing to mention is that these scientists have different funding levels, so we don’t know the true funding-to-citation conversion rate.</p>
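<p>Here’s a rough sketch of the per-researcher aggregation (toy data, and I’m guessing that the downside risk reference point is the mean across researchers):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np
import pandas as pd

# Toy stand-in for the scraped Google Scholar data: one row per publication
pubs = pd.DataFrame({
    "id":        [30, 30, 30, 145, 145],
    "citations": [250, 12, 40, 2, 5],
})

t = pubs.groupby("id")["citations"].mean().mean()  # reference point for underperformance

def downside_risk(s):
    return np.sqrt((np.minimum(0.0, s - t)**2).sum() / s.size)

summary = pubs.groupby("id")["citations"].agg(["mean", "std", downside_risk])
print(summary)</code></pre></figure>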
<div>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>std</th>
<th>downside_risk</th>
</tr>
<tr>
<th>id</th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>30</th>
<td>78.092593</td>
<td>107.506958</td>
<td>19.496066</td>
</tr>
<tr>
<th>89</th>
<td>19.361702</td>
<td>17.013327</td>
<td>23.927285</td>
</tr>
<tr>
<th>129</th>
<td>17.962963</td>
<td>20.803996</td>
<td>25.888243</td>
</tr>
<tr>
<th>143</th>
<td>12.964286</td>
<td>19.114629</td>
<td>29.574690</td>
</tr>
<tr>
<th>145</th>
<td>3.500000</td>
<td>4.485018</td>
<td>34.183520</td>
</tr>
</tbody>
</table>
</div>
<figure style="text-align:center">
<a href="/images/returns/output_32_0.png"><img src="/images/returns/output_32_0.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/returns/output_33_0.png"><img src="/images/returns/output_33_0.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/returns/output_34_0.png"><img src="/images/returns/output_34_0.png" /></a>
</figure>
<p>Using the standard deviation in a situation like this doesn’t make a lot of sense. By default, a very successful researcher might have a higher standard deviation in their citations as they progress through their career [10]. I think the downside risk metric is more useful here, and it shows that highly cited researchers outperform the mean researcher more often.</p>
<h3 id="uncertainty-in-peer-review">Uncertainty in Peer Review</h3>
<p>An alternate way to look at this question would be to try to relate peer reviewer uncertainty to eventual citation counts. At least theoretically, it could be rational to fund a study with a lower mean reviewer score if there is sufficient uncertainty [11]. While some research has found a positive relationship between mean reviewer score and eventual citation counts [12], and others have studied the variation in reviewer scores [13], nobody has related the variation in reviewer scores to eventual citation counts. I contacted the NIH and they don’t keep a record of individual reviewer scores for privacy reasons, so this type of study doesn’t seem possible currently.</p>
<h3 id="surprise-as-risk">Surprise as Risk</h3>
<p>Another fascinating study takes a different approach by measuring if the subject matter of the paper is risky/<a href="https://en.wikipedia.org/wiki/Self-information">surprising</a> [14]. They do this by comparing the chemicals discussed in the paper with an existing network of chemical knowledge. Studies that propose a new type of connection or a jump to new knowledge are judged to be more risky (Figure 1), and are eventually associated with higher citation counts and more scientific awards (Figure 3).</p>
<figure style="text-align:center">
<a href="/images/returns/network.png"><img src="/images/returns/network.png" /></a>
</figure>
<figure style="text-align:center">
<a href="/images/returns/scatter.png"><img src="/images/returns/scatter.png" /></a>
</figure>
<p>The effect size in this paper isn’t huge – a research strategy that is an order of magnitude less probable receives 2.26 more citations on average. But I think this paper gets closer to measuring the concept of scientific risk than anything else. They also conclude that a scientist trying to maximize citation count would probably focus on repeat projects, so specific policies to encourage higher risk science might be needed.</p>
<h2 id="conclusion">Conclusion</h2>
<p>It’s interesting to see some common patterns emerge across these different domains and datasets.</p>
<ul>
<li>First, the impact distributions make it clear that some interventions are much better than others. As a result, it makes sense to spend a lot of time searching for good opportunities.</li>
<li>Second, interventions with a high downside risk tend to have lower expected impacts. Even though high impact interventions are more uncertain, they dip below the mean less often or to a lesser extent. This is actually a good thing because it means there isn’t a huge downside to pursuing the high impact opportunities, at least in this dataset.</li>
<li>Third, there do seem to be returns to risk (uncertainty), so a large error bound on a cost effectiveness estimate shouldn’t be disqualifying on its own.</li>
</ul>
<p>Whether or not there are returns to risk, then, depends on your definition of risk. Using the definitions from the introduction, there do seem to be returns to risk (uncertainty). In other words, uncertainty is something you might have to learn to live with if you want to have a big effect on the world.</p>
<h2 id="references">References</h2>
<p>[1] <em>What Are Foundations For?</em> Boston Review. <a href="http://bostonreview.net/forum/foundations-philanthropy-democracy">http://bostonreview.net/forum/foundations-philanthropy-democracy</a></p>
<p>[2] <em>Hits-based Giving.</em> Open Philanthropy Project. <a href="https://www.openphilanthropy.org/blog/hits-based-giving">https://www.openphilanthropy.org/blog/hits-based-giving</a></p>
<p>[3] <em>Broad market efficiency.</em> GiveWell. <a href="https://blog.givewell.org/2013/05/02/broad-market-efficiency/">https://blog.givewell.org/2013/05/02/broad-market-efficiency/</a></p>
<p>[4] <em>The Confusion of Risk vs. Uncertainty.</em> The Guesstimate Blog. <a href="https://medium.com/guesstimate-blog/the-confusion-of-risk-vs-uncertainty-1c6cd512aa69">https://medium.com/guesstimate-blog/the-confusion-of-risk-vs-uncertainty-1c6cd512aa69</a></p>
<p>[5] <em>Benefit-Cost Results.</em> Washington State Institute for Public Policy. <a href="http://www.wsipp.wa.gov/BenefitCost">http://www.wsipp.wa.gov/BenefitCost</a></p>
<p>[6] <em>Disease Control Priorities in Developing Countries (DCP2).</em> <a href="http://www.dcp-3.org/dcp2">http://www.dcp-3.org/dcp2</a></p>
<p>[7] <em>GiveWell’s Cost-Effectiveness Analyses.</em> GiveWell. <a href="https://www.givewell.org/how-we-work/our-criteria/cost-effectiveness/cost-effectiveness-models">https://www.givewell.org/how-we-work/our-criteria/cost-effectiveness/cost-effectiveness-models</a></p>
<p>[8] <em>Stochastic Altruism.</em> <a href="https://danwahl.github.io/stochastic-altruism">https://danwahl.github.io/stochastic-altruism</a></p>
<p>[9] <em>Uncertainty Quantification.</em> Wikipedia. <a href="https://en.wikipedia.org/wiki/Uncertainty_quantification#Sources_of_uncertainty">https://en.wikipedia.org/wiki/Uncertainty_quantification#Sources_of_uncertainty</a></p>
<p>[10] <em>Quantifying the evolution of individual scientific impact.</em> <a href="http://science.sciencemag.org/content/354/6312/aaf5239">http://science.sciencemag.org/content/354/6312/aaf5239</a></p>
<p>[11] <em>Improving the Peer review process: Capturing more information and enabling high-risk/high-return research.</em> <a href="https://www.sciencedirect.com/science/article/pii/S0048733316301111">https://www.sciencedirect.com/science/article/pii/S0048733316301111</a></p>
<p>[12] <em>Big names or big ideas: Do peer-review panels select the best science proposals?</em> <a href="http://science.sciencemag.org/content/348/6233/434">http://science.sciencemag.org/content/348/6233/434</a></p>
<p>[13] <em>Peer Review Evaluation Process of Marie Curie Actions under EU’s Seventh Framework Programme for Research.</em> <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4488366/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4488366/</a></p>
<p>[14] <em>Tradition and Innovation in Scientists’ Research Strategies.</em> <a href="http://journals.sagepub.com/doi/abs/10.1177/0003122415601618">http://journals.sagepub.com/doi/abs/10.1177/0003122415601618</a></p>
<p>[15] <em>The cost-effectiveness of public health interventions.</em> Journal of Public Health. <a href="https://academic.oup.com/jpubhealth/article/34/1/37/1554654">https://academic.oup.com/jpubhealth/article/34/1/37/1554654</a></p>
<p>[16] <em>The Global Health Cost Effectiveness Analysis Registry</em> Tufts University. <a href="http://healtheconomics.tuftsmedicalcenter.org/ghcearegistry/">http://healtheconomics.tuftsmedicalcenter.org/ghcearegistry/</a></p>
<p><a href="https://pstblog.com/2017/12/02/risk-return">Are there returns to risk taking in science, philanthropy, or public policy?</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on December 02, 2017.</p>
https://pstblog.com/2017/10/07/jupyter-version-control2017-10-07T00:00:00+00:002017-10-07T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>I’m a big fan of using <a href="http://jupyter.org/">Jupyter Notebooks</a> for Python projects, but one downside is that version control is a pain. Commits become very large and illegible if you opt to track the entire notebook with the output cells, especially when graphics are included.</p>
<p>There are a number of <a href="https://stackoverflow.com/questions/18734739/using-ipython-notebooks-under-version-control">proposed solutions</a>, but they require either changing your git configuration or generating a Jupyter <a href="https://stackoverflow.com/a/25765194">configuration file and modifying it</a>. These approaches could vary by user, and it’s not clear from the code or repo how the changes are being tracked. Below I outline an approach that might be better because the solution is included directly in a notebook cell.</p>
<h2 id="one-approach">One Approach</h2>
<p>One possibility is to use a tool called <a href="https://github.com/kynan/nbstripout">nbstripout</a> (installable via pip) along with a bash command in a notebook cell to solve this problem. Just add a cell like this one to your notebook, set the filenames, save the notebook, then run the cell to create a cleaned version of your notebook without the output.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># Save your notebook and run this cell to create
# a cleaned file for version control:
</span><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span>
<span class="n">notebook_path</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">getcwd</span><span class="p">(),</span><span class="s">'notebook_working.ipynb'</span><span class="p">)</span>
<span class="n">cleaned_path</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">getcwd</span><span class="p">(),</span><span class="s">'notebook_clean.ipynb'</span><span class="p">)</span>
<span class="c1">#Bash command below:
</span><span class="err">!</span><span class="n">cat</span> <span class="p">{</span><span class="n">notebook_path</span><span class="p">}</span> <span class="o">|</span> <span class="n">nbstripout</span> <span class="o">></span> <span class="p">{</span><span class="n">cleaned_path</span><span class="p">}</span>
<span class="n">date_str</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">().</span><span class="n">strftime</span><span class="p">(</span><span class="s">'%Y-%m-%d %H:%M:%S'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'{0} Cleaned file created at: {1}'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">date_str</span><span class="p">,</span> <span class="n">cleaned_path</span><span class="p">))</span></code></pre></figure>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2017-06-23 12:57:40 Cleaned file created at: <current directory>/notebook_clean.ipynb
</code></pre></div></div>
<p>Then use git to track the cleaned version and only commit to your working version when you need to push viewable results. This way people can easily see what changes have been made to the clean version and merge changes into the working version more confidently.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>git add notebook_clean.ipynb
<span class="nv">$ </span>git commit <span class="nt">-m</span> <span class="s2">"First change, no output tracked"</span>
<span class="nv">$ </span>git add notebook_clean.ipynb
<span class="nv">$ </span>git commit <span class="nt">-m</span> <span class="s2">"Second change, no output tracked"</span>
<span class="nv">$ </span>git add notebook_clean.ipynb notebook_working.ipynb
<span class="nv">$ </span>git commit <span class="nt">-m</span> <span class="s2">"Third change, with output tracked in working file"</span></code></pre></figure>
<p>You could even use .gitignore to completely untrack the working version if you don’t need to display the output anywhere:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># .gitignore
notebook_working.ipynb
</code></pre></div></div>
<h2 id="an-alternative">An Alternative</h2>
<p>Another option is to use Jupyter’s built-in <code class="language-plaintext highlighter-rouge">nbconvert</code> to output your notebook as a Python file, then track changes to that file to make it easier to read commits. This approach is nice because it doesn’t require an external dependency, but the downside is you still need to track the notebook and its incremental outputs.</p>
<p>Here’s an example of how this could work:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1">#Save, then run
</span><span class="kn">import</span> <span class="nn">os</span>
<span class="n">notebook_path</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">getcwd</span><span class="p">(),</span><span class="s">'notebook1.ipynb'</span><span class="p">)</span>
<span class="c1">#Bash:
</span><span class="err">!</span><span class="n">jupyter</span> <span class="n">nbconvert</span> <span class="o">--</span><span class="n">to</span> <span class="n">script</span> <span class="p">{</span><span class="n">notebook_path</span><span class="p">}</span></code></pre></figure>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[NbConvertApp] Converting notebook <current directory>/notebook1.ipynb to script
[NbConvertApp] Writing 588 bytes to <current directory>/nbconvert_test/notebook1.py
</code></pre></div></div>
<p>Then just track the resulting Python file each time you commit the notebook. If you have multiple files that you want to convert at once for version control, you could include a cell like this:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1">#Save, then run
</span><span class="kn">import</span> <span class="nn">os</span>
<span class="n">path1</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">getcwd</span><span class="p">(),</span><span class="s">'notebook1.ipynb'</span><span class="p">)</span>
<span class="n">path2</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">getcwd</span><span class="p">(),</span><span class="s">'notebook2.ipynb'</span><span class="p">)</span>
<span class="n">path_str</span> <span class="o">=</span> <span class="s">' '</span><span class="p">.</span><span class="n">join</span><span class="p">([</span><span class="n">path1</span><span class="p">,</span> <span class="n">path2</span><span class="p">])</span>
<span class="c1">#Bash:
</span><span class="err">!</span><span class="n">jupyter</span> <span class="n">nbconvert</span> <span class="o">--</span><span class="n">to</span> <span class="n">script</span> <span class="p">{</span><span class="n">path_str</span><span class="p">}</span></code></pre></figure>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[NbConvertApp] Converting notebook <current directory>/notebook1.ipynb to script
[NbConvertApp] Writing 588 bytes to <current directory>/nbconvert_test/notebook1.py
[NbConvertApp] Converting notebook <current directory>/notebook2.ipynb to script
[NbConvertApp] Writing 588 bytes to <current directory>/nbconvert_test/notebook2.py
</code></pre></div></div>
<p>Both of these approaches make it a little clearer how the files are being tracked, and neither resorts to user-specific configuration files that seem prone to error.</p>
<p><a href="https://pstblog.com/2017/10/07/jupyter-version-control">Version Control with Jupyter Notebooks</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on October 07, 2017.</p>
https://pstblog.com/2017/10/07/correlated-randoms2017-10-07T00:00:00+00:002017-10-07T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>I recently worked on a <a href="https://pstblog.com/2017/07/28/mental-model">visualization</a> where I needed to draw numbers from a <a href="https://en.wikipedia.org/wiki/Joint_probability_distribution">joint probability distribution</a>. This ended up being much harder than I thought it would be, so I thought I’d write up the results to save someone else some time. There are a number of resources out there for doing this in languages like Python and R, but nothing for JavaScript. I ended up using <a href="https://github.com/jstat/jstat">jStat</a> for many of its distributions and helper functions, along with this <a href="https://stackoverflow.com/questions/32718752/how-to-generate-correlated-uniform0-1-variables">Stackoverflow answer</a>.</p>
<figure style="text-align:center">
<a href="/images/Multivariate_normal_sample.png">
<img width="500px" src="/images/Multivariate_normal_sample.png" />
</a>
<figcaption>An example of a joint normal distribution, <a href="https://upload.wikimedia.org/wikipedia/commons/9/95/Multivariate_normal_sample.svg">source</a>.</figcaption>
</figure>
<p>Here are the steps:</p>
<ol>
<li>Create uncorrelated samples drawing from a standard normal distribution (mu=0, sigma=1).</li>
<li>Create a <a href="https://en.wikipedia.org/wiki/Correlation_and_dependence#Correlation_matrices">correlation matrix</a> with the desired correlation between the samples.</li>
<li>Matrix multiply the <a href="https://en.wikipedia.org/wiki/Cholesky_decomposition">Cholesky decomposition</a> of the correlation matrix with the uncorrelated samples to create correlated normal samples.</li>
<li>Convert the correlated normal samples to correlated uniform samples using the standard normal cumulative distribution function (CDF). I think the result of this is considered a <a href="https://en.wikipedia.org/wiki/Copula_(probability_theory)">copula</a>.</li>
<li>Use the inverted CDFs of the desired distributions to convert the correlated uniform samples into correlated samples from those distributions. This is called <a href="https://en.wikipedia.org/wiki/Inverse_transform_sampling">inverse transform</a> sampling.</li>
</ol>
<p>Note that the sample correlation won’t be exactly what you specified in the correlation matrix, but it was close enough for my purposes. If you need a specific correlation, you could always write a while loop that repeats this process until it’s within a certain threshold of your desired correlation (see the sketch after the code block below). Here’s the code for a joint lognormal distribution:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="c1">//script src="https://cdn.jsdelivr.net/npm/jstat@latest/dist/jstat.min.js" /script</span>
<span class="kd">function</span> <span class="nx">generateCopula</span><span class="p">(</span><span class="nx">rows</span><span class="p">,</span> <span class="nx">columns</span><span class="p">,</span> <span class="nx">correlation</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">//https://en.wikipedia.org/wiki/Copula_(probability_theory)</span>
<span class="c1">//Create uncorrelated standard normal samples</span>
<span class="kd">var</span> <span class="nx">normSamples</span> <span class="o">=</span> <span class="nx">jStat</span><span class="p">.</span><span class="nx">randn</span><span class="p">(</span><span class="nx">rows</span><span class="p">,</span> <span class="nx">columns</span><span class="p">);</span>
<span class="c1">//Create lower triangular cholesky decomposition of correlation matrix</span>
<span class="kd">var</span> <span class="nx">A</span> <span class="o">=</span> <span class="nx">jStat</span><span class="p">(</span><span class="nx">jStat</span><span class="p">.</span><span class="nx">cholesky</span><span class="p">(</span><span class="nx">correlation</span><span class="p">));</span>
<span class="c1">//Create correlated samples through matrix multiplication</span>
<span class="kd">var</span> <span class="nx">normCorrSamples</span> <span class="o">=</span> <span class="nx">A</span><span class="p">.</span><span class="nx">multiply</span><span class="p">(</span><span class="nx">normSamples</span><span class="p">);</span>
<span class="c1">//Convert to uniform correlated samples over 0,1 using normal CDF</span>
<span class="kd">var</span> <span class="nx">normDist</span> <span class="o">=</span> <span class="nx">jStat</span><span class="p">.</span><span class="nx">normal</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">uniformCorrSamples</span> <span class="o">=</span> <span class="nx">normCorrSamples</span><span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span> <span class="p">{</span><span class="k">return</span> <span class="nx">normDist</span><span class="p">.</span><span class="nx">cdf</span><span class="p">(</span><span class="nx">x</span><span class="p">);});</span>
<span class="k">return</span> <span class="nx">uniformCorrSamples</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">generateCorrLognorm</span><span class="p">(</span><span class="nx">number</span><span class="p">,</span> <span class="nx">mu</span><span class="p">,</span> <span class="nx">sigma</span><span class="p">,</span> <span class="nx">correlation</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">//Create uniform correlated copula</span>
<span class="kd">var</span> <span class="nx">copula</span> <span class="o">=</span> <span class="nx">generateCopula</span><span class="p">(</span><span class="nx">mu</span><span class="p">.</span><span class="nx">length</span><span class="p">,</span> <span class="nx">number</span><span class="p">,</span> <span class="nx">correlation</span><span class="p">);</span>
<span class="c1">//Create unique lognormal distribution for each marginal</span>
<span class="kd">var</span> <span class="nx">lognormDists</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">mu</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">lognormDists</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">jStat</span><span class="p">.</span><span class="nx">lognormal</span><span class="p">(</span><span class="nx">mu</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="nx">sigma</span><span class="p">[</span><span class="nx">i</span><span class="p">]));</span>
<span class="p">}</span>
<span class="c1">//Generate correlated lognormal samples using the inverse transform method:</span>
<span class="c1">//https://en.wikipedia.org/wiki/Inverse_transform_sampling</span>
<span class="kd">var</span> <span class="nx">lognormCorrSamples</span> <span class="o">=</span> <span class="nx">copula</span><span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">x</span><span class="p">,</span> <span class="nx">row</span><span class="p">,</span> <span class="nx">col</span><span class="p">)</span> <span class="p">{</span><span class="k">return</span> <span class="nx">lognormDists</span><span class="p">[</span><span class="nx">row</span><span class="p">].</span><span class="nx">inv</span><span class="p">(</span><span class="nx">x</span><span class="p">);});</span>
<span class="k">return</span> <span class="nx">lognormCorrSamples</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">var</span> <span class="nx">mu</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">],</span>
<span class="nx">sigma</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.25</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">],</span>
<span class="nx">correlation</span> <span class="o">=</span> <span class="p">[[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">],[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">]];</span>
<span class="kd">var</span> <span class="nx">data</span> <span class="o">=</span> <span class="nx">generateCorrLognorm</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="nx">mu</span><span class="p">,</span> <span class="nx">sigma</span><span class="p">,</span> <span class="nx">correlation</span><span class="p">);</span></code></pre></figure>
<p>A nice feature of this approach is that you can use any combination of distributions and create any number of correlated samples. All you need is to do is create the desired correlation matrix, define the distributions with jStat, and then use their inverted CDFs to convert the copula.</p>
<p>For example, if I wanted a joint normal-lognormal distribution, I could use the copula function above, then independently define distributions and use their inverted CDFs:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="kd">var</span> <span class="nx">rows</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="c1">//number of distributions</span>
<span class="nx">columns</span> <span class="o">=</span> <span class="mi">100</span><span class="p">;</span> <span class="c1">//number of samples</span>
<span class="nx">correlation</span> <span class="o">=</span> <span class="p">[[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">],[</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">]];</span>
<span class="kd">var</span> <span class="nx">copula</span> <span class="o">=</span> <span class="nx">generateCopula</span><span class="p">(</span><span class="nx">rows</span><span class="p">,</span> <span class="nx">columns</span><span class="p">,</span> <span class="nx">correlation</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">normal</span> <span class="o">=</span> <span class="nx">jStat</span><span class="p">.</span><span class="nx">normal</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="nx">lognormal</span> <span class="o">=</span> <span class="nx">jStat</span><span class="p">.</span><span class="nx">lognormal</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">),</span>
<span class="nx">dists</span> <span class="o">=</span> <span class="p">[</span><span class="nx">normal</span><span class="p">,</span> <span class="nx">lognormal</span><span class="p">];</span>
<span class="kd">var</span> <span class="nx">samples</span> <span class="o">=</span> <span class="nx">copula</span><span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">x</span><span class="p">,</span> <span class="nx">row</span><span class="p">,</span> <span class="nx">col</span><span class="p">)</span> <span class="p">{</span><span class="k">return</span> <span class="nx">dists</span><span class="p">[</span><span class="nx">row</span><span class="p">].</span><span class="nx">inv</span><span class="p">(</span><span class="nx">x</span><span class="p">);});</span></code></pre></figure>
<p>Note that the map function used above is <a href="https://jstat.github.io/all.html#map">unique to jStat</a> because it can operate across multiple arrays. The <code class="language-plaintext highlighter-rouge">samples</code> variable above is now an array with two correlated samples, one with a normal distribution and one with a lognormal one:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="o">></span> <span class="nx">samples</span>
<span class="p">[[</span><span class="o">-</span><span class="mf">0.3111414911440098</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.020000657895910958</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.753150640735201</span><span class="p">,</span> <span class="mf">0.45079960956056986</span><span class="p">,</span> <span class="p">...],</span>
<span class="p">[</span><span class="mf">0.7902422132853483</span><span class="p">,</span> <span class="mf">1.589785300759001</span><span class="p">,</span> <span class="mf">0.6259310681427537</span><span class="p">,</span> <span class="mf">0.8549897874735056</span><span class="p">,</span> <span class="p">...]]</span>
</code></pre></figure>
<p><a href="https://pstblog.com/2017/10/07/correlated-randoms">Generating Correlated Random Numbers in JavaScript</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on October 07, 2017.</p>
https://pstblog.com/2017/07/28/mental-model2017-07-28T00:00:00+00:002017-07-28T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>I think most people have some type of explicit or implicit <a href="https://en.wikipedia.org/wiki/Mental_model">mental model</a> of how the world works. I decided to try to create a visualization of mine, which I included below. Although it’s not meant to be comprehensive or mathematically perfect, I think it captures what I see as the important features and interactions between different sectors of society. The goal is to improve this over time so hopefully this visualization opens my ideas up for critique or <a href="https://github.com/psthomas/mental-model">contributions</a>.</p>
<p>There are three sectors in this model: markets, governments, and research. Each circle is a project that needs funding, which could represent an existing product or service in the market, a program like Medicaid within the government, or the ongoing work of a university lab in the research sector. The radius of each circle is proportional to the room for more funding – when the project has been fully funded the radius and marginal impact of additional funding drop to zero. The x-axis represents the variation in project outcome and the y-axis represents the <a href="https://en.wikipedia.org/wiki/Marginal_value">marginal</a> social impact of the project using units of something like wellbeing per dollar spent.</p>
<p>The user can set the percentage of the budget devoted to each sector and the allocation of each sector’s money between exploitation (funding existing projects), and exploration (searching for new projects). Any money allocated to the explore budget is spent on generating new circles, drawn from lognormal distributions. The respective exploit budgets are spent on the projects when the user clicks the “Next” button, resulting in wellbeing that is added into the Impact and Impact/Year categories. The end goal is to get a high Impact/Year score.</p>
<!--https://stackoverflow.com/questions/5867985-->
<div class="outer">
<div class="inner">
<iframe id="vis" style="width: 98vw; height: 100vh; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div>
<!--<script src="https://d3js.org/d3-request.v1.min.js"></script>-->
<script src="https://d3js.org/d3.v4.js"></script>
<script>
d3.request("https://raw.githubusercontent.com/psthomas/mental-model/master/model/model.html")
.get(function(a) {
document.getElementById("vis").srcdoc = a.response;
});
</script>
<p>Here are a few things I hope this visualization demonstrates:</p>
<h2 id="diminishing-returns">Diminishing Returns</h2>
<p>The marginal impact of each project declines linearly as its room for funding is used up. This is a pretty fundamental <a href="https://en.wikipedia.org/wiki/Diminishing_returns">concept in economics</a> and I think it applies to most situations in the real world. One aspect of the model that avoids diminishing returns is the exploration phase, as the probability of generating a bubble in each location remains constant over time for each sector. This fits well with my intuition about the world – there are diminishing returns to exploiting existing knowledge but not to generating new knowledge.</p>
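<p>As a minimal sketch of what “declines linearly” means here (the names are placeholders, not the model’s actual variables):</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript">//Sketch: marginal impact falls linearly to zero as the funding gap closes
function currentMarginalImpact(project) {
  var remainingFraction = project.roomForFunding / project.initialRoom;
  return project.initialMarginalImpact * remainingFraction;
}</code></pre></figure>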
<h2 id="the-importance-of-economic-growth">The Importance of Economic Growth</h2>
<p>I think the <a href="https://en.wikipedia.org/wiki/Gross_domestic_product#Income_approach">income</a> approach to measuring GDP is most helpful for thinking about economic growth in this model. A certain amount of income is generated in each time step, most of which goes to corporations and individuals. We as a society then decide what to do with this money in the next time step. We can leave it with individuals and corporations, where it largely ends up being invested in markets, or we can tax it and put it into government services or basic research.</p>
<p>All money left in markets grows at three percent each year to match historical real GDP growth. This is a simplification for a few reasons. First, government and research spending are also part of the GDP. Second, government spending can have a <a href="https://en.wikipedia.org/wiki/Fiscal_multiplier">multiplier effect</a> especially during recessions. Third, technological progress is thought to be one of the <a href="http://science.sciencemag.org/content/342/6160/817">main drivers</a> of long-run growth in at least the <a href="https://en.wikipedia.org/wiki/Solow%E2%80%93Swan_model">Solow model</a>. But I needed to demonstrate that there would be a penalty over time if the user taxed their whole economy to put the money into research, so this is my solution.</p>
<p>One of the best approaches to getting a high Impact/Year over the long term is to put 80-90% of the budget in the Market and let it grow. I suspect the percent of money allocated to markets and the long-term growth rate of three percent would be the most important parameters if I ported this over to Python and ran some simulations. This model behavior fits well with real-world data. For example, GDP per capita <a href="https://ourworldindata.org/happiness-and-life-satisfaction/#correlates-determinants-and-consequences">correlates strongly</a> with life satisfaction over time and across countries.</p>
<h2 id="social-impact-is-the-goal">Social Impact is the Goal</h2>
<p>I use social impact as the measure of progress rather than GDP. I think the distinction between the two is important because if we used GDP, the immediate impacts of government or research would drop substantially (even if they have large indirect effects). It’s only through the lens of wellbeing that many of the actions of government or research make sense.</p>
<h2 id="the-roles-of-different-sectors">The Roles of Different Sectors</h2>
<p>The main point here is that markets, governments, and research all draw from different probability distributions. Markets largely work at lower levels of risk, and, although the projects here have lower marginal impacts, the opportunities here outnumber those in other sectors. I estimate lower marginal impacts here because, if a company doesn’t offer a product, the alternative for a consumer in a competitive marketplace is usually a similar product at a slightly higher price. The rarer case where a single company makes a decision with a large social impact (because it’s truly innovative or has monopoly power) sits in the tail of the distribution.</p>
<p>Research, on the other hand, is able to accept higher variability for higher social returns. Researchers are often criticized by the public for working on esoteric projects without clear applications, but occasionally something like <a href="http://science.sciencemag.org/content/346/6213/1258096.full">CRISPR-Cas9</a> is discovered that has a revolutionary effect. Both the exploration and exploitation phases of research need to be funded for research to work well – if no exploitation is funded the highest impact findings never translate into markets. But if you don’t do enough exploration the pipeline of new ideas dries up, which prevents you from taking advantage of the full potential of research.</p>
<p>The bulk of Government projects exist in a middle ground between markets and research. This is because governments are often directly addressing problems in areas of market failures like the insurance market for low income individuals. The marginal impact is fairly large in a situation like this because the counterfactual for an individual with no insurance is probably a delay in treatment until an emergency room visit.</p>
<p>Occasionally governments need to make decisions during crises where catastrophic outcomes are possible. To get a feel for this, consider simulating one of our many nuclear close calls [<a href="https://en.wikipedia.org/wiki/List_of_nuclear_close_calls#1950s">3</a>, <a href="http://www.ucsusa.org/sites/default/files/attach/2015/04/Close%20Calls%20with%20Nuclear%20Weapons.pdf">4</a>] over and over again with small changes in the initial conditions. Realistic outcomes for this experiment would probably range from completely avoiding conflict to the destruction of our civilization. Government impact in situations like these is mainly driven by <a href="https://en.wikipedia.org/wiki/Path_dependence">path dependence</a> – the actions can’t be undone or modified.</p>
<p>Research, on the other hand, largely functions by shifting the future into the present more rapidly, so the main benefit is applied during the time before the discovery would have occurred otherwise [<a href="http://www.fhi.ox.ac.uk/research-into-neglected-diseases/">5</a>]. (Note: this isn’t always the case because research can have a time dependent application. For example, advances in battery technology for storing renewable energy might prevent a path dependent change in our climate right now.)</p>
<h2 id="the-explore-exploit-tradeoff">The Explore-Exploit Tradeoff</h2>
<p>This tradeoff is common in optimization problems across many different domains [<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410143/">6</a>, <a href="http://lazerlab.net/publication/network-structure-exploration-and-exploitation">7</a>, <a href="https://en.wikipedia.org/wiki/Multi-armed_bandit">8</a>, <a href="https://www.cgdev.org/publication/searching-devil-details-learning-about-development-program-design-working-paper-434">9</a>]. Every society needs to find an equilibrium between present and future benefits, so I thought this would be an important concept to build into the model.</p>
<h2 id="possible-negative-effects">Possible Negative Effects</h2>
<p>Not every exploration results in a project with a positive expected marginal impact. For a random 10% of the projects I multiply the impact by -0.5, resulting in a bimodal distribution. Nevertheless, the project is funded at the same level as other projects because society isn’t always very good at discerning impact. In this model, larger potential negative effects come with higher levels of risk. For example, some areas of research or government action have the potential to be catastrophic if we’re not careful. These <a href="https://en.wikipedia.org/wiki/Global_catastrophic_risk">catastrophic risks</a> would dwarf the rest of these projects in negative or positive impact, which is part of the reason I don’t include numerical axes: it’s difficult to legibly show the impacts on the same scale.</p>
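<p>A rough sketch of that adjustment (again with placeholder names), applied to each newly explored project’s impact:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript">//Sketch: roughly 10% of newly explored projects get a negative expected impact
function adjustImpact(impact) {
  return Math.random() < 0.1 ? -0.5 * impact : impact;
}</code></pre></figure>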
<h2 id="the-role-of-philanthropy">The Role of Philanthropy</h2>
<p>Philanthropists could act in this model in a few ways. First, they could try to choose an existing project with a high marginal impact and fund it until its marginal impact is lower than a different project’s, then switch to a new one. Second, they could fund the exploration phase of research to try to create an opportunity that is better than the existing options. Finally, they could try to change the model parameters by influencing the political process or funding research about the optimal model settings.</p>
<p>Philanthropists tend to take many different approaches in society, but I think taking big risks to create new ideas or trying to influence policy for the better are among the best opportunities [<a href="http://bostonreview.net/forum/foundations-philanthropy-democracy">10</a>, <a href="http://www.openphilanthropy.org/blog/hits-based-giving">11</a>, <a href="https://ssir.org/articles/entry/the_elusive_craft_of_evaluating_advocacy">12</a>].</p>
<h2 id="model-problems">Model Problems</h2>
<p>There are a number of problems with this model. In addition to the ones I mentioned above, here are a few more:</p>
<ul>
<li><strong>Causation isn’t so clear cut</strong>. For example, good economic policy might lead to better functioning markets, which might free up more money for research, which might result in research that improves economic policy. This seems to operate more like a mutually beneficial relationship where you can’t neatly divide things up by causation.</li>
<li><strong>Growth isn’t fixed</strong>. I set the growth rate at three percent in this model, but in reality there would be a feedback between the settings and growth. I might create a future version where the user can change the rate and observe the effect, but adding a feedback would get very complicated.</li>
<li><strong>Can wellbeing be summed?</strong> There are a few <a href="https://en.wikipedia.org/wiki/Utilitarianism#Aggregating_utility">philosophical objections</a> to summing wellbeing. Also, some argue that measuring total wellbeing doesn’t account for how that wellbeing is distributed, which is a valid point.</li>
<li><strong>Can money buy everything?</strong> In the case of a well functioning government, funding might not be the limiting factor on competence. No amount of project funding will suddenly improve decision making skills in a crisis. Government competence is something that needs to be built over <a href="https://www.cgdev.org/publication/capability-traps-mechanisms-persistent-implementation-failure-working-paper-234">decades and centuries</a>, and probably depends on something other than funding levels.</li>
<li><strong>Different distributions?</strong> It’s possible that the lognormal distribution isn’t the best fit for opportunities to do good in the world. Maybe a power-law distribution would fit better, or maybe I need to change the existing lognormal parameters. Right now they’re tuned for visual communication, not accuracy.</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>It took me a while to understand this, but the above concepts imply that both of these can simultaneously be true:</p>
<ol>
<li>The vast majority of good that is done in the world is achieved through markets.</li>
<li>The best opportunities to do good in the world lie outside of markets.</li>
</ol>
<p>This is because the opportunities with the highest marginal impacts lie in the tails of the government and research distributions. I hope you find this model interesting. It’s a first draft, so feel free to critique or <a href="https://github.com/psthomas/mental-model">contribute</a>. In the future I might create a version that allows people to choose the probability distributions parameters to fit with their intuitions about the world, so stay tuned.</p>
<h2 id="appendix-a-how-it-works">Appendix A: How It Works</h2>
<p>There can be large differences in the impacts of different actions so I generate the data using a <a href="https://en.wikipedia.org/wiki/Joint_probability_distribution">joint</a> <a href="https://en.wikipedia.org/wiki/Log-normal_distribution">lognormal</a> probability distribution with a correlation between the risk and impact. The specifics of how the joint distributions are generated are in a <a href="https://pstblog.com/2017/10/07/correlated-randoms">separate blogpost</a> if you’re interested. I leave numbers and units off the axes because I don’t think there’s a single measure of wellbeing and the relationship between the sectors matters more than the numerical values.</p>
<p>The user can set the percentage of the budget devoted to each sector and the allocation of each sector’s money between exploitation (funding existing projects), and exploration (searching for new projects). When the user clicks the “Next” button, the exploit budget is allocated to each circle by finding the maximum percentage that can be multiplied times the project funding needs in each sector while staying below the budget. This means that society funds every project by some amount because it doesn’t always know the real marginal impact of each project.</p>
<p>Here’s the section of the code that finds the percentage funding level:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="kd">var</span> <span class="nx">sum</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">,</span>
<span class="nx">pct</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="nx">sum</span> <span class="o"><=</span> <span class="nx">exploit_budget</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">sum</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">;</span>
<span class="nx">pct</span> <span class="o">+=</span> <span class="mf">0.001</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">j</span> <span class="o"><</span> <span class="nx">sector</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">sum</span> <span class="o">+=</span> <span class="nx">pct</span> <span class="o">*</span> <span class="nx">sector</span><span class="p">[</span><span class="nx">j</span><span class="p">].</span><span class="nx">size</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>As the research projects are funded, the variation associated with each project declines and the bubbles diffuse through the market. Once a research project has a low enough variation, there’s a ten percent chance that it will transition to the market’s budget each year. This aspect of the model isn’t perfect, as the transition to markets should be governed by risk with respect to market returns, not risk with respect to social impact. But if the two are at least correlated I think it’s an okay assumption.</p>
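<p>A minimal sketch of that transition check, with assumed names for the threshold and project fields:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript">//Sketch: low-variation research projects have a 10% annual chance of
//moving over to the market budget
function maybeTransition(project, variationThreshold) {
  if (project.sector === 'research' &&
      project.variation < variationThreshold &&
      Math.random() < 0.10) {
    project.sector = 'market';
  }
}</code></pre></figure>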
<p>For exploratory funding, new circles are generated from <a href="https://en.wikipedia.org/wiki/Log-normal_distribution">lognormal distributions</a> with unique mus and sigmas for each of the sectors. This model is somewhat informed by the concept of an <a href="https://en.wikipedia.org/wiki/Modern_portfolio_theory#Efficient_frontier_with_no_risk-free_asset">efficient frontier</a> from modern portfolio theory. Generally, I think there’s a positive relationship between risk taking and societal impact, although greater potential for societal harm comes at high levels of risk as well.</p>
<p>Because markets are driven by a profit motive and have a shorter time horizon, their risk profile and corresponding social impact are lower (smaller mus and sigmas). Governments and basic research can play at higher levels of risk, so they’re rewarded accordingly with higher marginal social impacts (higher mus and sigmas). Research has more projects in areas of moderately high social impact than government (higher mu), but government has a heavier tail at the very high levels of impact (higher sigma). The heavier tail for Government is driven by situations like nuclear crises where decisions have the potential to dwarf all the other areas in impact.</p>
<figure>
<a href="/images/prob_dist.png"><img src="/images/prob_dist.png" /></a>
<figcaption>The probability distributions for the marginal impacts of different sectors.</figcaption>
</figure>
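<p>To make the exploration step concrete, here’s a rough sketch using jStat. The mu and sigma values are placeholders chosen only to match the ordering described above, not the parameters used in the actual model:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript">//Sketch: each sector draws new projects from its own lognormal distribution
var sectorParams = {
  market:     {mu: 0.0, sigma: 0.25}, //lower risk, lower marginal impact
  government: {mu: 0.5, sigma: 0.75}, //heavier tail (e.g. crisis decisions)
  research:   {mu: 1.0, sigma: 0.5}   //more moderately-high-impact projects
};
function exploreProject(sectorName) {
  var p = sectorParams[sectorName];
  return {
    sector: sectorName,
    marginalImpact: jStat.lognormal(p.mu, p.sigma).sample()
  };
}</code></pre></figure>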
<p>Finally, three percent of the money allocated to markets is added to the budget each year. This is meant to simulate the importance of economic growth, and penalizes putting too much money into research or government. The growth rate and money allocated to markets are some of the most important factors in the long term performance of the model.</p>
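<p>The growth step itself is simple; as a one-line sketch with my own variable names:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript">//Sketch: 3% of the money allocated to markets is added back to next year's budget
var GROWTH_RATE = 0.03;
function applyGrowth(budget, marketAllocation) {
  return budget + GROWTH_RATE * marketAllocation;
}</code></pre></figure>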
<p>Generating samples from <a href="https://en.wikipedia.org/wiki/Joint_probability_distribution">joint</a> lognormal distributions ended up being much more difficult than I thought it would be. There are a number of resources out there and packages for languages like Python and R, but nothing for JavaScript. See <a href="https://pstblog.com/2017/10/07/correlated-randoms">this blogpost</a> for more on how I did this with JavaScript.</p>
<h2 id="references">References</h2>
<p>[1] What’s So Special About Science (And How Much Should We Spend on It?). Science. <a href="http://science.sciencemag.org/content/342/6160/817">http://science.sciencemag.org/content/342/6160/817</a></p>
<p>[2] Happiness and Life Satisfaction. Our World in Data. <a href="https://ourworldindata.org/happiness-and-life-satisfaction/#correlates-determinants-and-consequences">https://ourworldindata.org/happiness-and-life-satisfaction/#correlates-determinants-and-consequences</a></p>
<p>[3] List of nuclear close calls. Wikipedia. <a href="https://en.wikipedia.org/wiki/List_of_nuclear_close_calls#1950s">https://en.wikipedia.org/wiki/List_of_nuclear_close_calls#1950s</a></p>
<p>[4] Close Calls with Nuclear Weapons. Union of Concerned Scientists. <a href="http://www.ucsusa.org/sites/default/files/attach/2015/04/Close%20Calls%20with%20Nuclear%20Weapons.pdf">http://www.ucsusa.org/sites/default/files/attach/2015/04/Close%20Calls%20with%20Nuclear%20Weapons.pdf</a></p>
<p>[5] Estimating the cost-effectiveness of research into neglected diseases. Future of Humanity Institute. <a href="http://www.fhi.ox.ac.uk/research-into-neglected-diseases/">http://www.fhi.ox.ac.uk/research-into-neglected-diseases/</a></p>
<p>[6] Exploration versus exploitation in space, mind, and society. Trends in Cognitive Science. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410143/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410143/</a></p>
<p>[7] The Network Structure of Exploration and Exploitation. Administrative Science Quarterly. <a href="http://lazerlab.net/publication/network-structure-exploration-and-exploitation">http://lazerlab.net/publication/network-structure-exploration-and-exploitation</a></p>
<p>[8] Multi-armed bandit. Wikipedia. <a href="https://en.wikipedia.org/wiki/Multi-armed_bandit">https://en.wikipedia.org/wiki/Multi-armed_bandit</a></p>
<p>[9] Searching for the Devil in the Details: Learning about Development Program Design. Center for Global Development. <a href="https://www.cgdev.org/publication/searching-devil-details-learning-about-development-program-design-working-paper-434">https://www.cgdev.org/publication/searching-devil-details-learning-about-development-program-design-working-paper-434</a></p>
<p>[10] What Are Foundations For? Boston Review. <a href="http://bostonreview.net/forum/foundations-philanthropy-democracy">http://bostonreview.net/forum/foundations-philanthropy-democracy</a></p>
<p>[11] Hits-based Giving. Open Philanthropy Project. <a href="http://www.openphilanthropy.org/blog/hits-based-giving">http://www.openphilanthropy.org/blog/hits-based-giving</a></p>
<p>[12] The Elusive Craft of Evaluating Advocacy. Stanford Social Innovation Review. <a href="https://ssir.org/articles/entry/the_elusive_craft_of_evaluating_advocacy">https://ssir.org/articles/entry/the_elusive_craft_of_evaluating_advocacy</a></p>
<p>[13] Capability Traps? The Mechanisms of Persistent Implementation Failure. Center for Global Development. <a href="https://www.cgdev.org/publication/capability-traps-mechanisms-persistent-implementation-failure-working-paper-234">https://www.cgdev.org/publication/capability-traps-mechanisms-persistent-implementation-failure-working-paper-234</a></p>
<p><a href="https://pstblog.com/2017/07/28/mental-model">My Mental Model of the World</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on July 28, 2017.</p>
https://pstblog.com/2017/06/05/national-election-vis2017-06-05T00:00:00+00:002017-06-05T00:00:00+00:00Philip Thomashttps://pstblog.compthomas.v3@gmail.com
<p>In my continuing quest to understand the 2016 election, I decided to build another visualization. This version shows the turnout and two party margin by county for the 2004, 2008, 2012 and 2016 presidential elections. I made an <a href="/2016/12/09/wisconsin-election">earlier</a> version of this visualization using Wisconsin data, but I recently pieced together the national data as well. I added a few features in this version:</p>
<ul>
<li>It’s now possible to search by state or county.</li>
<li>The county bubble areas are proportional to the fraction of national votes, fraction of electoral votes, or the Voter Power Index (VPI).</li>
<li>Electoral votes and national vote percentages are tallied in the bottom left.</li>
<li>Each tooltip now shows both the county level and state level data.</li>
<li>Clicking and dragging the counties updates the vote percentages and electoral counts if the vote threshold for the state is crossed. I find this is a good way to consider “what if” scenarios for the elections.</li>
<li>A dropdown menu now allows switching between county, state, and different demographic data sources.</li>
<li>2020 results are now included for states and counties, but I’m still waiting on demographic data.</li>
</ul>
<p>All the code and data are available at a GitHub repo <a href="https://github.com/psthomas/election-vis">here</a>.</p>
<!--https://stackoverflow.com/questions/5867985-->
<div class="outer">
<div class="inner">
<!--src="/vis/national-election.html"-->
<iframe id="vis" style="width: 98vw; height: 100vh; border: none; position: relative; right:-50%; scrolling:no;"></iframe>
</div>
</div>
<script src="https://d3js.org/d3.v4.js"></script>
<script> d3.request("https://raw.githubusercontent.com/psthomas/election-vis/master/scatter.html").get(function(a) { document.getElementById("vis").srcdoc = a.response; }); </script>
<h2 id="a-few-notes">A Few Notes</h2>
<p>It’s pretty interesting to click through the years and see the turnout and margin changes for each county. Here are a few things that I noticed when building the visualization:</p>
<ul>
<li>The drop in turnout happened in 2012; 2016 was largely about a left-right sorting of counties by size (although crucial counties like Milwaukee still saw a drop in turnout).</li>
<li>The left-right sorting is especially apparent in Midwestern swing states that gave the election to Trump. Try searching for WI, PA, MI, IA, MN and clicking through the years to see this sorting in action.</li>
<li>The value of an additional voter is much higher in some states than others in most elections (except 2012). To see this, weight the circles by the Voter Power Index (VPI) and click through the years. New Hampshire, Pennsylvania, Wisconsin and Michigan dominate the voting power calculation for 2016. Clinton would have won the 2016 election if turnout was a few points higher in just three counties: Milwaukee County WI, Wayne County MI, and Philadelphia County PA.</li>
<li>A good approach to flipping an election is to weight the circles by VPI, then click and drag the largest circles to increase the turnout or margin.</li>
</ul>
<h2 id="assumptions">Assumptions</h2>
<ul>
<li>I assume increases in turnout are apportioned based on the fraction of each county that voted for each party initially. This probably underestimates the impact of increased turnout for Democrats because the electorate often leans left as turnout increases.</li>
<li>Changes in margin are zero sum. Any increase in the Democrat’s vote total comes from Republican voters switching sides, not from third party candidates.</li>
</ul>
<h2 id="calculations">Calculations</h2>
<p><strong>Voter Power Index</strong>: This index is an estimate of the value of additional voters in each state, given the candidate margin of victory. Groups like <a href="link">538</a> predict how likely a state is to switch between the candidates in order to calculate the VPI, but this takes a simpler approach as outlined at <a href="http://www.dailykos.com/story/2016/12/19/1612252/-Voter-Power-Index-Just-How-Much-Does-the-Electoral-College-Distort-the-Value-of-Your-Vote">DailyKos</a>. This equation calculates the VPI and apportions it to each county based on the fraction of the state’s votes (a worked example with made-up numbers follows the formula):</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">VPI = (county_number/state_number) * (state_electoral_votes/(Math.abs(num_state_dem-num_state_rep)))</code></li>
</ul>
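<p>As a rough illustration with made-up numbers (these are not real election results), the formula would evaluate like this:</p>
<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript">//Hypothetical inputs, for illustration only
var county_number = 300000,          //votes cast in the county
    state_number = 3000000,          //votes cast statewide
    state_electoral_votes = 10,
    num_state_dem = 1520000,
    num_state_rep = 1480000;
var VPI = (county_number/state_number) *
    (state_electoral_votes/(Math.abs(num_state_dem - num_state_rep)));
//(300000/3000000) * (10/40000) = 0.000025</code></pre></figure>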
<p><strong>Electoral Weighting</strong>: This weighting splits the electoral college points among the counties based on their fraction of the state vote:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">electoral_weighting = (county_number/state_number)*state_electoral_votes</code></li>
</ul>
<p><strong>Vote Weighting</strong>: This approach sizes the circle area in proportion to the total votes in the county.</p>
<p><strong>Dragging Circles</strong>: When the user drags a circle, these equations are used to recalculate the county level data. These updates and others happen in the <code class="language-plaintext highlighter-rouge">dragged()</code> function in the code:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">new_county_number = new_turnout*county_voting_age_population</code></li>
<li><code class="language-plaintext highlighter-rouge">new_dem_number = (old_dem_fraction + dem_margin_change/2)*new_county_number</code></li>
<li><code class="language-plaintext highlighter-rouge">new_rep_number = (old_rep_fraction - dem_margin_change/2)*new_county_number</code></li>
</ul>
<h2 id="data-issues">Data Issues</h2>
<p>I’m fairly confident that the aggregate data are accurate because vote counts and electoral outcomes are similar to those of David Leip’s <a href="http://uselectionatlas.org/">Election Atlas</a>. But even if the aggregates are accurate, it’s still possible that there are problems at the individual county level.</p>
<p>The turnout exceeded 100% in a few counties, which I made note of and filtered out in the <a href="https://github.com/psthomas/election-vis/blob/master/voting_national.ipynb">Jupyter notebook</a>. This issue is caused by either bad county-level vote tallies or bad voting age population data. I’m using a new approach to estimating the citizen voting age population by running a regression on Census Bureau data, so this partially resolves the problem [3].</p>
<p>It’s also important to mention the <a href="http://www.electproject.org/home/voter-turnout/faq/denominator">distinction</a> between Citizen Voting Age Population (VAP) and Voting Eligible Population (VEP). VEP estimates remove felons (depending on state law) and other groups that are ineligible to vote. This means that using the VAP data could underestimate turnout in counties with e.g. high felony convictions. The Sentencing Project <a href="http://www.pewtrusts.org/en/research-and-analysis/blogs/stateline/2016/10/10/more-than-six-million-felons-cant-vote-in-2016">estimates</a> that 6 million felons were ineligible to vote in 2016, so the effect on estimated turnout could be substantial. Unfortunately, VEP data isn’t available at the county level so I used VAP data instead. This might be preferable in some ways though, because it highlights a problem: close to 2.5 percent of the US population isn’t being represented by its government.</p>
<p>Adding in the demographic data led to a new set of problems. I used a combination of the Census Bureau’s Current Population Survey [4] for the <code class="language-plaintext highlighter-rouge">Turnout</code> and <code class="language-plaintext highlighter-rouge">Fraction of the Electorate</code> values (courtesy of the Elections Project [5]), and the American National Election Studies for the <code class="language-plaintext highlighter-rouge">Democratic Margin</code> values [6]. Extrapolating from demographic survey data to national vote counts doesn’t lead to good estimates, so think of the difference between the estimated percentages and actual percentages from the county data as a measure of the error. This is a well known problem [7] and is a result of uncertainty in the surveys. I also had to interpolate some values to get the categories to line up across datasets, so I make note of that when it’s done in the Jupyter notebook.</p>
<p>My goal is to improve the accuracy and number of years covered over time, so suggestions and pull requests are welcome.</p>
<h2 id="sources">Sources</h2>
<p>[1] MIT Election Labs, 2000-2016 County level presidential results: <a href="https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ">https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ</a></p>
<p>[2] Census Bureau County Voting Age Population data: <a href="https://www.census.gov/programs-surveys/decennial-census/about/voting-rights/cvap.html">https://www.census.gov/rdo/data/voting_age_population_by_citizenship_and_race_cvap.html</a></p>
<p>[3] CVAP, my project estimating annual citizen voting age population by state and county. <a href="https://github.com/psthomas/cvap">https://github.com/psthomas/cvap</a></p>
<p>[4] Voting and Registration Tables. US Census Bureau. <a href="https://www.census.gov/topics/public-sector/voting/data/tables.All.html">https://www.census.gov/topics/public-sector/voting/data/tables.All.html</a></p>
<p>[5] United States Election Project, demographic turnout data. <a href="http://www.electproject.org/home/voter-turnout/demographics">http://www.electproject.org/home/voter-turnout/demographics</a></p>
<p>[6] American National Election Studies, demographic margins data. <a href="http://www.electionstudies.org/studypages/download/datacenter_all_NoData.html">http://www.electionstudies.org/studypages/download/datacenter_all_NoData.html</a></p>
<p>[7] Voter Trends in 2016. Center for American Progress. <a href="https://www.americanprogress.org/issues/democracy/reports/2017/11/01/441926/voter-trends-in-2016/">https://www.americanprogress.org/issues/democracy/reports/2017/11/01/441926/voter-trends-in-2016/</a></p>
<p><a href="https://pstblog.com/2017/06/05/national-election-vis">Visualizing Voter Turnout and Margins by County, 2004 to 2020</a> was originally published by Philip Thomas at <a href="https://pstblog.com">pstblog</a> on June 05, 2017.</p>