There are a number of policy areas where decision making is based on the collective judgement of experts. Given what's at stake, it makes sense to study the collective decision-making process to ensure it leads to the best possible results. One interesting expert-opinion aggregator in economics is the Chicago Booth IGM Experts Panel, which I collect data from in this post. Although the panel might not be representative of the entire profession, it's still a useful sanity check for outsiders without in-depth knowledge of economics.
The voting process for the panel works like this: the members agree on the wording of a statement, then each provides a vote (from "strongly disagree" to "strongly agree") and a confidence level for that vote (1-10). The end result is aggregated and published on their website.
There have been a few papers based on this data [1, 2], but the most recent was published in 2012. I thought it would be interesting to look at the more recent data, which isn't published in a usable format anywhere, so I used Python, Pandas, and BeautifulSoup to scrape it from their website. The final Python script is available here.
Finding a Pattern in the URLs
The URL for each vote is hard to predict, as each one has a survey ID that's a somewhat random string like `SV_429IHJQVpBV1cnb`. Luckily, the URLs for the economist bio pages do follow an easy pattern, with the `id` value ranging from 1 to 51. So I wrote a for loop to step through each of their bio pages, getting the voting data from each one:
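A minimal sketch of that loop, since the script itself lives at the link above; the base URL pattern and the `get_data` body here are assumptions, not the site's actual layout:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical URL pattern for the bio pages; the real path may differ.
BASE_URL = "https://www.igmchicago.org/economist/?id={}"

def get_data(url):
    """Fetch one bio page and return its parsed HTML (sketch)."""
    response = requests.get(url)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")

# The id value on the bio pages ranges from 1 to 51.
bio_urls = [BASE_URL.format(i) for i in range(1, 52)]
# results = [get_data(url) for url in bio_urls]  # one request per economist
```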
The above code just steps through each bio page and calls the `get_data` function on each URL.
Scraping the Data
Each bio page has the vote history of one economist, so I just need to access the URL, then parse the HTML for the content I need. I use the `requests` module to fetch each page, then `BeautifulSoup` to parse the HTML. The structure of the HTML is a little tricky, as the subquestions aren't nicely nested within the question title tag. To add to the complexity, there are between one and three subquestions for each title, so the HTML parser needs to adapt based on the circumstances. Here is an example of the HTML structure for an individual question:
To solve this problem, I use BeautifulSoup's `.next_sibling` attribute along with a `while` loop to step through the question text and voting data until it hits the next `<h2>` question title header:
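Here is a sketch of that traversal against a simplified stand-in for the page markup (the tags and layout below are assumptions; the real pages are messier):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the bio-page markup: each <h2> holds a question
# title, followed by sibling tags with subquestion text and vote tables.
sample_html = """
<h2>Question 1</h2>
<p>Subquestion A</p>
<table><tr><td>Agree</td></tr></table>
<h2>Question 2</h2>
<p>Subquestion B</p>
<table><tr><td>Disagree</td></tr></table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
questions = {}
for header in soup.find_all("h2"):
    rows = []
    node = header.next_sibling
    # Walk forward through the siblings until the next <h2> (or end of page).
    while node is not None and getattr(node, "name", None) != "h2":
        if getattr(node, "name", None) in ("p", "table"):
            rows.append(node.get_text(strip=True))
        node = node.next_sibling
    questions[header.get_text(strip=True)] = rows
```

The `getattr` guard skips the bare whitespace strings that BeautifulSoup also returns as siblings, so the loop only collects real tags.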
After taking care of a few edge cases where economists were added late and the table was structured differently, the script works! The final version of the script is posted here, and the full table of data is available here. Some of the data in that table is repetitive, so I also split it into questions and responses tables, with the combination of columns `subquestion` as the primary key between the two.
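The split can be sketched with Pandas like this; the toy frame and its column names are assumptions standing in for the scraped table:

```python
import pandas as pd

# Toy stand-in for the scraped table (column names are assumptions).
votes = pd.DataFrame({
    "qtitle": ["Q1", "Q1", "Q2"],
    "subquestion": ["A", "A", "B"],
    "economist": ["Alice", "Bob", "Alice"],
    "vote": ["Agree", "Strongly Agree", "Uncertain"],
    "confidence": [5, 8, 3],
})

# Questions table: one row per (qtitle, subquestion) pair.
questions = (votes[["qtitle", "subquestion"]]
             .drop_duplicates()
             .reset_index(drop=True))

# Responses table: per-economist votes, keyed by the same columns.
responses = votes[["qtitle", "subquestion", "economist", "vote", "confidence"]]
```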
Here are some basic summary statistics of the data. First, a summary of the numerical information:
And then a summary of the textual information:
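Both summaries come straight from Pandas' `describe`; a sketch on a toy frame (the column names are assumptions):

```python
import pandas as pd

# Toy frame mirroring the scraped data (column names are assumptions).
df = pd.DataFrame({
    "vote": ["Agree", "Strongly Agree", "Uncertain", "Disagree"],
    "confidence": [5, 8, 3, 6],
})

# Numeric columns: count, mean, std, min, quartiles, max.
numeric_summary = df.describe()

# Text columns: count, unique, top, freq.
text_summary = df.describe(include="object")
```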
One thing that I’m interested in is whether confidence declines with the scale of a claim – are economists less certain when they "strongly agree" or "strongly disagree"? An initial look at this suggests the opposite (although what we really want to look at is something like "distance from the median vote", not just the vote):
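A sketch of that check, again on toy responses with assumed column names, including a rough "distance from the median vote" variant that maps the vote labels onto a -2..2 scale:

```python
import pandas as pd

# Toy responses standing in for the scraped table (names are assumptions).
responses = pd.DataFrame({
    "vote": ["Strongly Agree", "Agree", "Uncertain",
             "Disagree", "Strongly Disagree", "Strongly Agree"],
    "confidence": [9, 6, 4, 5, 8, 7],
})

# Mean confidence for each vote category.
mean_confidence = responses.groupby("vote")["confidence"].mean()

# Rough "distance from the median vote": map the labels onto a -2..2
# scale and measure each vote's absolute distance from the median.
scale = {"Strongly Disagree": -2, "Disagree": -1, "Uncertain": 0,
         "Agree": 1, "Strongly Agree": 2}
responses["position"] = responses["vote"].map(scale)
responses["distance"] = (responses["position"]
                         - responses["position"].median()).abs()
```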
This also might highlight a problem with the survey – it’s a little weird to say that you are "confidently uncertain" about an issue, even though saying you are "confident that the literature is uncertain about this issue" is a perfectly reasonable thing to say.
There are a number of other cool ways to look at this data, so I’ll probably write about it in the future. Feel free to use it and let me know what you find!