I was recently trying to determine which city has the most comfortable climate. I came across the website WeatherSpark, which provides a lot of great information on annual temperature, relative humidity, cloud cover, and precipitation all summed up in some pretty nice graphics. I was trying to find the source of their data, but couldn’t find it on NOAA’s website.
But then I realized that the graphics themselves are a source of data: I just needed to count the pixels associated with each temperature band in the graphic to build a halfway decent dataset. Below, I use a Python imaging library called Pillow and images I scraped from WeatherSpark to find the US city with the most comfortable temperature throughout the year. The final code is here.
Getting the Images
This is an example image showing the fraction of time spent within certain temperature bands throughout the year. By counting the pixels associated with each color, I can find the percentage of time spent within each band annually.
The temperature images aren’t at predictable URLs, so I wrote this function to visit each city URL and find the URL for the temperature bands image. The cities that I used were the top 50 in the US by population, with Honolulu and Anchorage thrown in for geographic diversity.
Next, I use the requests library to download the images at each of the urls I scraped, and store them in the
Note that the code stores an intermediate list of cities and image urls at this point in store.txt. If you rerun the code with store.txt in the top directory, it skips the first two scraping steps and goes right to the processing steps that follow.
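The skip-if-cached behavior can be sketched roughly as follows. This is only an illustration, not the original code: the function name load_or_scrape, the scrape callable, and the choice of JSON as the format of store.txt are all assumptions on my part.

```python
import json
import os


def load_or_scrape(store_path, scrape):
    """Return a {city: image_url} dict, scraping only when no store file exists.

    `scrape` is a hypothetical callable standing in for the two scraping steps.
    Storing the result as JSON is an assumption; the real store.txt may differ.
    """
    if os.path.exists(store_path):
        # A previous run left its results behind, so skip the scraping steps.
        with open(store_path) as f:
            return json.load(f)

    city_urls = scrape()
    with open(store_path, "w") as f:
        json.dump(city_urls, f)
    return city_urls
```

Calling load_or_scrape("store.txt", my_scraper) twice should only hit the website the first time; deleting store.txt forces a fresh scrape.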
Processing the Images
After downloading the images, it’s time to process them. This is where the Python imaging library Pillow comes in. Pillow is a regularly updated fork of an earlier imaging library called PIL. Pillow has a number of great functions for processing and filtering images, including conversions, cropping and resizing. In this case, I use the getcolors() method, which returns a list of (count, color) tuples, with pixel counts grouped by RGB value (e.g. [(1826, (140, 218, 92)), (3384, (76, 144, 80))]). Here’s the image processing function:
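A minimal sketch of a pixel-counting step along these lines might look like the following. It is not the original function: the names count_colors and counts_to_dict are made up for illustration. Note that getcolors() returns None when an image has more distinct colors than its maxcolors argument (256 by default), so the sketch raises that limit to the total number of pixels.

```python
def counts_to_dict(raw):
    """Turn getcolors() output, a list of (count, color) pairs, into {color: count}."""
    return {color: count for count, color in raw}


def count_colors(image_path):
    """Return {(r, g, b): pixel_count} for the image at image_path."""
    # Imported here so that counts_to_dict can be used without Pillow installed.
    from PIL import Image

    with Image.open(image_path) as img:
        rgb = img.convert("RGB")  # normalize palette/RGBA images to RGB triples
        # An image cannot have more distinct colors than pixels, so with this
        # limit getcolors() will not return None.
        raw = rgb.getcolors(maxcolors=rgb.width * rgb.height)
    return counts_to_dict(raw)
```

Converting to RGB first is a design choice in this sketch: it keeps every key a plain (r, g, b) triple even if the downloaded file uses a palette or an alpha channel.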
One problem with this approach is that pixel RGB values can vary slightly, meaning you could end up with two separate counts that should really be combined. For example, (140, 218, 92) with 1826 pixels and (140, 217, 89) with 4660 pixels both code for a very similar light green color, with the green value differing by one (218 vs. 217) and the blue value by three (92 vs. 89).
To account for this, I compare the RGB values from each tuple with the correct temperature colors I selected with the Colorpick Eyedropper in advance. If all of the RGB values are within 10 units of the predefined colors, the counts are added to that color count. The crucial part of the grouping function is a check of the form max(abs(r1-r2), abs(g1-g2), abs(b1-b2)) < 10, which takes the largest of the three per-channel differences between a pixel color and a predefined color. This is called the Infinity-norm technique (the same measure is also known as the Chebyshev distance), but there are a number of other methods for grouping similar colors together; see this Stack Overflow discussion for more details.
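As a rough illustration of the grouping idea, here is a small self-contained sketch. The function names, the "pixels" and "percent" keys, the light_green/dark_green labels, and the 500-pixel white background entry are all assumptions for the example; only the two green colors and their counts come from the text above. Percentages here are taken relative to the matched pixels only, which is also an assumption.

```python
def chebyshev_distance(color_a, color_b):
    """Largest per-channel difference between two RGB triples (infinity norm)."""
    return max(abs(a - b) for a, b in zip(color_a, color_b))


def group_counts(color_counts, reference_colors, tolerance=10):
    """Fold (count, color) pairs into the reference color they are closest to.

    Colors that are not within `tolerance` of any reference (axes, labels,
    background) are ignored.
    """
    pixels = {band: 0 for band in reference_colors}
    for count, color in color_counts:
        for band, reference in reference_colors.items():
            if chebyshev_distance(color, reference) < tolerance:
                pixels[band] += count
                break  # each color is assigned to at most one band
    matched = sum(pixels.values())
    return {
        band: {"pixels": n, "percent": 100.0 * n / matched if matched else 0.0}
        for band, n in pixels.items()
    }


# Hypothetical labels for the two greens mentioned in the text.
reference_colors = {"light_green": (140, 218, 92), "dark_green": (76, 144, 80)}
color_counts = [
    (1826, (140, 218, 92)),
    (4660, (140, 217, 89)),  # near-duplicate of the light green
    (3384, (76, 144, 80)),
    (500, (255, 255, 255)),  # made-up background pixels, matched by nothing
]
grouped = group_counts(color_counts, reference_colors)
```

With these inputs the two light green entries are merged into a single count, and the white pixels are dropped because they are far from both reference colors.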
The result of the above function is a dictionary like this one, with temperature bands as keys, and pixel counts and percentages as values:
Scoring the Cities
The final part of the above function scores the location based on the fraction of the time spent in different temperature bands. The scoring is somewhat subjective, although I did find an interesting survey showing that people generally find city temperatures below 62F too cold, 68-78F ideal, and anything above 86F too hot. I gave Comfortable a weight of 1.0, while Cool and Warm get weights of 0.70 and 0.80 respectively. I penalize time spent in bad temperature bands, with Sweltering and Frigid both getting weights of -1.0. The rest of the weights are in the discounts dictionary above.
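To make the scoring concrete, here is a hedged sketch that treats the score as a weighted sum of the fraction of time spent in each band. The weighted-sum form and the names STATED_WEIGHTS and score_city are my assumptions. The dictionary holds only the five weights stated above; any other band falls back to a placeholder weight of 0.0, which is not one of the original values.

```python
# Only the weights stated in the text; the remaining bands are deliberately absent.
STATED_WEIGHTS = {
    "Comfortable": 1.0,
    "Cool": 0.70,
    "Warm": 0.80,
    "Sweltering": -1.0,
    "Frigid": -1.0,
}


def score_city(band_fractions, weights=STATED_WEIGHTS, default_weight=0.0):
    """Weighted sum of time fractions (0.0 to 1.0) per temperature band.

    Bands missing from `weights` use `default_weight`, a placeholder assumption.
    """
    return sum(
        weights.get(band, default_weight) * fraction
        for band, fraction in band_fractions.items()
    )


# Made-up fractions for a hypothetical city, purely to show the arithmetic:
# 0.5 * 1.0 + 0.25 * 0.80 + 0.25 * (-1.0)
example_score = score_city({"Comfortable": 0.5, "Warm": 0.25, "Frigid": 0.25})
```

Under this scheme a city that is Comfortable all year scores 1.0, and one that is always Frigid or Sweltering scores -1.0.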
The final ranking of cities is below. Honolulu comes out on top, with 92 percent of the time spent in either Comfortable or Warm zones. The top of the rankings is filled out by cities in the West, Southwest and South. Milwaukee (my hometown), Minneapolis and Anchorage rank the lowest.
Of course, there are factors other than temperature like cloud cover, humidity, and precipitation that can make or break a climate. I’d imagine many southern cities like Houston, New Orleans, or Jacksonville would fall in the rankings if the temperatures were corrected for humidity. But I might have to delve into the actual data to come up with a more comprehensive ranking, and the whole point of this post was to avoid doing so by counting pixels.
So it is possible to build a passable dataset by counting the pixels of some images! This approach would work for “integrating” other similar plots as long as the axes are uniform between images, and the total area of the plot corresponds with something meaningful. Cool beans.
Installing Pillow can be kind of hard. There are non-Python dependencies like little-cms2. The documentation recommends using Homebrew to install these dependencies: $ brew install libtiff libjpeg webp little-cms2, then $ pip install Pillow. I use the Conda package manager, which is capable of installing non-Python dependencies. I ran $ conda install -c anaconda pillow=3.2.0 [Info here], which installed Pillow along with the other requirements just fine.