Easy Stacked Charts with Matplotlib and Pandas

Creating stacked bar charts using Matplotlib can be difficult. Often the data you need to stack is oriented in columns, while the default Pandas bar plotting function requires the data to be oriented in rows with a unique column for each layer.

Below is an example dataframe, with the data oriented in columns. In this case, we want to create a stacked plot using the Year column as the x-axis tick mark, the Month column as the layers, and the Value column as the height of each month band.

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib

matplotlib.style.use('ggplot')


data = [[2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002],
        ['Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar'],
        [1, 2, 3, 4, 5, 6, 7, 8, 9]]

rows = zip(data[0], data[1], data[2])
headers = ['Year', 'Month', 'Value']
df = pd.DataFrame(rows, columns=headers)

df
Year Month Value
0 2000 Jan 1
1 2000 Feb 2
2 2000 Mar 3
3 2001 Jan 4
4 2001 Feb 5
5 2001 Mar 6
6 2002 Jan 7
7 2002 Feb 8
8 2002 Mar 9


Iterative Solution

I have seen a few solutions that take a more iterative approach, creating a new layer in the stack for each category. This is accomplished by using the same axis object ax to append each band, and keeping track of the next bar location by cumulatively summing up the previous heights with a margin_bottom array.

fig, ax = plt.subplots(figsize=(10,7))  

months = df['Month'].drop_duplicates()
margin_bottom = np.zeros(len(df['Year'].drop_duplicates()))
colors = ["#006D2C", "#31A354","#74C476"]

for num, month in enumerate(months):
    values = list(df[df['Month'] == month].loc[:, 'Value'])

    df[df['Month'] == month].plot.bar(x='Year',y='Value', ax=ax, stacked=True, 
                                    bottom = margin_bottom, color=colors[num], label=month)
    margin_bottom += values

plt.show()

png


Using a Pivot

The above approach works pretty well, but there has to be a better way. After a little bit of digging, I found a better solution using the Pandas pivot function.

The pivot function takes arguments of index (what you want on the x-axis), columns (what you want as the layers in the stack), and values (the value to use as the height of each layer). Note that there needs to be a unique combination of your index and column values for each number in the values column in order for this to work.

The end result is a new dataframe with the data oriented so the default Pandas stacked plot works perfectly.

pivot_df = df.pivot(index='Year', columns='Month', values='Value')
pivot_df
Month Feb Jan Mar
Year
2000 2 1 3
2001 5 4 6
2002 8 7 9
#Note: .loc[:,['Jan','Feb', 'Mar']] is used here to rearrange the layer ordering
pivot_df.loc[:,['Jan','Feb', 'Mar']].plot.bar(stacked=True, color=colors, figsize=(10,7))

png