I would like to draw a horizontal stacked bar chart with hierarchical labels on the y axis. After some searching I found the following nice example and code.

However, it is written for a vertical stacked bar chart. To apply it to a horizontal bar chart I simply changed the plot call to kind='barh', but that does not work.

I managed to remove the default y tick labels by changing every x to y in the last few lines. Changing x to y inside the helper functions did not give me what I want, though: the hierarchy labels are still drawn along the x axis.

Can anyone help? Thanks.

P.S.: To keep things tidy, I have posted the original code from the 2nd answer to this question rather than my modified attempt.

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from itertools import groupby

def test_table():
    data_table = pd.DataFrame({'Room': ['Room A'] * 4 + ['Room B'] * 3,
                               'Shelf': ['Shelf 1'] * 2 + ['Shelf 2'] * 2 + ['Shelf 1'] * 2 + ['Shelf 2'],
                               'Staple': ['Milk', 'Water', 'Sugar', 'Honey', 'Wheat', 'Corn', 'Chicken'],
                               'Quantity': [10, 20, 5, 6, 4, 7, 2],
                               'Ordered': np.random.randint(0, 10, 7)
                               })
    return data_table

def add_line(ax, xpos, ypos):
    # Short vertical separator below the axes, in axes coordinates
    line = plt.Line2D([xpos, xpos], [ypos + .1, ypos],
                      transform=ax.transAxes, color='black')
    line.set_clip_on(False)
    ax.add_line(line)

def label_len(my_index, level):
    # (label, run length) for each run of equal labels at this index level
    labels = my_index.get_level_values(level)
    return [(k, sum(1 for i in g)) for k, g in groupby(labels)]

def label_group_bar_table(ax, df):
    ypos = -.1
    scale = 1./df.index.size
    # Innermost level first, so it ends up closest to the axis
    for level in range(df.index.nlevels)[::-1]:
        pos = 0
        for label, rpos in label_len(df.index, level):
            lxpos = (pos + .5 * rpos)*scale
            ax.text(lxpos, ypos, label, ha='center', transform=ax.transAxes)
            add_line(ax, pos*scale, ypos)
            pos += rpos
        add_line(ax, pos*scale, ypos)
        ypos -= .1

df = test_table().groupby(['Room', 'Shelf', 'Staple']).sum()
fig = plt.figure()
ax = fig.add_subplot(111)
df.plot(kind='bar', stacked=True, ax=ax)

# Below 3 lines remove default labels
labels = ['' for item in ax.get_xticklabels()]
ax.set_xticklabels(labels)
ax.set_xlabel('')
label_group_bar_table(ax, df)
fig.subplots_adjust(bottom=.1*df.index.nlevels)
plt.show()
xiaoshir

1 Answer

You can do something like this.

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import pandas as pd
import numpy as np

data_table = pd.DataFrame({'Room': ['Room A'] * 4 + ['Room B'] * 3,
                           'Shelf': ['Shelf 1'] * 2 + ['Shelf 2'] * 2 + ['Shelf 1'] * 2 + ['Shelf 2'],
                           'Staple': ['Milk', 'Water', 'Sugar', 'Honey', 'Wheat', 'Corn', 'Chicken'],
                           'Quantity': [10, 20, 5, 6, 4, 7, 2, ],
                           'Ordered': np.random.randint(0, 10, 7)
                           })

arrays = [list(data_table['Room']), list(data_table['Shelf']), list(data_table['Staple'])]
data_table = data_table.groupby(['Room', 'Shelf', 'Staple']).sum()
index = pd.MultiIndex.from_tuples(list(zip(*arrays)))

df = pd.DataFrame(data_table[['Ordered', 'Quantity']], index=index).T

# plotting
fig = plt.figure()
height_ratios = [len(df[df.columns.levels[0][0]].columns), len(df[df.columns.levels[0][1]].columns)] #i.e. 4, 3
gs = gridspec.GridSpec(nrows=len(df.columns.levels[0]), ncols=1, height_ratios=height_ratios)

ax1 = fig.add_subplot(gs[0,0])
ax2 = fig.add_subplot(gs[1,0], sharex=ax1)
axes = [ax1, ax2]
for i, col in enumerate(df.columns.levels[0]):
    print(col)
    ax = axes[i]
    df[col].T.plot(ax=ax, stacked=True, kind='barh', width=.8)

    ax.legend_.remove()
    ax.set_ylabel(col, weight='bold')
    # pass the flag positionally: the `b` keyword was renamed in newer matplotlib releases
    ax.xaxis.grid(True, which='major', color='black', linestyle='--', alpha=.4)
    ax.set_axisbelow(True)

    for tick in ax.get_xticklabels():
        tick.set_rotation(0)

ax.legend()
# make the ticklines invisible
ax.tick_params(axis=u'both', which=u'both', length=0)
plt.tight_layout()
# remove spacing in between
fig.subplots_adjust(wspace=0)  # space between plots

plt.show()

I adapted a previous answer of mine. Note that hierarchical grouping of the labels is apparently still on the wishlist, so it is done manually here: one subplot per first-level label.
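
If you would rather keep a single axes and draw the hierarchy labels yourself, the helpers from the question can be rotated by 90 degrees. The following is only a rough sketch, not a tested recipe: `add_hline`, `label_group_barh_table` and the `step` parameter are hypothetical names introduced here, and the `step` and `left` values are eyeballed and will likely need tuning for your own labels. It relies on pandas laying out the bars of a `barh` plot evenly between 0 and 1 in axes coordinates, just as the vertical version does.

```python
from itertools import groupby

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


def label_len(index, level):
    """Return (label, run length) for each run of equal labels at a level."""
    labels = index.get_level_values(level)
    return [(k, sum(1 for _ in g)) for k, g in groupby(labels)]


def add_hline(ax, xpos, ypos, width):
    """Hypothetical helper: short horizontal separator left of the axes."""
    line = plt.Line2D([xpos - width, xpos], [ypos, ypos],
                      transform=ax.transAxes, color='black')
    line.set_clip_on(False)
    ax.add_line(line)


def label_group_barh_table(ax, df, step=0.25):
    """Hypothetical helper: one column of labels per index level.

    `step` is the column width in axes coordinates (an assumed value).
    """
    xpos = 0.0
    scale = 1.0 / df.index.size
    # Innermost level first, so it sits right next to the y axis.
    for level in reversed(range(df.index.nlevels)):
        pos = 0
        for label, rpos in label_len(df.index, level):
            lypos = (pos + 0.5 * rpos) * scale  # vertical centre of the group
            ax.text(xpos - step / 2, lypos, label, ha='center', va='center',
                    transform=ax.transAxes)
            add_hline(ax, xpos, pos * scale, step)
            pos += rpos
        add_hline(ax, xpos, pos * scale, step)
        xpos -= step


data_table = pd.DataFrame({
    'Room': ['Room A'] * 4 + ['Room B'] * 3,
    'Shelf': ['Shelf 1'] * 2 + ['Shelf 2'] * 2 + ['Shelf 1'] * 2 + ['Shelf 2'],
    'Staple': ['Milk', 'Water', 'Sugar', 'Honey', 'Wheat', 'Corn', 'Chicken'],
    'Quantity': [10, 20, 5, 6, 4, 7, 2],
    'Ordered': np.random.randint(0, 10, 7),
})
df = data_table.groupby(['Room', 'Shelf', 'Staple']).sum()

fig, ax = plt.subplots()
df.plot(kind='barh', stacked=True, ax=ax)

# Hide the default tick labels and axis label, then draw the hierarchy.
ax.tick_params(axis='y', labelleft=False)
ax.set_ylabel('')
label_group_barh_table(ax, df)
# Leave room on the left for the label columns (eyeballed value).
fig.subplots_adjust(left=0.45)
```

Call `plt.show()` (or `fig.savefig(...)`) afterwards to see the result. Note that `barh` draws the first row of `df` at the bottom, so the groups read from bottom to top.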

Chris
  • Thanks for the help, Chris. But my real df has more levels, for example: `arrays = [['Fruit', 'Fruit', 'Fruit', 'Veggies', 'Veggies', 'Veggies','Fruit', 'Fruit', 'Fruit', 'Veggies'], ['Bananas', 'Oranges', 'Pears', 'Carrots', 'Potatoes', 'Celery','Bananas', 'Oranges', 'Pears', 'Carrots'], ['A','B','C','D','E','nan','G','H','I','J'], ['a','b','c','d','e','f','nan','h','i','nan']] index = pd.MultiIndex.from_tuples(list(zip(*arrays))) df = pd.DataFrame(np.random.randint(10, 50, size=(4, 10)), columns=index) df.sort_index(axis=1,inplace=True)`. – xiaoshir Aug 06 '18 at 13:27
  • My questions are: 1. How can I show the other levels in the hierarchy as well? 2. How can I make the bar thickness be calculated automatically so that it is equal across subplots? 3. How can I get rid of `nan` so that it does not show in the graph? – xiaoshir Aug 06 '18 at 13:29
  • I've updated my answer based on your initial data_table (I should have done that straight off). I've reshaped your data_table into a 'multiindex' dataframe. 1. Extra levels are based on the first level, see `df.columns.levels[0]`; the number of graphs is now based on that as well. 2. I'm not entirely sure what you mean by "calculated to be equal". 3. Are you saying you have NaNs in your labels, or in your numeric values? If numeric, you can easily drop columns/rows that contain NaNs; see pandas' `dropna`. – Chris Aug 07 '18 at 19:17
  • 2. Meaning the height of each figure is adjusted based on the number of bars shown, because I don't always have 4 bars in each figure. I have edited the `data_table` in the question to reflect this. – xiaoshir Aug 09 '18 at 11:27
  • I see, this is also possible. You would need to use something like `gridspec`, which you can then feed specific `height_ratios`, in this case 4 and 3 to reflect the items to plot. See updated answer. – Chris Aug 09 '18 at 21:45
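
The `height_ratios` idea from the last comment does not have to be hard-coded as 4 and 3. As a minimal sketch, assuming the same `data_table` as in the question, the number of bars per first-level label can be counted from the grouped index and handed to `plt.subplots` through `gridspec_kw`. The variable name `bars_per_group` is made up for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data_table = pd.DataFrame({
    'Room': ['Room A'] * 4 + ['Room B'] * 3,
    'Shelf': ['Shelf 1'] * 2 + ['Shelf 2'] * 2 + ['Shelf 1'] * 2 + ['Shelf 2'],
    'Staple': ['Milk', 'Water', 'Sugar', 'Honey', 'Wheat', 'Corn', 'Chicken'],
    'Quantity': [10, 20, 5, 6, 4, 7, 2],
    'Ordered': np.random.randint(0, 10, 7),
})
grouped = data_table.groupby(['Room', 'Shelf', 'Staple']).sum()

# Number of rows (bars) under each first-level label, e.g. Room A and Room B.
bars_per_group = grouped.groupby(level=0).size()

# One subplot per first-level label, with heights proportional to bar counts
# so that every bar ends up roughly the same thickness.
fig, axes = plt.subplots(
    nrows=len(bars_per_group), ncols=1, sharex=True, squeeze=False,
    gridspec_kw={'height_ratios': bars_per_group.tolist()},
)

for ax, room in zip(axes[:, 0], bars_per_group.index):
    # grouped.loc[room] keeps the remaining index levels (Shelf, Staple).
    grouped.loc[room].plot(kind='barh', stacked=True, ax=ax,
                           width=0.8, legend=False)
    ax.set_ylabel(room, weight='bold')

axes[-1, 0].legend()
fig.tight_layout()
```

This works for any number of first-level labels, since both the subplot count and the ratios come from `bars_per_group`.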