
Using plot.barh creates a bar chart with equally spaced bars. However, I have a column of unequally spaced values (df1['dist']) that I would like to use as the bar positions, so that the plot conveys this additional information:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df1 = pd.DataFrame(np.random.rand(5, 2), columns=['a', 'b'])
df1['dist'] = pd.Series([1, 5, 6.5, 15, 45], index=df1.index)

df1.plot.barh(x='dist', y=['a', 'b'], stacked=True, width=.6, color=['y', 'b'])
plt.show()

Is that possible?

Thomas Kühn
mati

1 Answer


You can create the bar chart 'by hand' using the barh function from matplotlib:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df1 = pd.DataFrame(np.random.rand(5, 2), columns=['a', 'b'])
df1['dist'] = pd.Series([1, 5, 6.5, 15, 45], index=df1.index)

fig, ax = plt.subplots()

# Place each bar at its 'dist' value; the 'b' segment starts where 'a' ends
ax.barh(df1['dist'], df1['a'], height=1)
ax.barh(df1['dist'], df1['b'], left=df1['a'], height=1)
plt.show()

Here is the result:

[Image: the resulting stacked horizontal bar chart]

I'm not sure whether this actually looks better, since the bars are now quite thin. However, you can adjust their thickness with the height parameter.
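As a rough sketch of one way to pick a value for height: since it is given in the same units as the bar positions, you could derive it from the smallest gap between neighbouring 'dist' values. The names min_gap and bar_height and the 0.9 factor are my own assumptions for illustration, not anything matplotlib prescribes.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df1 = pd.DataFrame(np.random.rand(5, 2), columns=['a', 'b'])
df1['dist'] = pd.Series([1, 5, 6.5, 15, 45], index=df1.index)

# Smallest gap between neighbouring positions (between 5 and 6.5 for this data)
min_gap = np.diff(np.sort(df1['dist'].to_numpy())).min()
# Assumption: use 90% of that gap so the two closest bars never touch
bar_height = 0.9 * min_gap

fig, ax = plt.subplots()
ax.barh(df1['dist'], df1['a'], height=bar_height)
ax.barh(df1['dist'], df1['b'], left=df1['a'], height=bar_height)
```

Because the bars are centred on their positions by default, any height below the smallest gap keeps them from overlapping. Call plt.show() afterwards to display the figure.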

Thomas Kühn
  • Thanks, it works fine for a set of 2 series. However, if I increase to more series I get some weird results - probably I misunderstood the concept? Here is an example: `df1 = pd.DataFrame(np.random.rand(5, 4), columns=['a', 'b' ,'c', 'd']) df1['dist'] = pd.Series([1,5,6.5,15,45], index=df1.index) ax.barh(df1['dist'],df1['a'],height =1, color = 'r') ax.barh(df1['dist'],df1['b'],left=df1['a'], height =1, color = 'g') ax.barh(df1['dist'],df1['c'],left=df1['b'], height =1, color = 'b') ax.barh(df1['dist'],df1['d'],left=df1['c'], height =1, color = 'k')` – mati Aug 03 '17 at 06:31
  • Found a solution for my last comment [here](https://stackoverflow.com/a/16654564/4053508). – mati Aug 03 '17 at 08:27
  • @mati the problem you face is with the `left` keyword, which tells `barh` where to start the bar. For the second column, `left` is just the value of the first column, but for the third column you would have to use the sum of the values of the first and second column. If you have more than three columns it's of course best to do this in a loop and store the running sums of `left` values in a dedicated list, just like in one of the answers to the question you linked. – Thomas Kühn Aug 03 '17 at 08:50
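
A minimal sketch of the loop described in the last comment, using the four-column frame from mati's example. The helper name stacked_barh and its parameters are hypothetical, and the running sums are kept in a NumPy array here rather than a list.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


def stacked_barh(ax, df, y_col, value_cols, colors, height=1.0):
    """Hypothetical helper: stack value_cols horizontally at positions df[y_col]."""
    left = np.zeros(len(df))  # running sum of the segments drawn so far
    for col, color in zip(value_cols, colors):
        ax.barh(df[y_col], df[col], left=left, height=height, color=color, label=col)
        left = left + df[col].to_numpy()  # the next segment starts where this one ends
    return left


df1 = pd.DataFrame(np.random.rand(5, 4), columns=['a', 'b', 'c', 'd'])
df1['dist'] = pd.Series([1, 5, 6.5, 15, 45], index=df1.index)

fig, ax = plt.subplots()
totals = stacked_barh(ax, df1, 'dist', ['a', 'b', 'c', 'd'], ['r', 'g', 'b', 'k'])
ax.legend()
```

Each segment now starts at the sum of all previous columns instead of at the value of the single preceding column, which is what went wrong in the example with `left=df1['b']` and `left=df1['c']`. Call `plt.show()` afterwards to display the figure.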