
I am trying to plot the distribution within a couple of dataframes I have. Doing it manually I get the result I am looking for:

import matplotlib.pyplot as plt
import pandas as pd

# Create a dataframe
r = [0,1,2,3,4]
raw_data = {'greenBars': [20, 1.5, 7, 10, 5], 'orangeBars': [5, 15, 5, 10, 15],'blueBars': [2, 15, 18, 5, 10]}
df = pd.DataFrame(raw_data)

# From raw value to percentage
totals1 = list(df.sum(axis=1))
totals = totals1  # same row totals, under the name used in the lines below
greenBars = [i / j * 100 for i,j in zip(df['greenBars'], totals)]
orangeBars = [i / j * 100 for i,j in zip(df['orangeBars'], totals)]
blueBars = [i / j * 100 for i,j in zip(df['blueBars'], totals)]
 
# plot
barWidth = 0.85
names = ('A','B','C','D','E')
# Create green Bars
plt.bar(df.index, greenBars, color='#b5ffb9', edgecolor='white', width=barWidth, label="group A")
# Create orange Bars
plt.bar(r, orangeBars, bottom=greenBars, color='#f9bc86', edgecolor='white', width=barWidth, label="group B")
# Create blue Bars
plt.bar(r, blueBars, bottom=[i+j for i,j in zip(greenBars, orangeBars)], color='#a3acff', edgecolor='white', width=barWidth, label="group C")
 
# Custom x axis
plt.xticks(r, names)
plt.xlabel("group")

# Add a legend
plt.legend(loc='upper left', bbox_to_anchor=(1,1), ncol=1)

# Show graphic
plt.show()

However, I have to do this for multiple dataframes with more than just a few columns, and I would like to make a loop out of it. I have been able to draw the first bar completely, but the other bars are incomplete with this code:

#Same df as above
for column in df:
  placeholder = [i / j * 100 for i,j in zip(df[column], totals)]
  print(f'placeholder of {column}')
  print(placeholder)
  barWidth = 0.85
  names = ('A','B','C','D','E')
  # Create the bars for this column
  plt.bar(df.index, placeholder, edgecolor='Black', width=barWidth, label = f"{column}")

Does anyone know how to fix this?

I tried creating the loop myself, but the bars kept being incomplete.
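
A likely reason the later bars look incomplete: every `plt.bar` call in the loop starts at zero, so the columns are drawn on top of each other instead of being stacked. Below is a minimal sketch of one way to loop, assuming the same `df` as above and keeping a running `bottom` that grows with each column. The function name `stacked_percent_bars` and its parameters are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

raw_data = {'greenBars': [20, 1.5, 7, 10, 5],
            'orangeBars': [5, 15, 5, 10, 15],
            'blueBars': [2, 15, 18, 5, 10]}
df = pd.DataFrame(raw_data)
names = ('A', 'B', 'C', 'D', 'E')


def stacked_percent_bars(df, names, bar_width=0.85):
    """Hypothetical helper: draw one 100% stacked bar per row of df."""
    # From raw values to row-wise percentages
    percent = df.div(df.sum(axis=1), axis=0) * 100

    fig, ax = plt.subplots()
    # Height already drawn at each x position; starts at zero
    bottom = np.zeros(len(df))
    for column in percent.columns:
        ax.bar(df.index, percent[column], bottom=bottom,
               edgecolor='white', width=bar_width, label=column)
        # The next column has to start where this one ended
        bottom += percent[column].to_numpy()

    # Custom x axis
    ax.set_xticks(list(df.index))
    ax.set_xticklabels(names)
    ax.set_xlabel("group")
    ax.legend(loc='upper left', bbox_to_anchor=(1, 1), ncol=1)
    return ax
```

Calling `stacked_percent_bars(df, names)` followed by `plt.show()` should give the same kind of plot as the manual version, and the loop does not depend on how many columns the dataframe has.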

0k4y
  • Does `ax = sns.histplot(df, multiple='fill', discrete=True); ax.set_xticks(r, names)` do more or less what you are looking for? – JohanC Apr 10 '23 at 13:43
  • Are `totals` and `totals1` the same variable? In that case, and without the conversion to lists, numpy's broadcasting magic would simplify the calculation to `greenBars = df['greenBars'] / totals * 100`. – JohanC Apr 10 '23 at 13:48
  • `df.index = ('A','B','C','D','E')` and then `ax = df.div(df.sum(axis=1), axis=0).mul(100).plot(kind='bar', stacked=True, rot=0)` – Trenton McKinney Apr 10 '23 at 23:41
  • Possible duplicates: https://stackoverflow.com/q/72856017/7758804 and https://stackoverflow.com/q/73410380/7758804 and https://stackoverflow.com/q/69846902/7758804 – Trenton McKinney Apr 11 '23 at 00:24
  • Hey, thanks for your help, JohanC and Trenton McKinney. `totals1` and `totals` were the same variable; I just used different ways of getting to it (`totals1` was a bit more elegant). – 0k4y Apr 11 '23 at 16:13
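
The pandas approach suggested in the comments can be sketched as follows. It is an illustration, not a tested answer from the thread: it assumes the same `raw_data` as in the question, uses the group names as the index, and wraps the plotting call in a hypothetical helper named `plot_percent_stacked`.

```python
import matplotlib.pyplot as plt
import pandas as pd

raw_data = {'greenBars': [20, 1.5, 7, 10, 5],
            'orangeBars': [5, 15, 5, 10, 15],
            'blueBars': [2, 15, 18, 5, 10]}
# Use the group names as the index so they become the x tick labels
df = pd.DataFrame(raw_data, index=['A', 'B', 'C', 'D', 'E'])

# Divide each row by its total (aligned on the index), then scale to percent
percent = df.div(df.sum(axis=1), axis=0).mul(100)


def plot_percent_stacked(percent):
    """Hypothetical helper: let pandas stack the columns of each row."""
    ax = percent.plot(kind='bar', stacked=True, rot=0)
    ax.set_xlabel("group")
    ax.legend(loc='upper left', bbox_to_anchor=(1, 1), ncol=1)
    return ax
```

Calling `plot_percent_stacked(percent)` and then `plt.show()` draws one stacked bar per row. No explicit loop or `bottom` bookkeeping is needed, so the same call should work for dataframes with more columns.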
