
I have a DataFrame in the form below:

import pandas as pd

data = [['M', 0], ['F', 0], ['M', 1], ['M', 1], ['M', 1], ['F', 1], ['M', 0], ['M', 1], ['M', 0], ['F', 1], ['M', 0], ['M', 0]]
df = pd.DataFrame(data, columns=['Gender', 'label'])
print(df)
  Gender  label
0       M      0
1       F      0
2       M      1
3       M      1
4       M      1
5       F      1
6       M      0
7       M      1
8       M      0
9       F      1
10      M      0
11      M      0

I am trying to create a stacked bar chart that shows percentages as annotations on the bars. The code below creates the stacked bar chart:

df.groupby('Gender')['label']\
    .value_counts()\
    .unstack(level=1)\
    .plot.bar(stacked=True)


I am not sure how to get percentages on the chart.

Thanks in advance.

Trenton McKinney
user15051990

1 Answer


I can offer you this solution:

I created a new DataFrame, df2, that holds the percentages to be drawn on the chart.

The values of df2 are reordered so that they line up with the index i of the loop over the bars. This lets each value be drawn in the right place.

get_xy returns the x and y coordinates of the lower-left corner of each bar.

get_width returns the width of each bar.

get_height returns the height of each bar.

A loop draws the percentages, one bar segment per iteration. The center of each segment is its lower-left corner plus half the width and half the height. kx and ky nudge the text position slightly.
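As a minimal sketch of the center calculation described above, the snippet below draws a single horizontal bar with made-up dimensions and computes its center from get_xy, get_width, and get_height. The bar geometry and the "Agg" backend (chosen only so the sketch runs without a display) are assumptions for illustration, not part of the answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# One illustrative bar: starts at x=2, is 3 units wide, 0.5 high, centered on y=0
ax.barh(y=[0], width=[3], left=[2], height=0.5)

rec = ax.patches[0]
x0, y0 = rec.get_xy()             # lower-left corner of the rectangle
cx = x0 + rec.get_width() / 2     # horizontal center: 2 + 3/2
cy = y0 + rec.get_height() / 2    # vertical center: -0.25 + 0.5/2
# ha/va center the text on (cx, cy), an alternative to manual kx/ky offsets
ax.text(cx, cy, "label", ha="center", va="center")
```

Passing ha="center" and va="center" to ax.text is one way to avoid tuning kx and ky by hand, since the text is then centered on the computed point.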

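import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# %matplotlib inline  # Jupyter only: uncomment when running in a notebook

data = [['M', 0], ['F', 0], ['M', 1], ['M', 1], ['M', 1], ['F', 1], ['M', 0], ['M', 1], ['M', 0], ['F', 1], ['M', 0], ['M', 0]]
df = pd.DataFrame(data, columns=['Gender', 'label'])

# Counts per label within each gender. value_counts() sorts each group by
# count (descending), so F is ordered (label 1, label 0) and M (label 0, label 1).
F_Serie = df.groupby('Gender')['label'].value_counts()['F']
M_Serie = df.groupby('Gender')['label'].value_counts()['M']
# Convert the counts to percentages of each gender's total
M_Serie = M_Serie * (100 / M_Serie.sum())
F_Serie = F_Serie * (100 / F_Serie.sum())
# The column names 0 and 1 are positional only; they follow the sort order above
df2 = pd.DataFrame(np.array([list(F_Serie), list(M_Serie)]), index=['F', 'M'], columns=[0, 1])

ax = df.groupby('Gender')['label'].value_counts().unstack(level=1).plot.barh(stacked=True, figsize=(10, 6))

# Small offsets that nudge the text toward the center of each bar
kx = -0.3
ky = -0.02

# Flatten df2 row by row: [F label 1, F label 0, M label 0, M label 1]
values = []
for key in df2.values:
    values = values + list(key)
# Reorder to match ax.patches, which holds the label-0 segments (F, M)
# followed by the label-1 segments (F, M)
val = values[1:3]
values.pop(1)
values.pop(1)
values = val + values

for i, rec in enumerate(ax.patches):
    x = rec.get_xy()[0] + rec.get_width() / 2 + kx
    y = rec.get_xy()[1] + rec.get_height() / 2 + ky
    ax.text(x, y, '{:.1%}'.format(values[i] / 100), fontsize=12, color='black')

plt.show()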
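
The manual reordering above depends on how value_counts() happens to sort this particular data. A possible alternative, sketched here as an assumption rather than as the answer's own method, is to compute the percentages from the same unstacked table that is plotted and flatten it column by column, which matches the order of ax.patches (all label-0 segments, then all label-1 segments). The names counts and pct and the "Agg" backend are introduced only for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

data = [['M', 0], ['F', 0], ['M', 1], ['M', 1], ['M', 1], ['F', 1],
        ['M', 0], ['M', 1], ['M', 0], ['F', 1], ['M', 0], ['M', 0]]
df = pd.DataFrame(data, columns=['Gender', 'label'])

# Rows are genders, columns are labels; fillna(0) covers missing combinations
counts = df.groupby('Gender')['label'].value_counts().unstack(level=1).fillna(0)
# Divide each row by its total to get percentages per gender
pct = counts.div(counts.sum(axis=1), axis=0) * 100

ax = counts.plot.barh(stacked=True, figsize=(10, 6))

# pandas draws one column at a time, so ax.patches is in column-major order;
# order='F' flattens pct the same way: [F/0, M/0, F/1, M/1]
values = pct.to_numpy().flatten(order='F')

for rec, value in zip(ax.patches, values):
    x0, y0 = rec.get_xy()
    ax.text(x0 + rec.get_width() / 2, y0 + rec.get_height() / 2,
            '{:.1f}%'.format(value), ha='center', va='center',
            fontsize=12, color='black')
```

Because the labels come from the plotted table itself, this version should keep working if the counts change or more label values are added.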


cmaher
ansev
  • Thanks for the solution, but the graph is not showing % marks on the chart. Any idea how to do that? – user15051990 Aug 12 '19 at 01:32
  • What do 40 and 80 indicate in the ax.text()? – user15051990 Aug 12 '19 at 14:42
  • The first two arguments are the x and y coordinates at which the text is written, so 40 and 80 are the y coordinates. – ansev Aug 12 '19 at 14:51
  • I have added another option to set the position of the text automatically, I use get_xy to get the x and y coordinates of the bottom point of each rectangle, you can check this here: https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.patches.Rectangle.html Finally, add to each coordinate the width and height respectively and adjust this position using kx and ky. You can try to vary kx and ky. – ansev Aug 12 '19 at 15:25
  • Thanks for the solution, is it possible to do something like this https://stackoverflow.com/questions/51495982/display-totals-and-percentage-in-stacked-bar-chart-using-dataframe-plot. – user15051990 Aug 12 '19 at 18:00
  • Do you want horizontal or vertical bars? Do you want the axis units in percentages? – ansev Aug 12 '19 at 18:17
  • I want horizontal bar and want to show percentages on the bars as shown in the above url. – user15051990 Aug 12 '19 at 18:59
  • I really appreciate the effort you are putting in. Thanks a lot. – user15051990 Aug 12 '19 at 19:37
  • I have updated the code and the image. Tell me if you need any modification! – ansev Aug 12 '19 at 21:17
  • What was the mistake? – user15051990 Aug 12 '19 at 22:06
  • I had to call `pop(1)` twice because the list positions shift after each removal. A value was repeated in the values list. I have updated the code too. – ansev Aug 12 '19 at 22:09
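
For the horizontal bars with percentage labels requested in the comments, newer matplotlib releases (3.4 and later) provide Axes.bar_label, which places one label per bar segment without any coordinate arithmetic. The sketch below assumes such a version is installed; the names counts, pct, and annotations and the "Agg" backend are introduced only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

data = [['M', 0], ['F', 0], ['M', 1], ['M', 1], ['M', 1], ['F', 1],
        ['M', 0], ['M', 1], ['M', 0], ['F', 1], ['M', 0], ['M', 0]]
df = pd.DataFrame(data, columns=['Gender', 'label'])

counts = df.groupby('Gender')['label'].value_counts().unstack(level=1).fillna(0)
pct = counts.div(counts.sum(axis=1), axis=0) * 100

ax = counts.plot.barh(stacked=True, figsize=(10, 6))

annotations = []
# ax.containers holds one BarContainer per plotted column (label 0, label 1)
for container, column in zip(ax.containers, pct.columns):
    labels = ['{:.1f}%'.format(v) for v in pct[column]]
    # label_type='center' puts each label in the middle of its segment
    annotations.extend(
        ax.bar_label(container, labels=labels, label_type='center', fontsize=12)
    )
```

On matplotlib versions older than 3.4, bar_label is not available and the ax.text loop from the answer remains the way to go.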