
I have a simple DataFrame that stores the results of a survey. The columns are:

| Age | Income | Satisfaction |

All of them contain values between 1 and 5 (categorical). I managed to generate a stacked bar plot that shows the distribution of Satisfaction values across people of different ages. The code is:

import random
import pandas as pd

# create a random df
data = []
for i in range(500):
    sample = {"age": random.randint(0, 5), "income": random.randint(1, 5), "satisfaction": random.randint(1, 5)}
    data.append(sample)  # append inside the loop so all 500 samples are kept
df = pd.DataFrame(data)
# group by age
counter = df.groupby('age')['satisfaction'].value_counts().unstack()
# calculate the % for each age group
percentage_dist = 100 * counter.divide(counter.sum(axis=1), axis=0)
percentage_dist.plot.bar(stacked=True)

This generates the following, desired plot: [image]

However, it's difficult to tell whether the green subset (percentage) of Age-0 is higher than the one in Age-2. Is there a way of adding the percentage on top of each sub-section of the bar plot? Something like this, but for every single bar: [image]

Titus Pullo

1 Answer


One option is to iterate over the patches to obtain their width, height and bottom-left coordinates, and use these values to place the label at the center of the corresponding bar.

To do this, the axes returned by the pandas bar method must be stored.

ax = percentage_dist.plot.bar(stacked=True)
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy() 
    ax.text(x+width/2, 
            y+height/2, 
            '{:.0f} %'.format(height), 
            horizontalalignment='center', 
            verticalalignment='center')

Here, the annotated value is set to 0 decimals, but this can be easily modified.

The output plot generated with this code is the following:

[image]
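
As a small illustration of the point about decimals, the sketch below formats one made-up bar height with a few different format specifications. The value `37.46` and the variable names are assumptions for illustration only. Note that the `%` presentation type multiplies its argument by 100 and appends the sign itself, so it suits values stored as fractions (0 to 1) rather than the 0 to 100 values held in `percentage_dist`.

```python
# Hypothetical bar height, on the 0-100 scale used by percentage_dist.
height = 37.46

no_decimals = '{:.0f} %'.format(height)   # same format as in the answer
one_decimal = '{:.1f} %'.format(height)   # one decimal place instead

# The '%' presentation type multiplies by 100 and adds the sign itself,
# so it expects a fraction between 0 and 1.
as_fraction = '{:.0%}'.format(height / 100)
wrong_scale = '{:.0%}'.format(height)     # already a percentage: 100x too large

print(no_decimals, one_decimal, as_fraction, wrong_scale, sep=' | ')
```

If the `%` style is preferred, one option would be to skip the `100 *` factor when building `percentage_dist`, so that the heights are fractions.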

OriolAbril
  • This is the best answer for me, including the linked posts. I added: ```if height==0: continue``` for cases where a category does not appear in a stack (which gives anomalous 0% labels at the bottom). – flashliquid May 24 '19 at 09:24
  • Glad it could be of help! I hadn't actually given much thought to this answer. I have edited it to improve the label placement, so the labels are now centered. – OriolAbril May 24 '19 at 21:41
  • You can change the format of the number to {:.0%} as well – Guy Aug 08 '20 at 14:55
  • Using this method, how would you calculate the percentages when each column is of a different height (the column patches don't add up to 100 and don't represent percentages)? – Daniel Kats Aug 25 '20 at 21:55
  • I was able to do this by enumerating the patches then using the index to index into the categories. However this doesn't feel clean... – Daniel Kats Aug 25 '20 at 22:07
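
For the case raised in the comments, where the stacked bars hold raw counts and have different total heights, one possible approach is sketched below. It assumes the plotted DataFrame (here a hypothetical `counts` table with made-up numbers) has one row per bar, and it relies on `ax.containers`, which holds one bar container per plotted column with its patches in row order. Each patch height is divided by the total of its row, and zero-height patches are skipped as suggested in the first comment. This is a sketch under those assumptions, not necessarily how the commenter solved it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical raw counts: rows are age groups, columns are satisfaction levels.
counts = pd.DataFrame(
    {1: [10, 0, 5], 2: [30, 20, 5], 3: [60, 20, 10]},
    index=pd.Index([0, 1, 2], name="age"),
)
row_totals = counts.sum(axis=1).to_numpy()  # total height of each stacked bar

ax = counts.plot.bar(stacked=True)
labels = []
for container in ax.containers:          # one container per column
    for patch, total in zip(container.patches, row_totals):
        height = patch.get_height()
        if height == 0:                  # skip categories absent from this bar
            continue
        x, y = patch.get_xy()
        text = '{:.0f} %'.format(100 * height / total)
        ax.text(x + patch.get_width() / 2, y + height / 2, text,
                horizontalalignment='center', verticalalignment='center')
        labels.append(text)
```

Pairing each patch with its row total through `zip` avoids indexing into the categories by patch number.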