I have a simple DataFrame that stores the results of a survey. The columns are:
| Age | Income | Satisfaction |
All of them contain values between 1 and 5 (categorical). I managed to generate a stacked bar plot that shows the distribution of Satisfaction values across people of different ages.
The code is:
import random
import pandas as pd

# create a random df
data = []
for i in range(500):
    sample = {"age": random.randint(0, 5), "income": random.randint(1, 5), "satisfaction": random.randint(1, 5)}
    data.append(sample)
df = pd.DataFrame(data)

# group by age and count each satisfaction value
counter = df.groupby('age')['satisfaction'].value_counts().unstack()
# calculate the % for each age group
percentage_dist = 100 * counter.divide(counter.sum(axis=1), axis=0)
percentage_dist.plot.bar(stacked=True)
This generates the following, desired, plot:
However, it's difficult to tell whether the green subset (percentage) of Age-0 is higher than the one in Age-2. Is there a way of adding the percentage on top of each sub-section of the bar plot? Something like this, but for every single bar:
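For reference, here is a minimal sketch of one way this might be done. It assumes matplotlib 3.4 or newer, where `Axes.bar_label` is available, and it uses `fill_value=0` in `unstack()` so that an age group with no answers for some satisfaction value does not produce NaN bars. The seed, font size, and output file name `satisfaction_by_age.png` are arbitrary choices for illustration, not part of the original code.

```python
import random

import matplotlib.pyplot as plt
import pandas as pd

random.seed(0)  # assumption: fixed seed only so the sketch is reproducible

# Same kind of random survey data as in the question
df = pd.DataFrame({
    "age": [random.randint(0, 5) for _ in range(500)],
    "income": [random.randint(1, 5) for _ in range(500)],
    "satisfaction": [random.randint(1, 5) for _ in range(500)],
})

# Count satisfaction values per age group, then convert each row to percentages
counter = df.groupby("age")["satisfaction"].value_counts().unstack(fill_value=0)
percentage_dist = 100 * counter.divide(counter.sum(axis=1), axis=0)

ax = percentage_dist.plot.bar(stacked=True)

# ax.containers holds one BarContainer per satisfaction value;
# bar_label writes each segment's height in the middle of that segment
labels = []
for container in ax.containers:
    labels.extend(
        ax.bar_label(container, fmt="%.1f%%", label_type="center", fontsize=7)
    )

# Move the legend outside the axes so it does not cover the labels
ax.legend(title="satisfaction", bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
plt.savefig("satisfaction_by_age.png")  # hypothetical output file name
```

With `label_type="center"` each label sits inside its own segment. On older matplotlib versions, a similar result could presumably be reached by looping over `ax.patches` and calling `ax.text` with each patch's position and height.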