With the following example dataframe:
>>> df
  col1 col2
0    b    β
1    c    β
2    a    γ
3    e    β
4    b    α
5    d    α
6    e    β
7    c    γ
8    a    β
9    e    β
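To follow along, the frame above can be rebuilt as below. This is only a minimal sketch: the values are copied row by row from the printout, and the construction itself is my assumption, not something from the question.

```python
import pandas as pd

# Values copied row by row from the printout above.
df = pd.DataFrame({
    'col1': list('bcaebdecae'),
    'col2': list('ββγβααβγββ'),
})
print(df)
```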
What you’re doing to plot:
>>> sns.countplot(x='col1', hue='col2', data=df)
<AxesSubplot:xlabel='col1', ylabel='count'>
Now what your code is (roughly*) doing to compute the percentages is:
>>> 100 * df.value_counts() / len(df)
col1  col2
e     β       30.0
a     β       10.0
      γ       10.0
b     α       10.0
      β       10.0
c     β       10.0
      γ       10.0
d     α       10.0
dtype: float64
* Except that you don’t have to compute these values yourself; the plotting function does it for you.
If we want per-category percentages, we should use groupby to get the total of each hue, then divide the value counts by those totals:
>>> 100 * df.value_counts() / df.value_counts().groupby('col2').transform('sum')
col1  col2
e     β       50.000000
a     β       16.666667
      γ       50.000000
b     α       50.000000
      β       16.666667
c     β       16.666667
      γ       50.000000
d     α       50.000000
dtype: float64
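Note the use of .transform and not df['col2'].value_counts() directly: .transform returns a result with the same index as the value counts, so the division lines up row by row.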
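The difference is easiest to see side by side. In this sketch the names counts, totals_small and totals_aligned are mine, and the frame is rebuilt so the snippet runs on its own. A plain .sum() collapses to one row per hue, while .transform('sum') repeats each hue total on every row of the counts:

```python
import pandas as pd

# Example frame rebuilt from the printout at the top of the answer.
df = pd.DataFrame({'col1': list('bcaebdecae'), 'col2': list('ββγβααβγββ')})

counts = df.value_counts()  # indexed by (col1, col2)

# One row per hue: the index no longer matches counts.
totals_small = counts.groupby('col2').sum()

# Same index as counts: each row carries the total of its own hue.
totals_aligned = counts.groupby('col2').transform('sum')

print(totals_small)
print(totals_aligned)

# Element-wise division now pairs every count with its hue total.
pct = 100 * counts / totals_aligned
print(pct)
```

A quick sanity check on the result is that the percentages within each hue add up to 100.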
Now the hard part is to match the values to their positions. Thankfully the matplotlib bar artists can give us that information:
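import pandas as pd
import seaborn as sns

# Percentage of each (col1, col2) pair within its hue, formatted as a label
hue_prc = (100 * df.value_counts() / df.value_counts().groupby('col2').transform('sum')).apply('{:.0f}%'.format)
ax = sns.countplot(x='col1', hue='col2', data=df)
# Each legend handle holds the bars of one hue level
for bars in ax.get_legend_handles_labels()[0]:
    hue_label = bars.get_label()
    for p, x_label in zip(bars.patches, [label.get_text() for label in ax.get_xticklabels()]):
        x = p.get_bbox().get_points()[:,0].mean()  # horizontal centre of the bar
        y = p.get_bbox().get_points()[1,1]  # top of the bar
        if pd.isna(y):  # no bar for this combination, nothing to label
            continue
        ax.annotate(hue_prc.loc[x_label, hue_label], (x, y), ha='center', va='bottom', size=10)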
This plots the following:
