The code is below. Please check it, thanks.

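import matplotlib.pyplot as plt
import seaborn as sns

# df is the asker's dataframe with columns 'col1' and 'col2' (not shown)
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))

ax1 = sns.countplot(x='col1', data=df, hue='col2', ax=axes[0, 0])

ncount = len(df)

# Annotate each bar with its count as a percentage of all rows
for p in ax1.patches:
    x = p.get_bbox().get_points()[:, 0]
    y = p.get_bbox().get_points()[1, 1]
    ax1.annotate('{:.1f}%'.format(100. * y / ncount), (x.mean(), y),
                 ha='center', va='bottom', size=10)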


johnJones901
  • You should edit your question to be a [mcve] that includes some example data. – Alex Jul 09 '21 at 10:41
  • “*the subplot ax1 displaying percentage of each bar as percentage of 100%*” that’s not right, it’s simply displaying the counts. Compute the percentages yourself and use `barplot` instead. – Cimbali Jul 09 '21 at 10:50
  • @johnJones901 The figure you posted definitely does not do that. The countplot function documentation says « Show the counts of observations in each categorical bin using bars. ». It’s not clear to me where you’re getting percentages from. – Cimbali Jul 09 '21 at 10:54
  • @johnJones901 if you want help, please don’t post confusing figures that have nothing to do with your data, and [add a sample of your data](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). Also, the code plots the counts as I’ve said before (it’s literally in the name of the function!). Maybe some counts happen to sum to 100? You’re even annotating the percentage `100*y/ncount` at the vertical position `y`. If `y` was already percentages you would annotate `y` or `100*y`. – Cimbali Jul 09 '21 at 11:10
  • So you want to keep countplots but only change annotations? – Cimbali Jul 09 '21 at 11:18
  • You have removed a lot from your post and now it does not even contain an actual question anymore. Consider undoing your last edit and improving more carefully. – Yunnosch Jul 13 '21 at 06:58

1 Answer

With the following example dataframe:

>>> df
  col1 col2
0    b    β
1    c    β
2    a    γ
3    e    β
4    b    α
5    d    α
6    e    β
7    c    γ
8    a    β
9    e    β

What you’re doing to plot:

>>> sns.countplot(x='col1', hue='col2', data=df)
<AxesSubplot:xlabel='col1', ylabel='count'>

Now what your code is (roughly*) doing to compute the percentages is:

>>> 100 * df.value_counts() / len(df)
col1  col2
e     β       30.0
a     β       10.0
      γ       10.0
b     α       10.0
      β       10.0
c     β       10.0
      γ       10.0
d     α       10.0
dtype: float64

* Except you don’t have to compute these values yourself, the plotting function does it for you.

If we want percentages within each hue category instead, we should use groupby to get the total of each hue and divide the value counts by it:

>>> 100 * df.value_counts() / df.value_counts().groupby('col2').transform('sum')
col1  col2
e     β       50.000000
a     β       16.666667
      γ       50.000000
b     α       50.000000
      β       16.666667
c     β       16.666667
      γ       50.000000
d     α       50.000000
dtype: float64

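Note the use of .transform and not df['col2'].value_counts() directly: .transform returns the hue totals on the same (col1, col2) index as the value counts, so the division lines up row by row.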
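To make the role of .transform concrete, here is a minimal sketch that rebuilds the example dataframe and compares the transformed totals with a plain per-hue sum. The variable names (`counts`, `hue_totals`, `per_hue`, `pct`) are mine and only for illustration, and I pass `level='col2'` explicitly rather than the bare name.

```python
import pandas as pd

# Example dataframe from this answer, rebuilt by hand
df = pd.DataFrame({
    "col1": list("bcaebdecae"),
    "col2": list("ββγβααβγββ"),
})

# One row per (col1, col2) combination that occurs in the data
counts = df.value_counts()

# A plain sum gives one row per hue (3 rows here) ...
per_hue = counts.groupby(level="col2").sum()

# ... while transform broadcasts each hue total back onto the
# (col1, col2) index of counts (8 rows here)
hue_totals = counts.groupby(level="col2").transform("sum")

# Percentage of each bar within its hue
pct = 100 * counts / hue_totals
print(pct)
```

Within each hue the percentages add up to 100, which is a quick way to check the result.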

Now the hard part is to match the values to their positions. Thankfully the matplotlib bar artists can give us that information:

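import pandas as pd
import seaborn as sns

# Per-hue percentages, formatted as strings such as '50%'
hue_prc = (100 * df.value_counts() / df.value_counts().groupby('col2').transform('sum')).apply('{:.0f}%'.format)

ax = sns.countplot(x='col1', hue='col2', data=df)
# Each legend handle is the group of bars drawn for one hue level
for bars in ax.get_legend_handles_labels()[0]:
    hue_label = bars.get_label()
    # Bars are in the same order as the x tick labels
    for p, x_label in zip(bars.patches, [label.get_text() for label in ax.get_xticklabels()]):
        x = p.get_bbox().get_points()[:,0].mean()  # horizontal centre of the bar
        y = p.get_bbox().get_points()[1,1]         # top of the bar
        if pd.isna(y):
            # No bar for this (col1, col2) combination
            continue
        ax.annotate(hue_prc.loc[x_label, hue_label], (x, y), ha='center', va='bottom', size=10)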

This plots the following:

[image: the count plot with a per-hue percentage annotated above each bar]
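One caveat with the lookup `hue_prc.loc[x_label, hue_label]`: it raises a `KeyError` for a (col1, col2) combination that never occurs in the data, and the `nan` check only helps if the bar height really is `nan`. A rough sketch of guarding the lookup, assuming the example dataframe from this answer; `label_for` is a hypothetical helper name of mine, not part of pandas or seaborn:

```python
import pandas as pd

# Example dataframe from this answer, rebuilt by hand
df = pd.DataFrame({
    "col1": list("bcaebdecae"),
    "col2": list("ββγβααβγββ"),
})

counts = df.value_counts()
hue_prc = (100 * counts / counts.groupby(level="col2").transform("sum")).apply("{:.0f}%".format)


def label_for(x_label, hue_label, default=""):
    """Return the formatted percentage, or `default` if the combination is absent."""
    try:
        return hue_prc.loc[(x_label, hue_label)]
    except KeyError:
        # This (col1, col2) pair has no rows, so there is nothing to annotate
        return default


print(label_for("e", "β"))        # combination present in the data
print(repr(label_for("d", "β")))  # combination absent from the data
```

In the plotting loop you would then annotate with `label_for(x_label, hue_label)` and skip the bar when it returns the default.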

Cimbali
  • @johnJones901 just use the height of the patch instead for `y`. See my edit. It might not be `nan` as expected, in which case you’ll need to wrap `hue_prc.loc[x_label, hue_label]` in a try-block to ignore `KeyError`s – Cimbali Jul 09 '21 at 12:43
  • @johnJones901 The datavalues attribute requires matplotlib >= 3.4.0. You should update your matplotlib installation – Alex Jul 09 '21 at 13:59
  • @johnJones901 just use `col1` in the groupby – Cimbali Jul 09 '21 at 18:27
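
For reference, a sketch of what that last suggestion might look like, assuming the example dataframe from the answer: grouping by `col1` instead of `col2` makes the bars within each x category sum to 100%. The variable names are mine, not from the thread.

```python
import pandas as pd

# Example dataframe from the answer, rebuilt by hand
df = pd.DataFrame({
    "col1": list("bcaebdecae"),
    "col2": list("ββγβααβγββ"),
})

counts = df.value_counts()

# Total number of rows for each x category, broadcast onto the (col1, col2) index
x_totals = counts.groupby(level="col1").transform("sum")

# Percentage of each bar within its x category
x_prc = 100 * counts / x_totals
print(x_prc)
```

The result can be formatted and looked up per bar in the same way as `hue_prc` in the answer.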