I would like to annotate my violin plot with the number of observations in each group. So the question is essentially the same as this one, except:

  • python instead of R,
  • seaborn instead of ggplot, and
  • violin plots instead of boxplots

Let's take this example from the Seaborn API documentation:

import seaborn as sns
sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips)

I'd like to have n=62, n=19, n=87, and n=76 on top of the violins. Is this doable?

posdef

2 Answers

In this situation, I like to precompute the annotated values and incorporate them into the categorical axis. In other words, precompute a label for each group, e.g. "Thur, N = xxx", and plot against that label instead of the raw category.

That looks like this:

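import seaborn as sns

sns.set_style("whitegrid")
ax = (
    sns.load_dataset("tips")
       # number of observations for each row's day
       .assign(count=lambda df: df['day'].map(df.groupby(by=['day'])['total_bill'].count()))
       # axis label combining the day and its count, e.g. "Thur\nN = 62"
       .assign(grouper=lambda df: df['day'].astype(str) + '\nN = ' + df['count'].astype(str))
       # sort on the categorical day so the labels keep the weekday order
       .sort_values(by='day')
       .pipe((sns.violinplot, 'data'), x="grouper", y="total_bill")
)
# set() returns the setters' results rather than the Axes, so call it separately
ax.set(xlabel='Day of the Week', ylabel='Total Bill (USD)')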

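
A variant of the same idea, in case you would rather leave the DataFrame untouched: compute the counts once and rewrite the tick labels after plotting. This is only a sketch. `relabel_with_counts` is a hypothetical helper name, and it assumes the grouping column is a pandas categorical (as `day` is in `tips`), so that the group order matches the order of the violins on the axis.

```python
def relabel_with_counts(ax, data, x):
    """Replace the x tick labels with "<category>\\nN = <count>".

    Assumption: data[x] is categorical, so the groupby result is in
    category order, which is the order the violins are drawn in.
    """
    counts = data.groupby(x, observed=False).size()
    labels = [f"{cat}\nN = {n}" for cat, n in counts.items()]
    # Pin the tick positions (0, 1, 2, ...) before assigning fixed labels.
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    return labels


# Usage with the example from the question (downloads the tips dataset):
#   import seaborn as sns
#   tips = sns.load_dataset("tips")
#   ax = sns.violinplot(x="day", y="total_bill", data=tips)
#   relabel_with_counts(ax, tips, "day")
```

The trade-off is that the labels are only cosmetic here: the plotted column stays `day`, so anything else that reads the data is unaffected.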

Paul H

First compute the x and y position of each label from your dataset; a simple for loop can then write each label at the desired position with ax.text:

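import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips)

# "day" is categorical, so the groups come out in the plotted order
grouped = tips.groupby(['day'])['total_bill']
yposlist = grouped.median().tolist()   # label height: each group's median
xposlist = range(len(yposlist))        # violins sit at x = 0, 1, 2, ...
# take the counts from the data instead of hard-coding them
stringlist = [f'n = {n}' for n in grouped.count()]

for i in range(len(stringlist)):
    ax.text(xposlist[i], yposlist[i], stringlist[i])

plt.show()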

Vinícius Figueiredo
  • 1
    So the idea is to pre-calculate the x,y coordinates and the number of observations, then annotate them using `ax.text`? What if one would prefer to annotate above the plots? There's no guarantee there will be enough space within the violin to accommodate the text, especially if the number is large. – posdef Oct 16 '17 at 14:41
  • 1
    Other than labeling the plots or adding a legend, I think `.text` or `.annotate` are the only ways to do this. I'm using a sample dataset here, but with another dataset in hand I don't think it would be hard to get `"the x,y coordinates, and the number of observations"`. If you wish to write the text above the plots, you would need the violins' max values and use them in `yposlist` instead, like this: `yposlist = tips.groupby(['day'])['total_bill'].max().tolist()`. Then fine-tune the y position to best fit the figure, since this returns the dataset's max values. – Vinícius Figueiredo Oct 16 '17 at 15:17
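
The suggestion in the comments (use each group's max and then nudge the y position) could be sketched roughly as follows. `annotate_counts` and its `pad` parameter are hypothetical names introduced for illustration, and the helper assumes the grouping column is categorical so that group order matches violin order. Note that by default the violin's density curve extends past the largest data point, so `pad` will likely need tuning per figure.

```python
def annotate_counts(ax, data, x, y, pad=0.05):
    """Write "n = <count>" above each group's largest y value.

    Assumption: data[x] is categorical, so groups are returned in the
    same order as the violins (positions 0, 1, 2, ...).
    pad is the vertical offset as a fraction of the overall y range.
    """
    grouped = data.groupby(x, observed=False)[y]
    counts = grouped.count()
    tops = grouped.max()
    offset = pad * (data[y].max() - data[y].min())
    for xpos, (n, top) in enumerate(zip(counts, tops)):
        if n == 0:
            continue  # empty category: nothing to label
        ax.text(xpos, top + offset, f"n = {n}", ha="center", va="bottom")
    return counts


# Usage with the example from the question (downloads the tips dataset):
#   import seaborn as sns
#   tips = sns.load_dataset("tips")
#   ax = sns.violinplot(x="day", y="total_bill", data=tips)
#   annotate_counts(ax, tips, "day", "total_bill")
```

Centering the text horizontally (`ha="center"`) keeps long counts aligned with their violin; whether the label clears the top of the violin still depends on the chosen `pad`.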