I want to overlay the 95th percentile values on a seaborn boxplot. I could not figure out how to overlay text, or whether seaborn has a built-in capability for this. How would I modify the following code to overlay the 95th percentile values on the plot?

import pandas as pd
import numpy as np
import seaborn as sns
df = pd.DataFrame(np.random.randn(200, 4), columns=list('ABCD'))*100
alphabet = list('AB')
df['Gr'] = np.random.choice(np.array(alphabet, dtype="|S1"), df.shape[0])
df_long = pd.melt(df, id_vars=['Gr'], value_vars = ['A','B','C','D'])
sns.boxplot(x = "variable", y="value", hue = 'Gr',  data=df_long, whis = [5,95])
user1430763
  • This post should help you solve this: https://stackoverflow.com/questions/38649501/labeling-boxplot-in-seaborn-with-median-value. Just replace the median computation with a 95th percentile computation. – Asamoah Jun 15 '18 at 21:05

1 Answer

Consider matplotlib's Axes.text, called on the axes object that seaborn's boxplot returns, borrowing from @bernie's answer (also a healthy +1 for including a sample dataset). The only challenge is adjusting the horizontal alignment: because of the grouping in the hue field, each label has to sit over its own boxplot series. The labels can even be color-coded according to series.

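import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

np.random.seed(61518)
# ... same as OP

# ASSIGN THE AXES (REPLACES THE OP'S BARE sns.boxplot CALL);
# hue_order IS SORTED SO THE BOXES MATCH THE GROUPBY ORDER BELOW
hplot = sns.boxplot(x="variable", y="value", hue='Gr', data=df_long,
                    hue_order=sorted(df_long['Gr'].unique()), whis=[5, 95])

# 95TH PERCENTILE SERIES
pctl95 = df_long.groupby(['variable', 'Gr'])['value'].quantile(0.95)
pctl95_labels = [str(np.round(s, 2)) for s in pctl95]

# GROUP INDEX TUPLES: (x TICK, POSITION OF 1ST SERIES, POSITION OF 2ND SERIES)
grps = [(i, 2*i, 2*i+1) for i in range(4)]
# [(0,0,1), (1,2,3), (2,4,5), (3,6,7)]

# ADJUST HORIZONTAL ALIGNMENT WITH MORE SERIES
# (.iloc BECAUSE pctl95 HAS A MultiIndex AND grps HOLDS POSITIONS;
#  ADJUST THE COLORS TO MATCH YOUR PALETTE)
for tick in grps:
    hplot.text(tick[0]-0.1, pctl95.iloc[tick[1]] + 0.95, pctl95_labels[tick[1]],
               ha='center', size='x-small', color='b', weight='semibold')

    hplot.text(tick[0]+0.1, pctl95.iloc[tick[2]] + 0.95, pctl95_labels[tick[2]],
               ha='center', size='x-small', color='g', weight='semibold')
plt.show()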

Plot Output
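
If you have more than two hue levels, the hard-coded -0.1/+0.1 shifts stop lining up. Below is a rough sketch of one way to generalize, not part of the original answer. It assumes seaborn's default layout, where the boxes at each tick share a total width of 0.8 and are dodged side by side. The names hue_offsets and demo are hypothetical helpers made up for this illustration.

```python
def hue_offsets(n_hue, width=0.8):
    """Horizontal offset of each hue level's box center from its x tick.

    Assumption: the n_hue boxes split `width` equally and sit side by
    side, which mirrors seaborn's default dodged boxplot layout.
    """
    each = width / n_hue
    return [(k + 0.5) * each - width / 2 for k in range(n_hue)]


def demo():
    # Imported here so the helper above stays dependency-free.
    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Same kind of sample data as in the question.
    np.random.seed(61518)
    df = pd.DataFrame(np.random.randn(200, 4), columns=list('ABCD')) * 100
    df['Gr'] = np.random.choice(list('AB'), df.shape[0])
    df_long = pd.melt(df, id_vars=['Gr'], value_vars=list('ABCD'))

    # Pin both orders so label positions and boxes agree.
    variables = list('ABCD')
    groups = sorted(df_long['Gr'].unique())
    ax = sns.boxplot(x='variable', y='value', hue='Gr', data=df_long,
                     order=variables, hue_order=groups, whis=[5, 95])

    # One 95th percentile per (variable, group) pair.
    pctl95 = df_long.groupby(['variable', 'Gr'])['value'].quantile(0.95)

    offsets = hue_offsets(len(groups))
    for i, var in enumerate(variables):
        for off, grp in zip(offsets, groups):
            val = pctl95.loc[(var, grp)]
            # Place the label just above the upper whisker of its box.
            ax.text(i + off, val, f'{val:.2f}', ha='center', va='bottom',
                    size='x-small', weight='semibold')
    plt.show()
```

Calling demo() draws the plot. With two hue levels the offsets come out to roughly -0.2 and +0.2, so each label is centered over its box rather than nudged by a hand-picked amount. If you pass a non-default width to sns.boxplot, pass the same value to hue_offsets.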

Parfait