
I have a dataset with some outliers, and I would like to annotate the bottom of the lower whisker (the "bottom leg") of a box plot in Python. Here is an example of what I get right now:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = {'theta': [1 for i in range(0, 10)],
        'error': [10, 20, 21, 22, 23, 24, 25, 26, 27, 28]}

df = pd.DataFrame(data=data)

fig, ax1 = plt.subplots(figsize=(8, 5))
box_plot = sns.boxplot(x="theta", y='error', data=df, ax=ax1, showfliers=False)
# smallest value per theta group (this still includes the outlier 10)
min_value = df.groupby(['theta'])['error'].min().values
for xtick in box_plot.get_xticks():
    text = 'The minimum value before outliers is here'
    box_plot.text(xtick, min_value[xtick] + 2, text,
                  horizontalalignment='center', size='x-small', weight='semibold')
    box_plot.plot(xtick, min_value[xtick], marker='*', markersize=20)
plt.show()

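This does not produce what I want: min_value is the overall minimum (10), so the star and the label end up at the hidden outlier, below the end of the lower whisker.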

Instead, I would like the star and the label placed at the end of the lower whisker, i.e. the smallest value that is not an outlier (20 in this example).

I can do this manually for this example, but I'd like a more systematic approach that generalizes to other datasets.

Rob

1 Answer


According to this answer, Seaborn boxplots use matplotlib to compute the whiskers and quartiles of each box, so you can reproduce the same calculation with matplotlib.cbook.boxplot_stats. Its 'whislo' entry is the end of the lower whisker: the smallest data point that is still at or above Q1 - 1.5 * IQR.
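
To see what boxplot_stats returns for the question's data, here is a small sketch (matplotlib and NumPy only) that also recomputes the lower whisker by hand with the 1.5 * IQR rule, using NumPy's default linear-interpolated percentiles as matplotlib does:

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

error = np.array([10, 20, 21, 22, 23, 24, 25, 26, 27, 28])

# boxplot_stats returns one dict per dataset; whis=1.5 is the default
stats = boxplot_stats(error)[0]

# Recompute the lower whisker by hand: the smallest data point that is
# still >= Q1 - 1.5 * IQR
q1, q3 = np.percentile(error, [25, 75])
lower_fence = q1 - 1.5 * (q3 - q1)
manual_whislo = error[error >= lower_fence].min()

print(stats['q1'], stats['q3'])          # 21.25 25.75
print(stats['whislo'], manual_whislo)    # 20 20
print(stats['fliers'])                   # [10]
```

Here the lower fence is 21.25 - 6.75 = 14.5, so 10 is an outlier and the whisker stops at 20.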

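You can also change the y ticks with ax1.set_yticks if you want the axis to extend down to the outlier value of 10: ticks placed outside the current view limits expand the limits to include them. The outlier point itself stays hidden because of showfliers=False.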
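
A minimal check of that set_yticks behavior, assuming only matplotlib with the non-interactive Agg backend; the plotted line is a stand-in for a whisker spanning 20 to 28, not the seaborn box itself:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot([0, 0], [20, 28])  # stand-in for a whisker from 20 to 28

# Ticks at 10, 15, 20, 25, 30: the view limits expand to include all of them
ax.set_yticks(np.arange(10, 28 + 5, 5))
bottom, top = ax.get_ylim()
print(bottom <= 10)  # True
```

If you only need the range and not particular tick positions, ax1.set_ylim(bottom=...) sets the limit directly. Later calls that trigger autoscaling (for example, plotting more artists) may recompute the limits, which is why the ticks are set after all plotting.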

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from matplotlib.cbook import boxplot_stats

data = {'theta': [1 for i in range(0, 10)],
        'error': [10, 20, 21, 22, 23, 24, 25, 26, 27, 28]}

df = pd.DataFrame(data=data)

fig, ax1 = plt.subplots(figsize=(8, 5))
box_plot = sns.boxplot(x="theta", y='error', data=df, ax=ax1, showfliers=False)

## get the lower whisker (whis=1.5, the same default seaborn uses)
## you can retrieve other boxplot values as well, e.g. 'q1', 'med', 'q3', 'whishi', 'fliers'
## note: this uses the whole 'error' column, which matches the box only because there is a single theta group
low_whisker = boxplot_stats(df.error)[0]['whislo']

for xtick in box_plot.get_xticks():
    text = 'The minimum value before outliers is here'
    box_plot.text(xtick, low_whisker - 2, text,
                  horizontalalignment='center', size='x-small', weight='semibold')
    box_plot.plot(xtick, low_whisker, marker='*', markersize=20)

## set the ticks so the axis covers the entire range of the data
ax1.set_yticks(np.arange(min(df.error), max(df.error) + 5, 5))

plt.show()
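
The code above computes boxplot_stats over the whole error column, which works because there is only one theta group. A sketch of how it could generalize to several boxes, assuming hypothetical data with two theta groups (the second group is made up for illustration): compute boxplot_stats separately for each group, and pass order= explicitly so that box i is drawn at x position i.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.cbook import boxplot_stats

# Hypothetical two-group data (not from the original question)
df = pd.DataFrame({
    'theta': [1] * 10 + [2] * 10,
    'error': [10, 20, 21, 22, 23, 24, 25, 26, 27, 28,
              30, 31, 32, 33, 34, 35, 36, 37, 38, 60],
})

# Fix the category order explicitly so box i sits at x position i
order = sorted(df['theta'].unique())

fig, ax1 = plt.subplots(figsize=(8, 5))
sns.boxplot(x='theta', y='error', data=df, order=order, ax=ax1, showfliers=False)

# Lower whisker of each box, computed per group with the default whis=1.5
low_whiskers = [
    boxplot_stats(df.loc[df['theta'] == level, 'error'].to_numpy())[0]['whislo']
    for level in order
]

for xpos, low_whisker in enumerate(low_whiskers):
    ax1.plot(xpos, low_whisker, marker='*', markersize=20)
    ax1.text(xpos, low_whisker - 2, 'The minimum value before outliers is here',
             horizontalalignment='center', size='x-small', weight='semibold')

plt.show()
```

With this data the two stars land at 20 and 30, the ends of the two lower whiskers; the outliers 10 and 60 are hidden by showfliers=False.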


Derek O