I used this workaround to get the x-coordinates of the outliers from the box plot axes, so that I could label them as needed. The dataframe index of each outlier is found by selecting outliers with the same rule the seaborn box plot uses (points more than 1.5 × IQR beyond the quartiles).
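import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")
ax = sns.boxplot(x="day", y="total_bill", hue="smoker",
                 data=tips, palette="Set3")

# Collect the (x, y) plot coordinates of every outlier marker on the axes.
# The outliers are drawn with the 'd' (diamond) marker in the seaborn version
# used here; other versions may use a different marker.
plt_outliers_xy = []
for line in ax.get_lines():
    x_data, y_data = line.get_data()
    if line.get_marker() != 'd' or len(y_data) == 0:
        continue
    for x_val, y_val in zip(x_data, y_data):
        plt_outliers_xy.append((x_val, y_val))

# Recompute the outliers per (day, smoker) group to recover their index labels.
grp = tips.groupby(['day', 'smoker'])
for name, df in grp:
    print(name)
    y_vals = df["total_bill"]
    Q1 = y_vals.quantile(0.25)
    Q3 = y_vals.quantile(0.75)
    IQR = Q3 - Q1  # IQR is the interquartile range.
    iqr_filter = (y_vals >= Q1 - 1.5 * IQR) & (y_vals <= Q3 + 1.5 * IQR)
    dropped = y_vals.loc[~iqr_filter]
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement.
    for index, y_i in dropped.items():
        # Assumes the plotted outliers come in the same order as the dropped rows.
        x_plt, y_plt = plt_outliers_xy.pop(0)
        # The printed difference should be 0 when the pairing is correct.
        print(f"{index} : {y_i:.4f} - {y_plt:.4f} = {y_i-y_plt:.4f}")
        # ax.plot(x_plt, y_plt, 'ro')
        ax.annotate(f"{index}", (x_plt, y_plt), (10, 10), textcoords='offset pixels')
    print()

plt.show()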
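The loop above pairs each dropped row with the next plotted outlier via `pop(0)`, which only works if both sequences are in the same order. Below is a minimal sketch of a more defensive pairing that matches on the y value instead. The function name `match_fliers`, its arguments, and the tolerance are hypothetical and not part of seaborn or matplotlib. It assumes `plot_points` holds the plotted outliers that have not been assigned yet.

```python
import math


def match_fliers(plot_points, data_outliers, tol=1e-9):
    """Pair each (index_label, y) data outlier with a plotted (x, y) point.

    plot_points   : list of (x, y) tuples read from the axes
    data_outliers : list of (index_label, y) tuples taken from the dataframe
    Returns a list of (index_label, x, y); each plotted point is used at most once.
    """
    remaining = list(plot_points)  # copy so the caller's list is left untouched
    matched = []
    for label, y_val in data_outliers:
        for pos, (x_plt, y_plt) in enumerate(remaining):
            # Compare y values with an absolute tolerance rather than ==.
            if math.isclose(y_val, y_plt, rel_tol=0.0, abs_tol=tol):
                matched.append((label, x_plt, y_plt))
                del remaining[pos]
                break
    return matched


# Made-up coordinates and index labels, deliberately in different orders.
points = [(0.2, 3.0), (0.2, 50.0), (0.2, 48.0)]
outliers = [(17, 48.0), (4, 3.0), (9, 50.0)]
pairs = match_fliers(points, outliers)
```

Each entry of `pairs` can then be passed to `ax.annotate` in place of the `pop(0)` result. If two groups contain outliers with the same y value, matching on y alone is ambiguous, so it is safer to call this once per group with only that group's points.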
The outliers for each group of the data can be obtained as described in:
https://datascience.stackexchange.com/questions/54808/how-to-remove-outliers-using-box-plot
Or:
Extract outliers from Seaborn Boxplot
Or:
https://nextjournal.com/schmudde/how-to-remove-outliers-in-data
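As a compact illustration of what these links describe, here is a minimal sketch of the 1.5 × IQR rule wrapped in a reusable function and applied per group. The helper name `iqr_outliers`, the column names, and the data are made up for the example. It assumes pandas' default (linear) quantile interpolation.

```python
import pandas as pd


def iqr_outliers(values, k=1.5):
    """Return the entries of a Series lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    inside = (values >= q1 - k * iqr) & (values <= q3 + k * iqr)
    return values[~inside]


# Made-up data standing in for a real dataset.
df = pd.DataFrame({
    "group": ["a"] * 6 + ["b"] * 6,
    "value": [1, 2, 3, 4, 5, 100, 10, 11, 12, 13, 14, -50],
})

# Map each group name to the Series of its outliers (original index labels kept).
outliers_by_group = {
    name: iqr_outliers(sub["value"]) for name, sub in df.groupby("group")
}
```

Because the returned Series keeps the original index, `outliers_by_group["a"].index` gives the row labels to use as annotation text.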
The plot result:
Seaborn box plot with annotated outliers