I am drawing boxplots with the Python Seaborn package. I have a facet grid with both rows and columns, which I've been able to build with the Seaborn function catplot.

I also want to annotate the outliers. I have found some nice examples on SO for annotating outliers, but none of them use a facet structure. That's where I'm struggling.

Here is what I've got (borrows heavily from this post: Boxplot : Outliers Labels Python):

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
sns.set_style('darkgrid')

Month = np.repeat(np.arange(1, 11), 10)
Id = np.arange(1, 101)
Value = np.random.randn(100)
Row = ["up", "down"]*50

df = pd.DataFrame({'Value': Value, 'Month': Month, 'Id': Id, 'Row': Row})

g = sns.catplot(data=df, x="Month", y="Value", row="Row", kind="box", height=3, aspect=3)

for name, group in df.groupby(["Month", "Row"]):
    fliers = [y for stat in boxplot_stats(group["Value"]) for y in stat["fliers"]]
    d = group[group["Value"].isin(fliers)]
    
    g.axes.flatten().annotate(d["Id"], xy=(d["Month"] - 1, d["Value"]))

The dataframe d collects all the outliers for each box. The last line is meant to match d with the corresponding axes of the grid g. However, that doesn't work, and I haven't found a way to flatten the axes into a list in which each element corresponds to one of the grouped dataframes.
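For reference, with its default whis=1.5, boxplot_stats treats as fliers the values lying more than 1.5 times the interquartile range below the first quartile or above the third. Below is a minimal standard-library sketch of that rule, not matplotlib's own implementation. The helper name tukey_fliers and the sample values are hypothetical, and the sketch assumes that linearly interpolated quartiles match the percentiles matplotlib computes.

```python
from statistics import quantiles


def tukey_fliers(values, whis=1.5):
    """Return the values lying more than whis * IQR outside the quartiles.

    Hypothetical helper for illustration; it mirrors the default flier
    rule of boxplot_stats but is not matplotlib's own code.
    """
    # method="inclusive" interpolates linearly between data points,
    # which is assumed here to match numpy's default percentile method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample standing in for one (Month, Row) group.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
print(tukey_fliers(sample))  # → [30.0]
```

Here the quartiles are 3.25 and 7.75, so the fences sit at -3.5 and 14.5 and only 30.0 falls outside them.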

I'd be glad to hear alternative versions for achieving this too.

Antti
  • Here are the docs where I'd start: https://seaborn.pydata.org/tutorial/axis_grids.html?highlight=custom%20function#using-custom-functions – Paul H Mar 03 '22 at 16:19

2 Answers

One way to do it:

for (month, row), group in df.groupby(["Month", "Row"]):
    fliers = [y for stat in boxplot_stats(group["Value"]) for y in stat["fliers"]]
    d = group[group["Value"].isin(fliers)]
    # With this data, "up" is the first facet (index 0) and "down" the second (index 1)
    ax = g.fig.axes[0 if row == "up" else 1]
    for i in range(len(d)):
        # Boxes are drawn at x = 0..9, hence Month - 1
        ax.annotate(d["Id"].iloc[i], xy=(d["Month"].iloc[i] - 1, d["Value"].iloc[i]))
rehaqds

You can loop through g.axes_dict, which maps each facet name (here, each value of Row) to its axes, to visit each of the axes.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats

sns.set_style('darkgrid')

Month = np.repeat(np.arange(1, 11), 10)
Id = np.arange(1, 101)
Value = np.random.randn(100)
Row = ["up", "down"] * 50

df = pd.DataFrame({'Value': Value, 'Month': Month, 'Id': Id, 'Row': Row})

g = sns.catplot(data=df, x="Month", y="Value", row="Row", kind="box", height=3, aspect=3)

for row, ax in g.axes_dict.items():
    for month in np.unique(df["Month"]):
        group = df.loc[(df["Row"] == row) & (df["Month"] == month), :]
        fliers = boxplot_stats(group["Value"])[0]["fliers"]
        if len(fliers) > 0:
            # id_ avoids shadowing the built-in id()
            for mon, val, id_ in zip(group["Month"], group["Value"], group["Id"]):
                if val in fliers:
                    ax.annotate(f' {id_}', xy=(mon - 1, val))
plt.tight_layout()
plt.show()

[Figure: sns.catplot annotating boxplot outliers]

JohanC