How to find outliers within groups in a dataframe

Question

I have a df which looks like the following:

I would like to add a column onto this which specifies if the value is an outlier. If there were no groups then I would use something like:

df['outliers'] = df[col] > df[col].mean() + 3 * df[col].std()

But how would I do this so it is within the groups?

look into the where clause for pandas. https://www.geeksforgeeks.org/python-pandas-dataframe-where/ — Justin Oberle, Apr 26 '21 at 13:54
Does this answer your question? [Checking a Pandas Dataframe for Outliers](https://stackoverflow.com/questions/48087534/checking-a-pandas-dataframe-for-outliers) — Irfan Bilir, Apr 26 '21 at 13:59
Almost but not quite. Because I have different groups I need to compare each value against the mean of that group, not against the mean of the whole column. — Barney Cooper, Apr 26 '21 at 14:05

Mustafa Aydın · Accepted Answer · 2021-04-26T14:16:25.980

3

df["is_outlier"] = df.groupby("Group.").transform(lambda x: (x - x.mean()).abs() > 3*x.std())

Within each group, we take each element's distance from the group mean and check whether its absolute value exceeds 3 times the group's standard deviation.
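As a rough illustration of the per-group rule, here is a minimal, self-contained sketch. The column names `group` and `value` and the data itself are made up for the example (the original dataframe is not shown in the question), and the value column is selected explicitly so the result is a single boolean Series. It also shows what should be an equivalent formulation using the built-in `"mean"` and `"std"` aggregations instead of a lambda.

```python
import pandas as pd

# Hypothetical data: the column names and values are assumptions for illustration.
df = pd.DataFrame({
    "group": ["a"] * 21 + ["b"] * 21,
    "value": [10] * 20 + [100] + list(range(1000, 1021)),
})

grouped = df.groupby("group")["value"]

# Same rule as the answer, applied to one explicitly selected column.
via_lambda = grouped.transform(lambda x: (x - x.mean()).abs() > 3 * x.std())

# Variant: broadcast the group mean and std back to the rows, then compare.
group_mean = grouped.transform("mean")
group_std = grouped.transform("std")  # sample standard deviation (ddof=1)
df["is_outlier"] = (df["value"] - group_mean).abs() > 3 * group_std

# Rows flagged as outliers relative to their own group.
print(df[df["is_outlier"]])
```

Here the value 100 is flagged because it is compared against the statistics of group `a` only, not against the whole column. One thing to keep in mind: with the sample standard deviation, a single point in a group of 10 or fewer rows can never lie more than 3 standard deviations from the group mean, so small groups will not produce any flags under this rule.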

edited Apr 26 '21 at 14:16

answered Apr 26 '21 at 14:04

