I have a Pandas dataframe that I am trying to remove outliers from on a group-by-group basis. A row in a group is considered an outlier if its value in a given column falls outside the range
[group_mean - (group_std_dev * 3), group_mean + (group_std_dev * 3)]
where group_mean is the average value of the column in the group, and group_std_dev is the standard deviation of the column for the group. I tried the following Pandas chain:
df.groupby(by='group').apply(lambda x: x[(x['col'].mean() - (x['col'].std() * 3)) < x['col'] < (x['col'].mean() + (x['col'].std() * 3))])
but it does not appear to work, as Pandas throws the following error for the comparison inside apply:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The error does not appear to make much sense to me because the comparison should convert to a Series of bools, which then is applied to the group x?
However, filtering by just the upper or the lower bound does work, for example:
df.groupby(by='group').apply(lambda x: x[(x['col'].mean() - (x['col'].std() * 3)) < x['col']])
but I am unsure of how to chain these together.
Does anyone have any ideas on how to simply & cleanly implement this? It doesn't appear very hard to me, but other posts on here have not yielded a satisfactory or working answer.
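For reference, here is a minimal sketch of one way the two bounds might be combined. Python expands a chained comparison `a < x < b` into `(a < x) and (x < b)`, and `and` asks for the truth value of the first Series, which is what raises the error. Writing each bound as its own comparison and joining them with `&` avoids that. The sample data and the helper name `remove_outliers` are assumptions made for illustration; only the column names 'group' and 'col' come from the attempt above.

```python
import pandas as pd

# Hypothetical sample data: each group has eleven identical values and one extreme value.
df = pd.DataFrame({
    "group": ["a"] * 12 + ["b"] * 12,
    "col": [1.0] * 11 + [100.0] + [5.0] * 11 + [-200.0],
})

def remove_outliers(frame, group_col="group", value_col="col", n_std=3):
    """Keep rows whose value lies within n_std group standard deviations of the group mean."""
    grouped = frame.groupby(group_col)[value_col]
    # transform() broadcasts each group's statistic back onto that group's rows,
    # so mean and std are aligned with frame's index.
    mean = grouped.transform("mean")
    std = grouped.transform("std")
    lower = mean - n_std * std
    upper = mean + n_std * std
    # Two separate element-wise comparisons joined with &, not a chained comparison.
    mask = (frame[value_col] >= lower) & (frame[value_col] <= upper)
    # Note: a group with a single row has a NaN std, so that row is dropped.
    return frame[mask]

filtered = remove_outliers(df)
print(filtered)
```

The same mask could also be used inside the original `apply` lambda, as long as the two comparisons are each wrapped in parentheses and joined with `&`. Using `transform` instead keeps the original index and avoids the extra group level that `apply` can add.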