I'm currently working on removing outliers, and I'm using Klodian Dhana's outlier function (https://datascienceplus.com/identify-describe-plot-and-removing-the-outliers-from-the-dataset/#comment-3592066903).
My dataset consists of 95000 observations divided into 1050 groups, and I'm wondering if there is a way to check for outliers within each group without running the function 1050 times.
Data (DF):
Group Height
Gr1 2
Gr1 5
Gr1 5
Gr2 75
Gr2 72
Gr2 44
Gr3 4
Gr3 25
Gr3 42
… …
Gr1050 43
So I would like to apply the outlier check to each group separately, but keep the result in a single DF.
I'm not an expert, so I did some research and found that the by(), ddply(), and tapply() functions could be used in this case. I think loops could also be useful.
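
A minimal sketch of a per-group check in base R, to make the goal concrete. It assumes the usual boxplot rule (values beyond 1.5 times the IQR from the hinges, as reported by boxplot.stats()), which may not match exactly what the linked function does. The helper name is_outlier and the toy values are made up for illustration; only the Group and Height column names come from the data above.

```r
# Toy data in the same layout as DF (values are illustrative only)
DF <- data.frame(
  Group  = rep(c("Gr1", "Gr2", "Gr3"), each = 6),
  Height = c(2, 4, 5, 6, 8, 60,
             75, 72, 74, 70, 73, 10,
             24, 25, 26, 22, 27, 28)
)

# Hypothetical helper: TRUE where a value is reported as an outlier by
# boxplot.stats(), i.e. lies beyond the boxplot whiskers (coef = 1.5 by default)
is_outlier <- function(x) x %in% boxplot.stats(x)$out

# ave() runs the helper within each Group and returns a vector aligned
# with the rows of DF; it comes back as 0/1, so convert to logical
DF$Outlier <- as.logical(ave(DF$Height, DF$Group, FUN = is_outlier))

# Single data frame with the flagged rows dropped
DF_clean <- DF[!DF$Outlier, ]
```

Because ave() keeps the original row order, the flag ends up as one extra column in the same DF, so no loop over the 1050 groups is needed. The same helper could presumably be passed to tapply() or ddply() instead, but those return results per group and would need reassembling.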