I want to remove outliers from a variable MEASURE after grouping by TYPE. I tried the following code, but it didn't work. I've searched and have only come across ways to remove outliers from a whole data frame or a single column, not within each group.

df2 <- df %>%
  group_by(TYPE) %>%
  mutate(MEASURE_WITHOUT_OUTLIERS = remove_outliers(MEASURE))
DSan
  • Which package is the `remove_outliers` function from? Can you show the code that works for removing outliers from a whole data frame or a single column? – Ronak Shah Jun 12 '21 at 01:14
  • Hi, I tried to find the package for `remove_outliers` and couldn't. I then tried the following, and the resulting df came back with 0 observations, which can't be right: `library(rstatix); df_no_outliers <- df %>% group_by(TYPE) %>% identify_outliers(MEASURE) %>% filter(!is.outlier)` – DSan Jun 12 '21 at 23:43
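
A possible explanation for the 0 observations, offered as a sketch rather than a diagnosis of the actual data: `identify_outliers()` from rstatix returns only the rows it flags as outliers, so `is.outlier` is TRUE in every row it returns and `filter(!is.outlier)` drops all of them. The example below uses a made-up `df` (the values are assumptions for illustration) and rstatix's `is_outlier()`, which returns a logical vector and can therefore be negated inside a grouped `filter()`.

```r
library(dplyr)
library(rstatix)

# Hypothetical data: each TYPE contains one obvious outlier.
df <- data.frame(
  TYPE = rep(c("a", "b"), each = 6),
  MEASURE = c(1, 2, 3, 4, 5, 50, 10, 11, 12, 13, 14, -40)
)

# identify_outliers() keeps only the flagged rows,
# so every returned row has is.outlier == TRUE.
only_outliers <- df %>%
  group_by(TYPE) %>%
  identify_outliers(MEASURE)

# is_outlier() gives one TRUE/FALSE per value within each group,
# so negating it keeps the non-outlying rows.
df_no_outliers <- df %>%
  group_by(TYPE) %>%
  filter(!is_outlier(MEASURE)) %>%
  ungroup()
```

Note that `is_outlier()` works from quantiles, so on small groups it may not flag exactly the same points as a boxplot-based rule.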

1 Answer

You can use boxplot.stats to get outlier values in each group and use filter to remove them.

library(dplyr)

df2 <- df %>%
  group_by(TYPE) %>%
  filter(!MEASURE %in% boxplot.stats(MEASURE)$out) %>%
  ungroup()
Ronak Shah
  • Thank you! Do you know what approach `boxplot.stats` uses to detect outliers? I would like to report the method. Also, what does `ungroup` do in this code? – DSan Jun 14 '21 at 21:13
  • Since the data is grouped, it is good practice to `ungroup` it once the grouped work is done, although skipping it will not change the result here. For how `boxplot.stats` flags outliers, see this post: https://stackoverflow.com/questions/27036134/how-exactly-are-outliers-removed-in-r-boxplot-and-how-can-the-same-outliers-be-r – Ronak Shah Jun 15 '21 at 02:52
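
On which method to report: with its default `coef = 1.5`, `boxplot.stats` flags values lying more than 1.5 times the hinge spread beyond the lower or upper hinge (Tukey's rule, with hinges taken from `fivenum`). A minimal base R sketch on a made-up vector `x` (the values are an assumption for illustration) reproduces the rule by hand and compares it with `boxplot.stats`:

```r
# Hypothetical values with one obvious outlier.
x <- c(10, 11, 12, 12, 13, 14, 15, 100)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max.
five <- fivenum(x)
hinge_spread <- five[4] - five[2]

# Fences at 1.5 * hinge spread beyond each hinge (the default coef).
lower_fence <- five[2] - 1.5 * hinge_spread
upper_fence <- five[4] + 1.5 * hinge_spread

# Values outside the fences are the ones reported in $out.
manual_out <- x[x < lower_fence | x > upper_fence]
same_as_boxplot_stats <- identical(manual_out, boxplot.stats(x)$out)
```

In the grouped pipeline from the answer, the same rule is applied separately to the MEASURE values of each TYPE.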