Example dataframe: [image not available]

I want to detect outliers per group and display them in a separate dataframe. For example, for the species anthopleura aureradiata, I want to look at the values 27.75, 6.83, and 23.91 and check for outliers among them. If I find that row 4 is an outlier for that particular species, I want to display it in my new dataframe. Does anyone know how to go about this?

Reproducible example:

x = data.frame("species" = c("Agao", "Beta", "Beta", "Beta", "Carrot", "Carrot"), "sum" = c(1, 100, 5, 4, 3, 0))
Vishesh Shrivastav
  • You can identify outliers with `boxplot`: `boxplot(x$sum, plot = FALSE)$out` – akrun Jan 23 '20 at 21:30
  • But that won't work because I need outliers PER group in my species column, so I need outliers for rows with the same species column. I think your method does it for every value in my dataframe – mexicanseafood Jan 23 '20 at 21:39
  • Just do a `group_by` operation – akrun Jan 23 '20 at 21:39
  • `x %>% group_by(species) %>% boxplot(sum, plot = FALSE)$out`: do you know why this wouldn't work? Cheers – mexicanseafood Jan 23 '20 at 21:54
  • I would do `x %>% group_by(species) %>% mutate(i1 = sum %in% boxplot(sum, plot = FALSE)$out) %>% ungroup` – akrun Jan 23 '20 at 21:56
  • If the number of species is limited, you can create a subset for each species (species1, species2, species3, etc.) and then call `boxplot(species1$sum, species2$sum, species3$sum)` – Aashay Mehta Jan 23 '20 at 22:03
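
The grouped `boxplot` idea from the comments can be sketched as follows. Piping a grouped dataframe straight into `boxplot()` does not help, because `boxplot()` is not a dplyr verb and knows nothing about the grouping; the outlier test has to run inside `filter()` (or `mutate()`) so that it is evaluated once per group. This is only a sketch: the dataframe `df` and the result name `outliers_by_group` are made up for illustration, the Beta group has been extended with extra values so that the 1.5 * IQR rule has enough data to flag something, and `boxplot.stats()` is used in place of `boxplot(..., plot = FALSE)` since it returns the same `$out` component without any plotting machinery.

```r
library(dplyr)

# Hypothetical data: same shape as the question's example, but with more
# values in the "Beta" group so the IQR rule has something to work with.
df <- data.frame(
  species = c("Agao", "Beta", "Beta", "Beta", "Beta", "Beta", "Beta",
              "Carrot", "Carrot"),
  sum = c(1, 1, 5, 4, 3, 2, 100, 3, 0)
)

# Within each species, keep only the rows whose value is reported as an
# outlier by boxplot.stats() (beyond 1.5 * IQR from the hinges).
outliers_by_group <- df %>%
  group_by(species) %>%
  filter(sum %in% boxplot.stats(sum)$out) %>%
  ungroup()

outliers_by_group
```

With very small groups (one to three rows, as in the question's example) this rule will often flag nothing, because the quartiles are then driven by the extreme value itself.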

1 Answer

We can modify this function to suit our requirements and use it to filter the outliers in each group into a new dataframe.

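library(dplyr)

# Despite its name, this helper does not drop anything: it returns a logical
# vector that is TRUE where a value lies beyond 1.5 * IQR from the quartiles.
remove_outliers <- function(x, na.rm = TRUE, ...) {
    qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)  # Q1 and Q3
    H <- 1.5 * IQR(x, na.rm = na.rm)                             # fence width
    x < (qnt[1] - H) | x > (qnt[2] + H)
}

# Keep only the rows flagged as outliers within each species
separate_dataframe <- x %>% group_by(species) %>% filter(remove_outliers(sum))
separate_dataframe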

# species   sum
#  <fct>   <dbl>
#1 Beta     -100
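
To see why -100 is the only row returned, the fences for the Beta group can be worked out by hand. The sketch below assumes R's default quantile method (type 7) and uses illustrative variable names (`beta`, `lower`, `upper`, `flagged`) that are not part of the answer.

```r
# Values of the "Beta" group in the answer's data
beta <- c(1, 5, 4, -100)

qnt <- quantile(beta, probs = c(.25, .75))  # Q1 = -24.25, Q3 = 4.25
H <- 1.5 * IQR(beta)                        # 1.5 * 28.5 = 42.75

lower <- qnt[[1]] - H                       # -24.25 - 42.75 = -67
upper <- qnt[[2]] + H                       #   4.25 + 42.75 =  47

# Only values outside [lower, upper] are flagged
flagged <- beta[beta < lower | beta > upper]
flagged                                     # -100
```

Because Q1 itself is pulled down by the extreme value, the lower fence ends up far from the bulk of the data; in small groups a point has to be very extreme before it falls outside the fences.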

data

x = data.frame(species = c("Agao", "Beta", "Beta", "Beta", "Beta",
                           "Carrot", "Carrot"),
               sum = c(1, 1, 5, 4, -100, 3, 0))
Ronak Shah