I am new to R. It might be a silly question, but I am having a hard time. I am trying to detect outliers in a column repeatedly. I followed "How to repeat the Grubbs test and flag the outliers" and got the expected results. But I also want to display the p-value as a third column in the data frame. I tried a couple of things, but I got either a single value repeated or only two p-values. How can I display all the p-values?

1 Answer

You can make the following changes to the function:

library(outliers)  # provides grubbs.test()

X <- c(152.36,130.38,101.54,96.26,88.03,85.66,83.62,76.53,
       74.36,73.87,73.36,73.35,68.26,65.25,63.68,63.05,57.53)

grubbs.flag <- function(x) {
  outliers <- NULL
  test <- x
  grubbs.result <- grubbs.test(test)
  result <- data.frame(X=x)
  i <- 1
  result$pv[i] <- grubbs.result$p.value
  while(result$pv[i] < 0.05) {
    outliers <- c(outliers,as.numeric(strsplit(grubbs.result$alternative," ")[[1]][3]))
    test <- x[!x %in% outliers]
    grubbs.result <- grubbs.test(test)
    i <- i + 1
    result$pv[i] <- grubbs.result$p.value
  }
  return(data.frame(result,Outlier=(x %in% outliers)))
}

grubbs.flag(X)

#        X          pv Outlier
#1  152.36 0.012614743    TRUE
#2  130.38 0.007648407    TRUE
#3  101.54 0.237267039   FALSE
#4   96.26 0.012614743   FALSE
#5   88.03 0.012614743   FALSE
#6   85.66 0.012614743   FALSE
#7   83.62 0.012614743   FALSE
#8   76.53 0.012614743   FALSE
#9   74.36 0.012614743   FALSE
#10  73.87 0.012614743   FALSE
#11  73.36 0.012614743   FALSE
#12  73.35 0.012614743   FALSE
#13  68.26 0.012614743   FALSE
#14  65.25 0.012614743   FALSE
#15  63.68 0.012614743   FALSE
#16  63.05 0.012614743   FALSE
#17  57.53 0.012614743   FALSE
Ronak Shah
    This code gives the same p-value repeatedly: rows 1 and 4-17 all show the same p-value. Then why is row 1 an outlier while the rest are not? I think it is showing the wrong p-values. – Backbencher Jun 01 '21 at 05:09
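
A likely explanation for the repeated value: `result$pv` does not exist when `result$pv[i] <- grubbs.result$p.value` first runs, so R creates the column from a single number and recycles it down every row. Later iterations overwrite only rows 2 and 3, which leaves the first p-value in rows 4-17. The p-values are also stored by iteration number rather than by the row of the value that was tested; the two happen to line up here only because `X` is sorted in decreasing order. The sketch below is one possible variant, not the answer's code: `grubbs.flag2`, `alpha`, and the `sum(keep) >= 3` guard are assumptions made for illustration. It reuses the answer's trick of parsing the tested value out of `alternative`, so it relies on that value matching an element of `x` exactly.

```r
library(outliers)

# Hypothetical variant of grubbs.flag(): store each p-value in the row of the
# value that was actually tested, and leave untested rows as NA.
grubbs.flag2 <- function(x, alpha = 0.05) {
  pv      <- rep(NA_real_, length(x))  # NA: this value was never the one tested
  outlier <- rep(FALSE, length(x))
  keep    <- rep(TRUE, length(x))      # values still in the sample

  while (sum(keep) >= 3) {             # assumed minimum sample size for the test
    res <- grubbs.test(x[keep])
    # As in the answer, read the tested value from the text of `alternative`,
    # e.g. "highest value 152.36 is an outlier".
    tested <- as.numeric(strsplit(res$alternative, " ")[[1]][3])
    idx <- which(keep & x == tested)[1]
    if (is.na(idx)) break              # parsed value matched no element: stop
    pv[idx] <- res$p.value
    # Stop once the most extreme remaining value is not significant.
    if (is.na(res$p.value) || res$p.value >= alpha) break
    outlier[idx] <- TRUE
    keep[idx] <- FALSE                 # drop the outlier and test again
  }
  data.frame(X = x, pv = pv, Outlier = outlier)
}

X <- c(152.36,130.38,101.54,96.26,88.03,85.66,83.62,76.53,
       74.36,73.87,73.36,73.35,68.26,65.25,63.68,63.05,57.53)

out <- grubbs.flag2(X)
print(out)
```

With this data, rows 1-3 should carry the three distinct p-values shown in the answer's output and rows 4-17 should be `NA`, because the loop stops at the first non-significant test and never tests the remaining values. Grubbs' test only examines the single most extreme value in each pass, so there is no per-row p-value to report for the others.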