I am new to R, so this may be a basic question. I am trying to detect outliers repeatedly in a column. I followed "How to repeat the Grubbs test and flag the outliers" and got the expected results, but I also want to display the p-value as a third column in the data frame. I tried a couple of things, but I got either a single value repeated in every row or only two p-values. How can I display all the p-values?
1 Answer
You can make the following changes to the function:
library(outliers)
library(ggplot2)

X <- c(152.36, 130.38, 101.54, 96.26, 88.03, 85.66, 83.62, 76.53,
       74.36, 73.87, 73.36, 73.35, 68.26, 65.25, 63.68, 63.05, 57.53)

grubbs.flag <- function(x) {
  outliers <- NULL
  test <- x
  grubbs.result <- grubbs.test(test)
  result <- data.frame(X = x)
  i <- 1
  result$pv[i] <- grubbs.result$p.value
  while (result$pv[i] < 0.05) {
    outliers <- c(outliers, as.numeric(strsplit(grubbs.result$alternative, " ")[[1]][3]))
    test <- x[!x %in% outliers]
    grubbs.result <- grubbs.test(test)
    i <- i + 1
    result$pv[i] <- grubbs.result$p.value
  }
  return(data.frame(result, Outlier = (x %in% outliers)))
}

grubbs.flag(X)
#         X          pv Outlier
#1   152.36 0.012614743    TRUE
#2   130.38 0.007648407    TRUE
#3   101.54 0.237267039   FALSE
#4    96.26 0.012614743   FALSE
#5    88.03 0.012614743   FALSE
#6    85.66 0.012614743   FALSE
#7    83.62 0.012614743   FALSE
#8    76.53 0.012614743   FALSE
#9    74.36 0.012614743   FALSE
#10   73.87 0.012614743   FALSE
#11   73.36 0.012614743   FALSE
#12   73.35 0.012614743   FALSE
#13   68.26 0.012614743   FALSE
#14   65.25 0.012614743   FALSE
#15   63.68 0.012614743   FALSE
#16   63.05 0.012614743   FALSE
#17   57.53 0.012614743   FALSE

Ronak Shah
This code gives the same p-value for most rows: the p-values for row 1 and rows 4-17 are identical. So why is row 1 an outlier while the others are not? I think it is showing the wrong p-values. – Backbencher Jun 01 '21 at 05:09
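A likely explanation for the repeated value: the first `result$pv[1] <- ...` runs on a data frame that has no `pv` column yet, so R appears to create the column by recycling that single p-value into every row. Later iterations then overwrite only rows 2 and 3. The Grubbs test also produces just one p-value per round, for the single most extreme remaining value, so most rows never have a p-value of their own. Below is a minimal sketch of one way to record each p-value on the row that was actually tested and leave the rest as `NA`. The name `grubbs_flag_pv` and the `alpha` argument are hypothetical, not part of the answer above, and the sketch assumes the default `grubbs.test()` (type 10) examines the value farthest from the sample mean.

```r
library(outliers)

X <- c(152.36, 130.38, 101.54, 96.26, 88.03, 85.66, 83.62, 76.53,
       74.36, 73.87, 73.36, 73.35, 68.26, 65.25, 63.68, 63.05, 57.53)

# Hypothetical helper (not from the answer above): stores each Grubbs
# p-value on the row of the value that was the candidate in that round.
grubbs_flag_pv <- function(x, alpha = 0.05) {
  pv <- rep(NA_real_, length(x))      # NA = value was never the candidate
  outlier <- rep(FALSE, length(x))
  remaining <- seq_along(x)           # row indices still in the sample

  # Guard (an assumption): stop once fewer than 3 observations remain
  while (length(remaining) >= 3) {
    test <- x[remaining]
    # Candidate: the remaining value farthest from the sample mean,
    # assumed to be the one the default test (type = 10) examines
    k <- remaining[which.max(abs(test - mean(test)))]
    p <- grubbs.test(test)$p.value
    pv[k] <- p
    # Stop at the first non-significant (or undefined) result
    if (is.na(p) || p >= alpha) break
    outlier[k] <- TRUE
    remaining <- setdiff(remaining, k)
  }
  data.frame(X = x, pv = pv, Outlier = outlier)
}

res <- grubbs_flag_pv(X)
print(res)
```

On the data from the question, this should put the three p-values from the answer's output on rows 1 to 3 and `NA` on rows 4 to 17. Tracking row indices rather than parsing the number out of `grubbs.result$alternative` also avoids flagging duplicate values together, although whether that behaviour is wanted depends on the data.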