R: bugs in flagging outliers (how R recognizes the length of an infinite decimal)

Question

I have a problem running the following code:

library("outliers")

#flags the outliers
grubbs.flag <- function(x) {
  outliers <- NULL
  test <- x
  grubbs.result <- grubbs.test(test)
  pv <- grubbs.result$p.value
  while(pv < 0.05) {
    outliers <- c(outliers,as.numeric(strsplit(grubbs.result$alternative," ")[[1]][3]))
    test <- x[!x %in% outliers]
    grubbs.result <- grubbs.test(test)
    pv <- grubbs.result$p.value
  }
  return(data.frame(X=x,Outlier=(x %in% outliers)))
}

# build a vector of infinite (non-terminating) decimals as an example
a=c(1,5,7,9,110)
b=c(3,3,3,3,3)
x=a/b
grubbs.flag(x)

The code originally comes from the question "How to repeat the Grubbs test and flag the outliers".

If the vector x consists of infinite (non-terminating) decimals, the line test <- x[!x %in% outliers] can go wrong whenever an outlier exists.

In test <- x[!x %in% outliers], the infinite-decimal outlier is not recognized as an element of x, so nothing is removed and the function falls into an endless loop. The reason might be that the length of the outlier as stored in x differs from the length of the value stored in outliers.

So I'm curious how R recognizes the length of an infinite decimal, and how to deal with this problem.

The provided code is crashing against the [FAQ 7.31](http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f) wall. You probably need to introduce some rounding when you create "test". — , Jul 03 '15 at 05:09
See also [why aren't these numbers equal](http://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal) for things to know when comparing decimal values. — MrFlick, Jul 03 '15 at 05:25
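
To make the FAQ 7.31 point concrete: the outlier is recovered by parsing the text in grubbs.result$alternative, and that text appears to carry only about 15 significant digits, while a double such as 110/3 can need up to 17 to round-trip exactly. The sketch below mimics that round trip with sprintf("%.15g", ...). That format is an assumption standing in for however the package builds its message; the sketch does not call grubbs.test itself.

```r
# Sketch: a double that is printed and then parsed back may not equal the original.
x <- c(1, 5, 7, 9, 110) / 3

stored <- x[5]                                   # 110/3 as held in memory
parsed <- as.numeric(sprintf("%.15g", stored))   # after a 15-digit round trip (assumed format)

stored == parsed        # FALSE: the exact comparison fails
stored %in% parsed      # FALSE: %in% relies on exact matching
x[!x %in% parsed]       # all five values remain, so the while loop never ends

isTRUE(all.equal(stored, parsed))   # TRUE: comparison with a tolerance succeeds
```

So the issue is not the "length" of the decimal as such, but that the parsed number is a slightly different double from the one stored in x.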

Rorschach · Accepted Answer · 2015-07-05T15:24:13.003

There are a few ways to deal with the problem. You can use all.equal or just test whether the numbers are nearly the same. The version of grubbs.flag below takes the second approach, with an absolute tolerance tol.

grubbs.flag <- function(x, tol=1e-9) {
    check <- function(a, b) any(abs(a - b) < tol)                    # check for nearly equal
    outliers <- NULL
    test <- x
    grubbs.result <- grubbs.test(test)
    pv <- grubbs.result$p.value
    while(pv < 0.05) {
        outliers <- c(outliers,as.numeric(strsplit(grubbs.result$alternative," ")[[1]][3]))
        inds <- sapply(test, check, outliers)                        # replace the %in% test
        test <- test[!inds]
        grubbs.result <- grubbs.test(test)
        pv <- grubbs.result$p.value
    }
    return(data.frame(X=x,Outlier=sapply(x, check, outliers)))       # replace %in% test
}

a=c(-1e6, 1,5,7,9,110, 1000)
b=3
c=a/b
grubbs.flag(c)

#              X Outlier
# 1 -3.333333e+05  TRUE
# 2 3.333333e-01   FALSE
# 3 1.666667e+00   FALSE
# 4 2.333333e+00   FALSE
# 5 3.000000e+00   FALSE
# 6 3.666667e+01    TRUE
# 7 3.333333e+02    TRUE
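
One design point worth noting: tol in the function above is an absolute tolerance, whereas all.equal compares with a relative tolerance (about 1.5e-8 by default). A rough sketch of the difference follows, assuming the parsed value has been rounded to 15 significant digits. near_abs and near_rel are hypothetical helper names, not part of any package.

```r
# Hypothetical helpers: does scalar a nearly match any element of vector b?
near_abs <- function(a, b, tol = 1e-9) any(abs(a - b) < tol)            # absolute tolerance
near_rel <- function(a, b, tol = 1e-9) any(abs(a - b) <= tol * abs(a))  # tolerance scaled by |a|

small <- 110 / 3
large <- 1e9 / 3
small_parsed <- as.numeric(sprintf("%.15g", small))  # assumed 15-digit round trip
large_parsed <- as.numeric(sprintf("%.15g", large))

near_abs(small, small_parsed)           # TRUE: the rounding error is far below 1e-9
near_abs(large, large_parsed)           # FALSE: the rounding error grows with the magnitude
near_rel(large, large_parsed)           # TRUE: the error is tiny relative to the value
isTRUE(all.equal(large, large_parsed))  # TRUE: all.equal also scales by magnitude
```

For data of moderate size the absolute check is fine; for very large values a relative check (or a larger tol) is likely the safer choice.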

I found there are still some bugs in your code. If there is more than one outlier, the 'check' function does not work properly. — Xiao Xie, Jul 05 '15 at 13:44
@XiaoXie I'm not sure there is anything wrong with the check function. It is meant to work with a scalar `a` and a vector `b` so you can pass it to `sapply`. The bug I found is that `x` should be `test` in the loop, and then the return value needs to be updated. I'll update and you can try it out. — Rorschach, Jul 05 '15 at 15:22

score 0 · Answer 2 · answered Jul 06 '15 at 03:10

In the end I used the all.equal function to deal with this problem, and it worked perfectly for me. It just uses plain nested loops! ╮(╯◇╰)╭

library(outliers)

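# compare vectors element-wise: TRUE where x[j] is nearly equal to any y[i]
match_allequal=function(x,y){
  Logical_i=rep(FALSE,length(x))
  for(i in seq_along(y)){            # seq_along() skips the loop when y is NULL
    Logical_j=logical(length(x))     # preallocate instead of growing with c()
    for(j in seq_along(x)){
      Logical_j[j]=isTRUE(all.equal(x[j],y[i]))
    }
    Logical_i=Logical_j|Logical_i
  }
  return (Logical_i)
}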

#flags the outliers
grubbs.flag <- function(x) {
  outliers <- NULL
  test <- x
  grubbs.result <- grubbs.test(test)
  pv <- grubbs.result$p.value
  while(pv < 0.05) {
    outliers <- c(outliers,as.numeric(strsplit(grubbs.result$alternative," ")[[1]][3]))
    test <- x[!match_allequal(x,outliers)]
    grubbs.result <- grubbs.test(test)
    pv <- grubbs.result$p.value
  }
  return(data.frame(X=x,Outlier=match_allequal(x,outliers)))
}
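
The nested loops can also be written more compactly. The following is only a sketch of an equivalent helper: match_allequal2 is a hypothetical name, it keeps the same isTRUE(all.equal(...)) comparison as above, and the sprintf("%.15g", ...) call is an assumed stand-in for a value parsed from text.

```r
# Hypothetical compact variant: TRUE for each x[j] that is nearly equal to some element of y
match_allequal2 <- function(x, y) {
  vapply(x, function(xj) {
    any(vapply(y, function(yi) isTRUE(all.equal(xj, yi)), logical(1)))
  }, logical(1))
}

x <- c(1, 5, 7, 9, 110) / 3
y <- as.numeric(sprintf("%.15g", x[5]))  # stand-in for a parsed outlier (assumed 15 digits)

match_allequal2(x, y)     # FALSE FALSE FALSE FALSE TRUE
match_allequal2(x, NULL)  # all FALSE: no outliers recorded yet
```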
