I'm trying to figure out how to remove a whole row when a value in one column of a matrix is an outlier, i.e. falls outside a given limit. I have a data set with labeled columns (B, C, D, etc.) from which I want to remove outliers that are more than 3 standard deviations from the mean. When an outlier is found, the whole row should be removed. Once one column is done, the same procedure should be repeated for the next one.

I found this post: "Removing matrix rows if values of a cloumn are outliers", but the code there removes all outliers outside 1.5 standard deviations rather than outside a limit you choose yourself, right?

(I'm sorry if this is a basic question; I'm relatively new to R and have only coded in MATLAB before.)

  • The code in the link you are referring to doesn't remove values beyond 1.5 sd, but beyond 1.5 * the interquartile range. Also, you might want to read [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Sotos Feb 25 '16 at 10:55

1 Answer

In this case you have to define your own function to identify the outliers. Try the following:

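remove_outliers2 <- function(x, limit = 3) {
    # Flag values more than `limit` standard deviations from the mean
    mn <- mean(x, na.rm = TRUE)
    out <- limit * sd(x, na.rm = TRUE)
    # TRUE marks an outlier; NA entries in x stay NA
    x < (mn - out) | x > (mn + out)
}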

This function returns a logical (TRUE/FALSE) vector of the same length as x, with TRUE wherever the element is an outlier.
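
As a quick check of what the function returns, here is a minimal sketch on a made-up vector (the values are purely illustrative, not from the question). Note that a single extreme value can only exceed 3 standard deviations when the sample is reasonably large, so the example uses 21 values.

```r
remove_outliers2 <- function(x, limit = 3) {
    mn <- mean(x, na.rm = TRUE)
    out <- limit * sd(x, na.rm = TRUE)
    x < (mn - out) | x > (mn + out)
}

# Illustrative data: twenty ordinary values and one extreme value
v <- c(rep(10, 20), 100)

flags <- remove_outliers2(v)   # logical vector, same length as v
outlier_pos <- which(flags)    # positions flagged as outliers
```

Here only the last element should be flagged.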

To apply this function to every column of your matrix x (here with a limit of 2 standard deviations), do:

apply(x, 2, remove_outliers2, limit = 2)

Then remove the rows that contain a TRUE.
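
The row removal could look roughly like the sketch below. The matrix m and its contents are made up for illustration, and both variants are only one possible reading of the question. The first flags all columns at once using the statistics of the full data; the second goes column by column, as described in the question, recomputing mean and sd on the rows that are still left. The two can give different results in general.

```r
remove_outliers2 <- function(x, limit = 3) {
    mn <- mean(x, na.rm = TRUE)
    out <- limit * sd(x, na.rm = TRUE)
    x < (mn - out) | x > (mn + out)
}

# Illustrative matrix with labeled columns (hypothetical data)
m <- cbind(B = c(rep(10, 20), 100),   # extreme value in row 21
           C = c(500, rep(5, 20)),    # extreme value in row 1
           D = 1:21)                  # no outliers

# Variant 1: flag every column at once, then drop rows with any TRUE
flags <- apply(m, 2, remove_outliers2, limit = 3)
keep <- rowSums(flags, na.rm = TRUE) == 0   # NA flags count as "not an outlier"
cleaned <- m[keep, , drop = FALSE]

# Variant 2: one column at a time, recomputing on the remaining rows
m_seq <- m
for (col in colnames(m_seq)) {
    flag <- remove_outliers2(m_seq[, col], limit = 3)
    m_seq <- m_seq[!(flag %in% TRUE), , drop = FALSE]   # keeps rows whose flag is FALSE or NA
}
```

drop = FALSE keeps the result a matrix even if only one row survives.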

R. Schifini