I want to remove from my dataframe every observation (row) in which at least one variable lies more than 2 standard deviations from its mean. I have 38 variables plus two other columns.

These lines extract the outliers:

std=2
outliers = boxplot(data[3:40], plot=FALSE,range=std)$out

But I can't update my dataframe. I tried several things, such as:

data[3:40][!data[3:40] %in% outliers]

Can you help me, please?

Papershine
mobupu
  • I usually use ggplot, but looking at `?boxplot` : maybe try `outline = FALSE` – tjebo Feb 19 '18 at 00:45
  • What's `data`?? Please provide [Reproducible Example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Sotos Feb 19 '18 at 09:53
  • data is a dataset with 40 columns, but I have found a solution to my problem – mobupu Feb 19 '18 at 12:10
  • You should add your solution as an answer and mark it correct, not editing your question to add it in – Papershine Feb 19 '18 at 12:10

2 Answers

@mobupu Tjebo is right: boxplot(x, outline = FALSE) stops the outliers from being drawn in the plot (the data itself is left unchanged). Here is a simple reproducible example:

i<-iris$Sepal.Length
i[151]<-25
boxplot(i)
boxplot(i, outline = FALSE)
Michael Vine
  • Actually, I want to remove the outliers from my dataframe, not from the boxplot. I mention boxplot because `boxplot(data[3:40], plot=FALSE,range=std)$out` returns the list of outlier values, which I want to match against my dataframe to remove the corresponding rows. – mobupu Feb 19 '18 at 09:38
You can remove the rows where any variable in columns 3:40 is more than 2 standard deviations from its column mean with

library(magrittr); library(data.table)  # %<>% and %>% come from magrittr, %between% from data.table
df %<>%  .[sapply(.[ ,3:40], function(x) x %between% (mean(x) + 2*c(-1, 1)*sd(x))) %>%
            apply(1, all)
        ,]
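
The same filter can also be sketched in base R without any packages. This is a minimal illustration rather than the answerer's method: the toy data frame and the names `num_cols`, `z`, `keep` and `df_clean` are made up for the example, and it assumes the numeric columns contain no missing values. Note also that `range` in `boxplot` is a multiple of the interquartile range, so the `$out` values in the question are not a 2-standard-deviation cut.

```r
# Toy data (hypothetical): two non-numeric-analysis columns followed by
# the variables to screen, standing in for the 40-column data frame.
df <- data.frame(
  id    = 1:20,
  group = rep(c("a", "b"), 10),
  v1    = c(seq(1, 10, length.out = 19), 100),  # row 20 holds an extreme value
  v2    = rep(c(4, 5, 6, 5), 5)
)

num_cols <- 3:ncol(df)  # would be 3:40 for the data in the question

# scale() centres each column on its mean and divides by its standard
# deviation, giving one z-score per cell.
z <- scale(df[num_cols])

# Keep a row only if none of its z-scores exceeds 2 in absolute value.
keep <- rowSums(abs(z) > 2) == 0

df_clean <- df[keep, ]
```

Here only row 20 is dropped, because its `v1` value is the only cell more than 2 standard deviations from its column mean. Working on row positions avoids matching on outlier values, which can also hit rows where the same number is not an outlier for its own column.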
IceCreamToucan