How to remove multiple outliers from a data.frame

Question

I have 5 variables with 1000 observations each. The variables contain a lot of outliers, such as 10, 11, 13, 1003, 10987 and 1099, and they also contain missing values. I want to remove the outliers from all of the variables at once.

Please help us help you by providing us with a reproducible example (i.e. code and example data), see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for details. — Paul Hiemstra, May 13 '13 at 05:26
You could start with one of the methods in: `install.packages("outliers")`; `library(outliers)`. Also consider use of robust methods rather than using outlier rejection before moving forward. If this is a general question about outliers you might find a more receptive audience on [CrossValidated](http://stats.stackexchange.com/). Giving some clues to the problem you're facing (and its scale) may make certain methods more attractive. Removing `NA`s may be a separate question, which you should already be able to find an answer to on this site. — dardisco, May 13 '13 at 06:14
possible duplicate of [How to remove outliers from a dataset](http://stackoverflow.com/questions/4787332/how-to-remove-outliers-from-a-dataset) — Fluffeh, Apr 29 '14 at 09:18

score 3 · Answer 1 · edited Apr 29 '14 at 07:56

You could create a logical condition that keeps the relevant data and excludes the outliers. For example, if your data frame is called `df1` and you want to keep only the rows where a certain column (e.g. column 2) has values between 1 and 5:

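    # TRUE where column 2 lies in [1, 5]; NA where the value is missing
    condition1 <- df1[, 2] >= 1 & df1[, 2] <= 5
    # which() drops the NA positions, so rows with a missing value are not returned as all-NA rows
    df1 <- df1[which(condition1), ]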
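
Since the question has several columns and missing values, the same idea can be applied to every column at once. This is only a sketch: the example data, the column names `v1` to `v3` and the bounds `lower` and `upper` are made up for illustration, and it assumes that a row should be dropped if any of its values is missing or out of range.

```r
# Hypothetical example data with a few extreme values and one NA
df1 <- data.frame(
  v1 = c(1, 2, 3, 4, 5, 2),
  v2 = c(2, 1003, 3, NA, 4, 1),
  v3 = c(5, 4, 10987, 2, 1, 3)
)

# Assumed acceptable range, shared by all columns
lower <- 1
upper <- 5

# Logical matrix: TRUE where a value is present and inside [lower, upper]
in_range <- sapply(df1, function(x) !is.na(x) & x >= lower & x <= upper)

# Keep only the rows where every column passes the check
clean <- df1[rowSums(in_range) == ncol(df1), ]
clean
```

If rows with missing values should be kept, replace `!is.na(x) &` with `is.na(x) |` inside the function.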

I hope this helps

edited Apr 29 '14 at 07:56 by David Arenburg · answered May 13 '13 at 08:50 by MB123

score 0 · Answer 2 · answered Mar 21 '15 at 03:44

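An approach that depends less on specific cutoff values is to use quantiles. The example below keeps only the rows where every column is at or below its own 90th percentile. Note that this trims only the upper tail.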

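    df <- data.frame(a = c(rep(1, 5), 5, 7), b = 1:7)
    # For each column, flag the values at or below that column's 90th percentile
    keep <- sapply(names(df), function(f) df[, f] <= quantile(df[, f], probs = 0.9, na.rm = TRUE))
    # Keep only the rows that pass the check in every column
    df[apply(keep, 1, all), ]
      a b
    1 1 1
    2 1 2
    3 1 3
    4 1 4
    5 1 5
    6 5 6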
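
The quantile filter above cuts only the upper tail and always removes a fixed share of the data. A common alternative, swapped in here and not part of the answer above, is Tukey's rule: flag values more than 1.5 times the interquartile range below the first quartile or above the third. The sketch below is one way to do that with missing values present; the helper `is_outlier`, its parameter `k` and the example data are hypothetical, and it assumes that missing values are not treated as outliers.

```r
# Hypothetical helper: TRUE where a value lies outside the Tukey fences
is_outlier <- function(x, k = 1.5) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  fence <- k * (q[2] - q[1])
  # Missing values are never flagged
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

# Made-up data with one extreme value per column and one NA
df <- data.frame(
  a = c(10, 11, 13, 12, 11, 1003, NA),
  b = c(5, 6, 5, 7, 10987, 6, 5)
)

# Logical matrix of flags, one column per variable
flags <- sapply(df, is_outlier)

# Drop every row that has an outlier in at least one column
df_clean <- df[rowSums(flags) == 0, ]
df_clean
```

Rows with a missing value are kept here. To drop them as well, wrap the result in `na.omit()`.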
