
I have 5 variables with 1000 observations each. The variables contain a lot of outliers, such as 10, 11, 13, 1003, 10987 and 1099, and they also contain missing values. I want to remove the outliers from all of the variables at once.

M--
Karthi CK
  • Please help us help you by providing us with a reproducible example (i.e. code and example data), see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for details. – Paul Hiemstra May 13 '13 at 05:26
  • You could start with one of the methods in: `install.packages("outliers")`; `library(outliers)`. Also consider use of robust methods rather than using outlier rejection before moving forward. If this is a general question about outliers you might find a more receptive audience on [CrossValidated](http://stats.stackexchange.com/). Giving some clues to the problem you're facing (and its scale) may make certain methods more attractive. Removing `NA`s may be a separate question, which you should already be able to find an answer to on this site. – dardisco May 13 '13 at 06:14
  • possible duplicate of [How to remove outliers from a dataset](http://stackoverflow.com/questions/4787332/how-to-remove-outliers-from-a-dataset) – Fluffeh Apr 29 '14 at 09:18

2 Answers


You could create a logical condition that keeps the relevant rows and excludes the outliers. For example, if your data frame is called "df1" and you want to keep only the rows whose values in a certain column (e.g. column 2) lie between 1 and 5:

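# Row numbers where column 2 lies in [1, 5]; which() also drops rows where the value is NA
condition1 <- which(df1[, 2] >= 1 & df1[, 2] <= 5)
df1 <- df1[condition1, ]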
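
Since the question asks about several columns at once, here is a minimal sketch of the same idea applied to every column. The example data, the bounds `lower` and `upper`, and the name `df1_clean` are all made up for illustration, and it assumes that every column is numeric and shares the same plausible range. Note that it also drops rows containing a missing value, which may or may not be what you want.

```r
# Hypothetical example data: two numeric columns with outliers and NAs
df1 <- data.frame(x = c(2, 3, 1003, NA, 4),
                  y = c(1, 5, 2, 3, 10987))

# Assumed plausible range, applied to every column
lower <- 1
upper <- 5

# Logical matrix: TRUE where a value is present and inside [lower, upper]
in_range <- sapply(df1, function(col) !is.na(col) & col >= lower & col <= upper)

# Keep only the rows in which every column passes the check
df1_clean <- df1[rowSums(!in_range) == 0, , drop = FALSE]
print(df1_clean)
```

If the columns need different ranges, the same pattern should work with one condition per column combined with `&`.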

I hope this helps

David Arenburg
MB123

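An approach that depends less on the specific values is to use quantiles. The example below keeps only the rows in which every column is at or below its own 90th percentile, so it trims the upper tail only.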

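# Example data
df <- data.frame(a = c(rep(1, 5), 5, 7), b = 1:7)
# For each column, flag the values at or below that column's 90th percentile
keep <- sapply(names(df), function(f) df[, f] <= quantile(df[, f], probs = 0.9))
# Keep only the rows that pass in every column
df[apply(keep, 1, all), ]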
  a b
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 5 6
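
The question also mentions missing values, and `quantile()` stops with an error on `NA` input unless `na.rm = TRUE` is set. As a rough sketch of a two-sided variant, the code below swaps in the common 1.5 × IQR rule (Tukey's fences) instead of the single 90th-percentile cutoff used above. The data, the helper `within_fences` and the multiplier `k` are assumptions made for illustration, and treating `NA` as "not an outlier" is a choice you may want to change.

```r
# Hypothetical data: one obvious outlier per column and one missing value
df <- data.frame(a = c(10, 11, 13, 12, 1003, NA, 11),
                 b = c(5, 6, 5, 7, 6, 5, 10987))

# TRUE for values inside [Q1 - k * IQR, Q3 + k * IQR].
# Missing values are treated as "not an outlier" so they are kept.
within_fences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  is.na(x) | (x >= q[1] - k * iqr & x <= q[2] + k * iqr)
}

# Logical matrix with one column of flags per variable
keep <- sapply(df, within_fences)

# Keep only the rows that pass in every column
df_clean <- df[apply(keep, 1, all), , drop = FALSE]
print(df_clean)
```

On this made-up data the rows containing 1003 and 10987 are dropped, while the row with the missing value stays. Missing values can then be handled as a separate step, for example with `complete.cases()`.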
gerti