
I want to extract the outliers from my data frame, for example the 10 out of 1000 data points that are possible outliers or that don't fall within a 95% confidence interval. There are some ways to find the value with the largest difference from the sample mean:

> a <- c(1,3,2,4,5,2,3,90,78,56,78,23,345)
> require("outliers")
> outlier(a)
[1] 345

I don't want to remove the outliers from my dataframe or from my boxplot. I want to print or subset them.

Any ideas?

Ashvin Meena
  • You may find this question and its answers useful: http://stackoverflow.com/questions/1444306/how-to-use-outlier-tests-in-r-code – r.bot Apr 01 '15 at 11:38
  • @A.Val. If I consider a 95% confidence interval, some of them will be considered outliers. Let's pick this as the criterion; could you suggest how to do it? – Ashvin Meena Apr 01 '15 at 11:47
  • Ugh. OK, I'll try. But from the looks of it, 95% confidence will play an interesting trick with your data. – statespace Apr 01 '15 at 12:25
  • This is just a sample. Consider "rnorm(200, mean=10, sd=3)" or something else. – Ashvin Meena Apr 01 '15 at 12:39

1 Answer


Given the data:

a <- c(1,3,2,4,5,2,3,90,78,56,78,23,345)

If you want the values that are not within the 95% confidence interval, keep in mind that a confidence interval is a statement about where the "true mean" is likely to lie, not about where individual data points fall.

In this case:

> mean(a)
[1] 53.07692

First question to answer: is 53 the "normal" value you would most likely expect? Why do I ask? Because if you want to print the values that are not within the 95% interval around it:

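se <- sd(a) / sqrt(length(a))                     # standard error of the mean
half_width <- qt(0.975, df = length(a) - 1) * se  # half-width of the 95% interval
a[a > mean(a) + half_width | a < mean(a) - half_width]

[1] 345

With this sample the interval is wide, because 345 inflates the standard deviation, so only that value is returned. On less skewed data the interval around the mean is much narrower, and you might get many more values than you expect.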
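As a rough sketch of the alternatives, the snippet below subsets (rather than removes) candidate outliers in two ways that are not part of the answer above: the boxplot rule via `boxplot.stats()` and a mean ± ~1.96 sd range, which assumes the data are roughly normal. It then repeats the confidence-interval subset on simulated data like the `rnorm(200, mean=10, sd=3)` mentioned in the comments, using the usual standard error `sd(x) / sqrt(n)`. The variable names and the seed are illustrative assumptions.

```r
# Sample vector from the question
a <- c(1, 3, 2, 4, 5, 2, 3, 90, 78, 56, 78, 23, 345)

# Option 1: points the boxplot would draw as outliers
# (more than 1.5 * IQR beyond the hinges); boxplot.stats() ships with base R
box_out <- boxplot.stats(a)$out

# Option 2: points outside mean +/- ~1.96 sd, i.e. outside the central 95%
# of a normal distribution fitted to the data (assumes rough normality)
z <- qnorm(0.975)
norm_out <- a[abs(a - mean(a)) > z * sd(a)]

# Simulated data as suggested in the comments (seed chosen arbitrarily)
set.seed(1)
x <- rnorm(200, mean = 10, sd = 3)

# Points outside the 95% confidence interval of the MEAN
half_width <- qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
outside_ci <- x[abs(x - mean(x)) > half_width]

# Points outside the range expected to hold ~95% of the DATA
outside_range <- x[abs(x - mean(x)) > z * sd(x)]

# Compare how many points each criterion flags
length(outside_ci)
length(outside_range)
```

The confidence interval shrinks with `sqrt(n)`, so on a well-behaved sample it flags most of the data, whereas the sd-based range and the boxplot rule flag only the extreme tail. For a data frame, the same logical conditions can be used as row indices, e.g. `df[abs(df$x - mean(df$x)) > z * sd(df$x), ]`, where `df` and `x` are hypothetical names.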

statespace