I'm curious about how the function Boxplot() from the car package decides which identified outliers to return (see for example How to show the id of outliers on a boxplot).
I assumed that the detected outliers would be the same as with any other method, but that turned out not to be the case, particularly for long vectors. The function appears to return only the most extreme outliers for some reason.
Here is a demonstration using simulated data (simulation method from: simulation of normal distribution data contaminated with outliers)
my.rnorm <- function(N, num.out, mean = 0, sd = 1) {
  x <- rnorm(N, mean = mean, sd = sd)
  ind <- sample(1:N, num.out, replace = FALSE)
  x[ind] <- (abs(x[ind]) + 3 * sd) * sign(x[ind])
  x
}
vector <- my.rnorm(1200, 20)
First, the boxplot() function gives me 32 outliers:
outliers1<-sort(boxplot(vector)$out)
sort(outliers1)
 [1] -4.124101 -3.869423 -3.768973 -3.768571 -3.639510 -3.536848 -3.469979 -3.422215 -3.240268 -3.141479 -3.107837
[12] -2.822105 -2.723802 2.685210 2.712847 2.726344 2.726544 2.751796 2.762394 3.008180 3.030209 3.116131
[23] 3.146028 3.198794 3.353337 3.423981 3.605032 3.607052 3.944753 3.950593 4.012654 4.623255
Now the car::Boxplot() function gives me only the 20 most extreme values:
id_outliers<-car::Boxplot(vector)
outliers2<-vector[id_outliers]
sort(outliers2)
 [1] -4.124101 -3.869423 -3.768973 -3.768571 -3.639510 -3.536848 -3.469979 -3.422215 -3.240268 -3.141479 3.146028
[12] 3.198794 3.353337 3.423981 3.605032 3.607052 3.944753 3.950593 4.012654 4.623255
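Looking at the two outputs above, the 20 values kept by car::Boxplot() are exactly the 10 lowest and the 10 highest of the 32 values found by boxplot(). A minimal base-R sketch of that pattern follows. It assumes the difference comes from a cap of 10 labelled points per side, which is only a hypothesis here and not something confirmed from the car source. The seed, the variable names and the helper keep_extreme() are hypothetical and added for illustration only.

```r
# Sketch: compare all boxplot outliers with a per-side cap of 10.
# The cap is an assumption made to mimic the observed output.
set.seed(42)  # assumption: seed added so the sketch is repeatable

my.rnorm <- function(N, num.out, mean = 0, sd = 1) {
  x <- rnorm(N, mean = mean, sd = sd)
  ind <- sample(1:N, num.out, replace = FALSE)
  x[ind] <- (abs(x[ind]) + 3 * sd) * sign(x[ind])
  x
}

x <- my.rnorm(1200, 20)

# All points beyond the whiskers (same rule as boxplot(), no plot drawn)
all_out <- sort(boxplot.stats(x)$out)

# Split the outliers by side, relative to the median
low  <- all_out[all_out < median(x)]
high <- all_out[all_out > median(x)]

# Hypothetical helper: keep at most n of the most extreme values per side
keep_extreme <- function(low, high, n = 10) {
  c(head(sort(low), n), tail(sort(high), n))
}

capped <- sort(keep_extreme(low, high, n = 10))

# Outliers flagged by the whisker rule but dropped by the cap
dropped <- setdiff(all_out, capped)

length(all_out)
length(capped)
dropped
```

If the values in dropped are always the least extreme ones on each side, as in my output, that would point to a labelling limit rather than a different outlier rule; the labelling arguments described in ?car::Boxplot would then be the place to look.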
It seems that car::Boxplot() does not retain the 12 least extreme outliers. The problem is clearer when comparing the two boxplots:
My question is: why does the car::Boxplot() function not return all outliers?