How would I count the number of outliers in a numerical vector where an outlier is defined as any data point that is more than 3 standard deviations away from the mean?
If your vector is called x, you can do:
sum(abs(x - mean(x)) > (3 * sd(x)))
This works by finding the absolute distance between each value in x and the mean of the whole vector (abs(x - mean(x))), then testing which of these distances is greater than 3 * sd(x). The result is a logical vector of TRUE and FALSE values. Because sum() treats TRUE as 1 and FALSE as 0, summing that vector gives the total number of TRUE values, i.e. the number of outliers.
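To check that reasoning on data where the answer can be worked out by hand, here is a minimal sketch using a small, made-up vector: twenty zeros plus one extreme value, chosen only for illustration. The extreme value sits 20 / sqrt(21), roughly 4.36 standard deviations, from the mean, so exactly one value should be flagged.

```r
# Illustrative data: twenty zeros and one extreme value
x <- c(rep(0, 20), 100)

# Logical vector: TRUE where a value lies more than 3 SDs from the mean
is_outlier <- abs(x - mean(x)) > (3 * sd(x))

# Largest distance from the mean, measured in standard deviations
max(abs(x - mean(x)) / sd(x))
#> [1] 4.364358

# sum() treats TRUE as 1 and FALSE as 0, so this counts the TRUEs
sum(is_outlier)
#> [1] 1
```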
For example:
set.seed(1) # For reproducibility
x <- rnorm(10000) # Draw 10,000 elements from N(0, 1)
mean(x)
#> [1] -0.006537039
sd(x)
#> [1] 1.012356
# Find whether each element in x is an outlier or not
outliers <- abs(x - mean(x)) > (3 * sd(x))
# Show the outliers
x[outliers]
#> [1] 3.810277 3.055742 -3.213189 3.639574 -3.253220 3.153971 -3.539586
#> [8] 3.064524 -3.208057 -3.202110 -3.071243 -3.671300 -3.119118 -3.232610
#> [15] 3.624361 -3.060042 -3.147046 3.376912 -3.450502 -3.227233 3.093395
#> [22] 3.111203 -3.187454
# Count the outliers
sum(abs(x - mean(x)) > (3 * sd(x)))
#> [1] 23
Created on 2022-06-01 by the reprex package (v2.0.1)
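If the check is needed repeatedly, one option is to wrap it in a small function. The name count_outliers() and its k and na.rm arguments are hypothetical, chosen for this sketch rather than taken from any package. Defaulting na.rm to TRUE is an assumption about how missing values should be handled: by default mean(), sd() and sum() all return NA when x contains an NA.

```r
# Hypothetical helper (not from any package): count values of x that lie
# more than k standard deviations from the mean.
count_outliers <- function(x, k = 3, na.rm = TRUE) {
  stopifnot(is.numeric(x), length(k) == 1, k > 0)
  m <- mean(x, na.rm = na.rm)
  s <- sd(x, na.rm = na.rm)
  # Comparisons involving NA give NA; na.rm = TRUE drops them from the count
  sum(abs(x - m) > k * s, na.rm = na.rm)
}

count_outliers(c(rep(0, 20), 100))
#> [1] 1
count_outliers(c(rep(0, 20), 100, NA))
#> [1] 1
count_outliers(c(rep(0, 20), 100), k = 5)
#> [1] 0
```

Passing na.rm = FALSE keeps base R's default behaviour, so the function returns NA when x contains missing values.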

Allan Cameron