
How would I count the number of outliers in a numerical vector, where an outlier is defined as any data point that is more than 3 standard deviations away from the mean?

lallal


If your vector is called x, you can do:

sum(abs(x - mean(x)) > (3 * sd(x)))

This works by finding the absolute distance between each value in x and the mean of the whole vector (abs(x - mean(x))), then testing which of these distances is greater than 3 * sd(x). The result is a logical vector of TRUE and FALSE values. Because sum() treats TRUE as 1 and FALSE as 0, summing it gives the number of TRUE values, i.e. the number of outliers.
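If you need this more than once, the one-liner can be wrapped in a small function. The sketch below is a hypothetical helper (the name count_outliers and its k and na.rm arguments are illustrative, not from any package). It also drops missing values first, since a single NA would otherwise make mean(x), and therefore the whole count, NA.

```r
# Hypothetical helper: count values more than k standard deviations from the mean.
# k = 3 matches the definition in the question; na.rm = TRUE drops missing values first.
count_outliers <- function(x, k = 3, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  # abs(x - mean(x)) is each value's distance from the mean;
  # sum() counts each TRUE as 1 and each FALSE as 0
  sum(abs(x - mean(x)) > k * sd(x))
}

# Usage: twenty zeros plus one extreme value
count_outliers(c(rep(0, 20), 100))
#> [1] 1
```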

For example:

set.seed(1) # For reproducibility

x <- rnorm(10000) # Draw 10,000 elements from N(0, 1)

mean(x)
#> [1] -0.006537039

sd(x)
#> [1] 1.012356

# Find whether each element in x is an outlier or not
outliers <- abs(x - mean(x)) > (3 * sd(x))

# Show the outliers
x[outliers]
#>  [1]  3.810277  3.055742 -3.213189  3.639574 -3.253220  3.153971 -3.539586
#>  [8]  3.064524 -3.208057 -3.202110 -3.071243 -3.671300 -3.119118 -3.232610
#> [15]  3.624361 -3.060042 -3.147046  3.376912 -3.450502 -3.227233  3.093395
#> [22]  3.111203 -3.187454

# Count the outliers
sum(abs(x - mean(x)) > (3 * sd(x)))
#> [1] 23

Created on 2022-06-01 by the reprex package (v2.0.1)
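As a cross-check, base R's scale() centres a vector and divides it by sd(x), so counting absolute scaled values above 3 should give the same result as the expression above. For normally distributed data, roughly 2 * pnorm(-3), about 0.27%, of values are expected to lie beyond 3 standard deviations, so around 27 out of 10,000; the observed 23 is in that ballpark. A minimal sketch, reusing the same seed:

```r
set.seed(1)  # same seed as the example above
x <- rnorm(10000)

# scale() returns (x - mean(x)) / sd(x) as a one-column matrix
z <- scale(x)
sum(abs(z) > 3)  # same count as sum(abs(x - mean(x)) > (3 * sd(x)))

# which() gives the positions of the outliers rather than just the count
head(which(abs(z) > 3))

# Expected number beyond 3 SD if the data were exactly N(0, 1)
2 * pnorm(-3) * length(x)
#> [1] 26.99796
```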

Allan Cameron