I need to calculate several statistical parameters for a vector, each time omitting one of its values (leave-one-out). Since this runs on a large dataset with many parameters, I'm looking for a general approach that is optimized for performance. A simple example would be:
v <- c(9, 14, 8, 12, 5, 10, 6, 9, 9, 9, 9, 10, 8, 11, 9, 9, 10, 6, 10, 10)
sapply(seq_along(v), function(x) {
  var(v[-x])
})
This gives the desired result: a vector containing the variance of v with each element omitted once:
 [1] 4.362573 2.988304 4.286550 3.888889 3.356725 4.321637 3.783626 4.362573 4.362573 4.362573 4.362573 4.321637 4.286550 4.163743 4.362573 4.362573
[17] 4.321637 3.783626 4.321637 4.321637
As stated, this performs poorly on larger datasets with multiple parameters. Since loops have a reputation for being slow in R, I'm looking for efficient alternatives, e.g. vectorized functions.
Thank you!
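For context, the statistic in the example admits a closed form that avoids the per-element loop entirely: with running totals of the sum and the sum of squares, every leave-one-out variance falls out of a single vectorized expression. A minimal sketch (my own, assuming a numeric vector of length at least 3 with no NAs):

```r
# Leave-one-out variance without a per-element loop (sketch;
# assumes a numeric vector of length >= 3 with no NA values).
loo_var <- function(v) {
  n  <- length(v)
  s  <- sum(v)    # total sum
  ss <- sum(v^2)  # total sum of squares
  # After dropping v[i], the remaining n - 1 values have
  # sum s - v[i] and sum of squares ss - v[i]^2; plug these
  # into var = (sum_sq - sum^2 / m) / (m - 1) with m = n - 1.
  (ss - v^2 - (s - v)^2 / (n - 1)) / (n - 2)
}
```

On the example vector this matches `sapply(seq_along(v), function(x) var(v[-x]))` up to floating-point error.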
EDIT: Both proposed solutions boost performance significantly. While Dominik's solution wins the race for speed, Roland's approach is more general and can be applied more widely. Therefore, Roland's answer is marked as correct, while I will use Dominik's solution for this particular situation. Thanks to both!
Results with N = 2000
Unit: milliseconds
                 expr      min        lq       mean    median        uq      max neval
    original approach 117.2269 122.38290 130.933014 124.95565 128.69030 453.0770   100
 approach from Roland  57.1625  64.75505  96.255364  67.88550 168.55915 204.6941   100
approach from Dominik   2.7083   2.89440   3.395894   2.99545   3.24165  30.0510   100
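For anyone reproducing the table: output in this shape comes from the microbenchmark package. A sketch of the setup (the seed and the random N = 2000 test vector are my assumptions; only the original approach is timed here, since the answers' code is not reproduced in this post):

```r
# Sketch of the benchmark setup (assumes the 'microbenchmark'
# package is installed; guarded so the script still runs without it).
set.seed(1)       # arbitrary seed, for reproducibility
v <- rnorm(2000)  # N = 2000, as in the timings above
original <- function(v) sapply(seq_along(v), function(x) var(v[-x]))
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    "original approach" = original(v),
    times = 10    # reduced from 100 to keep the demo quick
  ))
}
```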