Original for loop vs vector
The original code:
set.seed(3)
myvec <- rnorm(100000) # add two more zeros to make the test more interesting
output <- vector("list", length = length(myvec))
for (i in 1:length(myvec)) {
  output[[i]] <- floor(myvec[i])^2 + exp(myvec[i])^2 / 2
}
Can be vectorized as:
output2 <- floor(myvec)^2 + exp(myvec)^2/2
all.equal((unlist(output)), (output2))
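If the loop has to stay, a small variation may still help. The sketch below is not the original code: it assumes the result can be stored as plain doubles, so it preallocates a numeric vector (the name output_num is made up for illustration) instead of a list, and it iterates with seq_along(), which yields an empty sequence for an empty input where 1:length(x) would count from 1 down to 0.

```r
# Sketch: the same computation as the loop above, with a preallocated
# numeric vector instead of a list (output_num is a hypothetical name).
set.seed(3)
myvec <- rnorm(1000)                  # smaller than the original, for a quick check
output_num <- numeric(length(myvec))  # preallocate doubles
for (i in seq_along(myvec)) {         # seq_along() is empty for a zero-length vector
  output_num[i] <- floor(myvec[i])^2 + exp(myvec[i])^2 / 2
}
vectorized <- floor(myvec)^2 + exp(myvec)^2 / 2
all.equal(output_num, vectorized)
```

This removes the need for unlist() afterwards; whether it changes the timing noticeably would have to be measured.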
Original lapply vs simplified lapply
lapply(1:length(myvec), function(i){floor(myvec[i])^2 + exp(myvec[i])^2/2})
Can be rewritten:
lapply(myvec, function(i){floor(i)^2 + exp(i)^2/2})
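A related option, not benchmarked here, is vapply(), which declares the type and length of each result and returns an atomic vector directly. The following is a minimal sketch under the assumption that every element maps to a single double; the helper name f is only for illustration.

```r
# Sketch: vapply() returns a numeric vector, so no unlist() is needed.
set.seed(3)
myvec <- rnorm(1000)                        # smaller input for a quick check
f <- function(x) floor(x)^2 + exp(x)^2 / 2  # hypothetical helper name
res_vapply <- vapply(myvec, f, numeric(1))  # each call must return one double
res_lapply <- lapply(myvec, f)              # list of length-one doubles
all.equal(res_vapply, unlist(res_lapply))
```

vapply() raises an error if any call returns something other than one double, which can be useful as a sanity check.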
Rcpp test
References: https://adv-r.hadley.nz/rcpp.html and "Should I prefer Rcpp::NumericVector over std::vector?"
library(Rcpp)
cppFunction('NumericVector mviking(NumericVector x) {
  int n = x.size();
  NumericVector total(n);  // output vector, same length as x
  for (int i = 0; i < n; ++i) {
    // floor(x)^2 + exp(x)^2 / 2, element by element
    total[i] = pow(floor(x[i]), 2) + pow(exp(x[i]), 2) / 2;
  }
  return total;
}')
output3 <- mviking(myvec)
all.equal((unlist(output)), (output3))
Parallel processing
I tested the parallel processing methods (parallel::mclapply, foreach::foreach, furrr::future_map, future.apply::future_lapply), but they need to be re-tested before I report results.
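For reference, one of these variants could look roughly like the sketch below. It is not the code that was tested: the worker count of 2 is an assumption, and because mclapply() relies on forking, the sketch falls back to a single core on Windows. It only checks that the parallel result matches the vectorized one and makes no claim about speed; with so little work per element, the scheduling overhead may well dominate.

```r
# Sketch: parallel::mclapply() applied to the same computation.
library(parallel)
set.seed(3)
myvec <- rnorm(1000)  # smaller input for a quick check
# Assumption: 2 workers; forking is unavailable on Windows, so use 1 there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_par <- mclapply(myvec, function(x) floor(x)^2 + exp(x)^2 / 2,
                    mc.cores = n_cores)
all.equal(unlist(res_par), floor(myvec)^2 + exp(myvec)^2 / 2)
```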
Results
microbenchmark::microbenchmark(
  original = for (i in 1:length(myvec)) {output[[i]] <- floor(myvec[i])^2 + exp(myvec[i])^2 / 2},
  basevector = floor(myvec)^2 + exp(myvec)^2 / 2,
  lapplymethod = lapply(myvec, function(i) {floor(i)^2 + exp(i)^2 / 2}),
  RCppmethod = mviking(myvec)
)
Unit: microseconds
expr               min        lq        mean      median          uq        max neval
orig         22853.800 24916.587  30708.0438  27669.4520  30675.7515 131391.135   100
basevector    1223.062  1301.753   1379.0040   1345.2285   1392.9695   2128.601   100
lapplymethod 63393.969 70413.218 106731.5857 104866.5480 124296.3605 570943.676   100
RCppmethod     790.102   835.916    901.7346    870.3585    900.8195   1735.371   100
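To make the table easier to read, the medians can be expressed relative to the fastest method. The snippet below only re-uses the median column reported above; the variable names are made up for illustration.

```r
# Median timings in microseconds, copied from the table above
medians <- c(orig = 27669.4520, basevector = 1345.2285,
             lapplymethod = 104866.5480, RCppmethod = 870.3585)
# Slowdown of each method relative to the fastest median
slowdown <- round(medians / min(medians), 1)
slowdown
```

On these figures the loop comes out at roughly 32 times the Rcpp median, the vectorized expression at about 1.5 times, and lapply at about 120 times. The numbers come from a single run on one machine, so the ratios are indicative only.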