How to parallelize a for loop that is looping over a vector in R

Question

set.seed(3)
myvec <- rnorm(1000)

output <- vector("list", length = length(myvec))
for(i in 1:length(myvec)){
   output[[i]] <- floor(myvec[i])^2 + exp(myvec[i])^2/2
}

Suppose I have a pre-specified vector of numbers called myvec. I would like to loop over each element and collect the results in a list.

Using a for loop can be very inefficient. Similarly, using lapply is also quite slow.

output <- lapply(1:length(myvec), function(i){
  floor(myvec[i])^2 + exp(myvec[i])^2/2
})

Is there an alternative that's much faster? The function that I made up above is a toy function. In reality, the function I'm running is much more complicated than just floor(myvec[i])^2 + exp(myvec[i])^2/2, so I'm looking for alternatives to for loop and lapply.

What have you tried? Suggested duplicates: [How to run a for loop in parallel in R](https://stackoverflow.com/a/38335697/903061), [How do I parallelize R on windows?](https://stackoverflow.com/q/23926334/903061). Have you looked at any of the options in the [CRAN Task View on High Performance Computing](https://cran.r-project.org/web/views/HighPerformanceComputing.html)? `foreach` is a popular package, and there are many other options mentioned... `snowfall`, `future`, `parallel`... – Gregor Thomas, Aug 25 '22 at 01:46
I've looked into `foreach`, but that's parallelizing a function, whereas I'm trying to pass each element of a vector into a function. – Adrian, Aug 25 '22 at 01:47
I don't really understand what you mean by *"that's parallelizing a function, whereas I'm trying to pass each element of a vector into a function"*. I've added a `foreach` answer with your example. – Gregor Thomas, Aug 25 '22 at 02:09
"Is there an alternative that's much faster?" Yes, write a vectorized function. If that is not possible or too difficult in R, implement it with Rcpp. That will improve performance by orders of magnitude. The performance gains of parallelization are limited by the number of CPUs and by parallelization overhead. It's only worth doing if re-implementing your function would cost too much developer time (which is often the case for functions that fit a model). – Roland, Aug 25 '22 at 05:25

score 2 · Answer 1 · answered Aug 25 '22 at 01:48

There are several ways to accomplish this, but my go-to is purrr. The purrr implementation would be as follows:

library(purrr)

output <- map(myvec, function(x) {
  floor(x)^2 + exp(x)^2/2
})

There are several ways to rewrite the above code, including purrr's shorthand syntax for anonymous functions or map_dbl, which returns a numeric vector instead of a list, but the above is the most basic and explicit version.
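As a minimal sketch of the map_dbl variant mentioned above (the name output_dbl is made up for illustration, and myvec is recreated from the question so the snippet runs on its own):

```r
library(purrr)

# Recreate the question's input so the snippet is self-contained
set.seed(3)
myvec <- rnorm(1000)

# map_dbl() returns a plain numeric vector and errors if any result
# is not a single number
output_dbl <- map_dbl(myvec, function(x) floor(x)^2 + exp(x)^2 / 2)

head(output_dbl)
```

Wrapping the result in as.list() would give back a list if later code expects one.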

The beauty of purrr is that you can also parallelize it very easily with furrr. The same chunk could be parallelized as follows:

library(furrr)
plan(multisession)  # multiprocess has been removed from the future package

output <- future_map(myvec, function(x) {
  floor(x)^2 + exp(x)^2/2
})

geoff

Is there a way to know how far along `future_map` is? Similar to a `print(i)` in a for loop to keep track of the iteration. – Adrian Aug 25 '22 at 01:57
Since `purrr` is sequential any classic progress approach should work--I usually use https://cli.r-lib.org/articles/progress.html . `furrr` is a little more complicated since the workload is split, but their docs include a guide on a `furrr` specific progress bar implementation: https://furrr.futureverse.org/articles/progress.html – geoff Aug 25 '22 at 02:00
Regarding progress reports when using the futureverse (here **furrr**): see the **progressr** package, e.g. https://progressr.futureverse.org/#future_map---parallel-purrrmap – HenrikB Aug 25 '22 at 21:03
Please replace `multiprocess` with `multisession`. The former has been deprecated for a long time (since 2020) and has now been fully removed from the **future** package (July 2023). – HenrikB Jul 02 '23 at 07:31
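To make the progress-reporting suggestion in the comments concrete, here is a rough sketch that follows the pattern shown in the progressr documentation. The helper name slow_map and the choice of two workers are assumptions for illustration only; the linked guides remain the authoritative reference.

```r
library(furrr)
library(progressr)

plan(multisession, workers = 2)  # worker count is an arbitrary choice here

set.seed(3)
myvec <- rnorm(1000)

# Hypothetical helper: creates a progressor with one step per element
# and signals a step each time an element has been processed
slow_map <- function(xs) {
  p <- progressor(along = xs)
  future_map(xs, function(x) {
    p()
    floor(x)^2 + exp(x)^2 / 2
  })
}

# Progress is only reported while running inside with_progress()
with_progress(output <- slow_map(myvec))

plan(sequential)  # shut the background workers down again
```

How the progress is displayed depends on the handlers configured via progressr::handlers(); in a non-interactive session nothing may be shown by default.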

score 1 · Accepted Answer · answered Aug 25 '22 at 02:07

Here's a foreach example:

library(foreach)
library(doParallel)

registerDoParallel(cores = 6)
output <- foreach(x = myvec) %dopar% {floor(x)^2 + exp(x)^2/2}
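A possible variation on the example above, sketched under a few assumptions: two cores instead of six, a made-up result name output_vec, and myvec recreated so the snippet runs on its own. The `.combine = c` argument makes foreach return a numeric vector rather than a list, and stopImplicitCluster() releases any workers that registerDoParallel() started implicitly.

```r
library(foreach)
library(doParallel)

set.seed(3)
myvec <- rnorm(1000)

registerDoParallel(cores = 2)  # core count chosen arbitrarily for the sketch

# .combine = c concatenates the per-element results into one vector
output_vec <- foreach(x = myvec, .combine = c) %dopar% {
  floor(x)^2 + exp(x)^2 / 2
}

stopImplicitCluster()  # clean up workers created by registerDoParallel()
```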

Gregor Thomas

score 1 · Answer 3 · answered Aug 25 '22 at 21:08

A one-to-one parallel version of

output <- lapply(1:length(myvec), function(i){
  floor(myvec[i])^2 + exp(myvec[i])^2/2
})

is available in future.apply:

library(future.apply)
plan(multisession)

output <- future_lapply(1:length(myvec), function(i){
  floor(myvec[i])^2 + exp(myvec[i])^2/2
})

See https://www.futureverse.org/ for more details and alternatives.

score 0 · Answer 4 · answered Aug 26 '22 at 03:26

Original for loop vs vector

The original code:

set.seed(3)
myvec <- rnorm(100000) # add two more zeros to make the test more interesting
output <- vector("list", length = length(myvec))
for(i in 1:length(myvec)){
   output[[i]] <- floor(myvec[i])^2 + exp(myvec[i])^2/2
}

Can be vectorized as:

output2 <- floor(myvec)^2 + exp(myvec)^2/2

all.equal((unlist(output)), (output2))

Original lapply vs simplified lapply

lapply(1:length(myvec), function(i){floor(myvec[i])^2 + exp(myvec[i])^2/2})

Can be rewritten:

lapply(myvec, function(i){floor(i)^2 + exp(i)^2/2})
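If a numeric vector is more convenient than a list, base R's vapply is another option. This is an illustrative addition rather than part of the comparison above, and the name output_vec is made up for the sketch:

```r
set.seed(3)
myvec <- rnorm(100000)

# vapply() checks that every result is a single numeric value and
# returns a numeric vector instead of a list
output_vec <- vapply(myvec, function(x) floor(x)^2 + exp(x)^2 / 2, numeric(1))

# Compare against the fully vectorized expression
all.equal(output_vec, floor(myvec)^2 + exp(myvec)^2 / 2)
```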

Rcpp test

References: https://adv-r.hadley.nz/rcpp.html and the Stack Overflow question "Should I prefer Rcpp::NumericVector over std::vector?"

library(Rcpp)
cppFunction('NumericVector mviking(NumericVector x) {
  int n = x.size();
  NumericVector total(x.length());
  for(int i = 0; i < n; ++i) {
    total[i] = pow(floor(x[i]), 2) + pow(exp(x[i]), 2) / 2;
  }
  return total;
}')

output3 <- mviking(myvec)
all.equal(unlist(output), output3)

Parallel processing

The parallel processing methods (parallel::mclapply, foreach::foreach, furrr::future_map, future.apply::future_lapply) were tested, but need to be re-tested, so they are left out of the results below.
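For reference, a minimal sketch of one base R approach, using parallel::parLapply on a socket cluster (a relative of mclapply that also works on Windows). The names f, cl and output_par and the choice of two workers are assumptions for illustration; no timings are claimed here.

```r
library(parallel)

set.seed(3)
myvec <- rnorm(100000)

# The toy function from the question, written for a single element
f <- function(x) floor(x)^2 + exp(x)^2 / 2

cl <- makeCluster(2)                   # two workers, chosen arbitrarily
output_par <- parLapply(cl, myvec, f)  # splits myvec across the workers
stopCluster(cl)                        # always release the workers

# Check against the vectorized version
all.equal(unlist(output_par), floor(myvec)^2 + exp(myvec)^2 / 2)
```

For a function this cheap, the cost of starting workers and shipping data to them will likely outweigh any gain; parallelization tends to pay off only when each call is expensive.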

Results

microbenchmark::microbenchmark(
  orig         = for(i in 1:length(myvec)){output[[i]] <- floor(myvec[i])^2 + exp(myvec[i])^2/2},
  basevector   = floor(myvec)^2 + exp(myvec)^2/2,
  lapplymethod = lapply(myvec, function(i){floor(i)^2 + exp(i)^2/2}),
  RCppmethod   = mviking(myvec)
)
Unit: microseconds
         expr       min        lq        mean      median          uq        max neval
         orig 22853.800 24916.587  30708.0438  27669.4520  30675.7515 131391.135   100
   basevector  1223.062  1301.753   1379.0040   1345.2285   1392.9695   2128.601   100
 lapplymethod 63393.969 70413.218 106731.5857 104866.5480 124296.3605 570943.676   100
   RCppmethod   790.102   835.916    901.7346    870.3585    900.8195   1735.371   100