
Suppose I have a function f() and a vector d

f <- function(x) dexp(x, 2)
d <- runif(10, 1, 5)

Now I want to perform a for loop like

dnew <- numeric(length(d))
for (i in seq_along(dnew)){
   dnew[i] <- f(d[i])
}

How can I do this in parallel?

forstack
  • "How to ... Parallel" should likely start with https://cran.r-project.org/web/views/HighPerformanceComputing.html. I think many have had great luck with the `foreach` package, others find `future` to be great, too. – r2evans Oct 29 '18 at 21:55
  • foreach package did not help me. It took more time if I used foreach – forstack Oct 29 '18 at 22:02
  • 1
    Okay. If `foreach` used correctly does not speed it up, then (1) start-up costs do not offset parallel execution; (2) you did it wrong or it could be done better; or (3) something else we don't know. In general, for parallelization to work well, the calculation time has to be significantly more than the start-up cost (fork R if appropriate, load libraries, transfer data, etc). If you include your `foreach` code, perhaps somebody adept at that package can help improve its performance. – r2evans Oct 29 '18 at 22:11
  • Possible duplicate of [run a for loop in parallel in R](https://stackoverflow.com/questions/38318139/run-a-for-loop-in-parallel-in-r) – divibisan Jul 02 '19 at 20:30

2 Answers

  • The example code is faster without a for loop:

    dnew2 <- f(d)          # 'f()' and 'd' from question
    all.equal(dnew, dnew2) # 'dnew' from question 
    [1] TRUE
    
    library(microbenchmark)
    microbenchmark('for loop' = for (i in seq_along(dnew)){ dnew[i] <- f(d[i]) },
                   'vectorized' = { dnew2 = f(d) })
    Unit: microseconds
           expr    min      lq     mean  median     uq    max neval
       for loop 15.639 16.4455 17.66640 17.0045 18.089 43.938   100
     vectorized  1.249  1.3140  1.44039  1.3845  1.516  2.424   100
    
  • It can be parallelized with foreach:

    library(foreach)
    library(doParallel); registerDoParallel(2)
    dnew3 <- foreach(i=seq_along(dnew), .combine=c) %dopar% {
        f(d[i])
    }
    all.equal(dnew, dnew3)
    [1] TRUE
    

    The parallelized version is slower because the parallel overhead is larger than the benefit.

    microbenchmark('for loop' = for (i in seq_along(dnew)){ dnew[i] <- f(d[i]) },
                   'foreach' = { dnew3 <- foreach(i=seq_along(dnew), .combine=c) %dopar% {
                                          f(d[i]) } 
                                })
    Unit: microseconds
         expr       min        lq        mean     median         uq       max neval
     for loop    17.799    22.048    31.01027    32.7615    37.0945    67.265   100
      foreach 11875.845 13003.558 13576.64759 13427.1015 14041.3455 17782.638   100
    
  • If f() takes longer to evaluate, the foreach version is faster:

    f <- function(x){
        Sys.sleep(.3)
        dexp(x, 2)
    }
    microbenchmark('for loop' = for (i in seq_along(dnew)){ dnew[i] <- f(d[i]) },
                   'foreach' = {dnew3 <- foreach(i=seq_along(dnew), .combine=c) %dopar% {
                                         f(d[i]) }
                    }, times=2)
    Unit: seconds
         expr      min       lq     mean   median       uq      max neval
     for loop 3.004271 3.004271 3.004554 3.004554 3.004837 3.004837     2
      foreach 1.515458 1.515458 1.515602 1.515602 1.515746 1.515746     2
    
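  • The comments also mention the `future` framework; the same computation can be written with the `future.apply` package (a sketch, assuming that package is installed; `multisession` and the worker count of 2 are arbitrary choices). The same overhead caveat applies: this only pays off when f() is expensive.

        library(future.apply)  # assumed installed; provides future_sapply()
        
        f <- function(x) dexp(x, 2)      # 'f()' and 'd' as in the question
        d <- runif(10, 1, 5)
        
        plan(multisession, workers = 2)  # start two background R sessions
        dnew4 <- future_sapply(d, f)     # parallel drop-in for sapply()
        plan(sequential)                 # shut the workers down again
        
        all.equal(f(d), dnew4)
        [1] TRUE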
Nairolf

Simple for loop

a <- function(x) dexp(x, 2)
d <- runif(10, 1, 5)
d
dnew <- numeric(length(d))
for (i in seq_along(dnew)) {
    dnew[i] <- a(d[i])
}
dnew

Parallel version

library(doParallel)                          # also attaches 'parallel'
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores, type = "FORK")   # "FORK" is Unix-only; use "PSOCK" on Windows
dnew <- unlist(clusterApply(cl = cl, x = d, fun = a))  # clusterApply() returns a list
stopCluster(cl)
dnew
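
On Linux/macOS, `parallel::mclapply()` gives the same result with less setup (a sketch; it relies on forking the current R process, so on Windows it only runs serially with `mc.cores = 1`):

library(parallel)

a <- function(x) dexp(x, 2)   # same 'a()' and 'd' as above
d <- runif(10, 1, 5)

# mclapply() forks workers and returns a list, hence unlist()
dnew_mc <- unlist(mclapply(d, a, mc.cores = 2))
all.equal(a(d), dnew_mc)
[1] TRUE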

Take a look at this blog post: https://www.r-bloggers.com/lets-be-faster-and-more-parallel-in-r-with-doparallel-package/

Hope it helps!

paoloeusebi
  • 1
    This seems more like a comment than an answer. As an answer, it should contain at least an adaptation for the OP's data/code. Links-only tend to go stale when the link dies (although admittedly r-bloggers is fairly good at longevity so far). – r2evans Oct 29 '18 at 22:08