
Suppose I have a function f() and a vector d

f <- function(x) dexp(x, 2)
d <- runif(10, 1, 5)

Now I want to perform a for loop like

dnew <- numeric(length(d))
for (i in seq_along(dnew)){
   dnew[i] <- f(d[i])
}

How can I do this in parallel?

forstack
  • "How to ... Parallel" should likely start with https://cran.r-project.org/web/views/HighPerformanceComputing.html. I think many have had great luck with the `foreach` package, others find `future` to be great, too. – r2evans Oct 29 '18 at 21:55
  • foreach package did not help me. It took more time if I used foreach – forstack Oct 29 '18 at 22:02
  • 1
    Okay. If `foreach` used correctly does not speed it up, then (1) start-up costs do not offset parallel execution; (2) you did it wrong or it could be done better; or (3) something else we don't know. In general, for parallelization to work well, the calculation time has to be significantly more than the start-up cost (fork R if appropriate, load libraries, transfer data, etc). If you include your `foreach` code, perhaps somebody adept at that package can help improve its performance. – r2evans Oct 29 '18 at 22:11
  • Possible duplicate of [run a for loop in parallel in R](https://stackoverflow.com/questions/38318139/run-a-for-loop-in-parallel-in-r) – divibisan Jul 02 '19 at 20:30

2 Answers

  • The example code is faster without a for loop:

    dnew2 <- f(d)          # 'f()' and 'd' from question
    all.equal(dnew, dnew2) # 'dnew' from question 
    [1] TRUE
    
    library(microbenchmark)
    microbenchmark('for loop' = for (i in seq_along(dnew)){ dnew[i] <- f(d[i]) },
                   'vectorized' = { dnew2 = f(d) })
    Unit: microseconds
           expr    min      lq     mean  median     uq    max neval
       for loop 15.639 16.4455 17.66640 17.0045 18.089 43.938   100
     vectorized  1.249  1.3140  1.44039  1.3845  1.516  2.424   100
    
  • It can be parallelized with foreach:

    library(foreach)
    library(doParallel); registerDoParallel(2)
    dnew3 <- foreach(i=seq_along(dnew), .combine=c) %dopar% {
        f(d[i])
    }
    all.equal(dnew, dnew3)
    [1] TRUE
    

    The parallelized version is slower because the parallel overhead is larger than the benefit.

    microbenchmark('for loop' = for (i in seq_along(dnew)){ dnew[i] <- f(d[i]) },
                   'foreach' = { dnew3 <- foreach(i=seq_along(dnew), .combine=c) %dopar% {
                                          f(d[i]) } 
                                })
    Unit: microseconds
         expr       min        lq        mean     median         uq       max neval
     for loop    17.799    22.048    31.01027    32.7615    37.0945    67.265   100
      foreach 11875.845 13003.558 13576.64759 13427.1015 14041.3455 17782.638   100
    
  • If f() takes longer to evaluate, the foreach version is faster:

    f <- function(x){
        Sys.sleep(.3)
        dexp(x, 2)
    }
    microbenchmark('for loop' = for (i in seq_along(dnew)){ dnew[i] <- f(d[i]) },
                   'foreach' = {dnew3 <- foreach(i=seq_along(dnew), .combine=c) %dopar% {
                                         f(d[i]) }
                    }, times=2)
    Unit: seconds
         expr      min       lq     mean   median       uq      max neval
     for loop 3.004271 3.004271 3.004554 3.004554 3.004837 3.004837     2
      foreach 1.515458 1.515458 1.515602 1.515602 1.515746 1.515746     2
    
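  • The comments also mention the `future` framework; the same computation can be written with the `future.apply` package (a sketch, assuming that package is installed; `multisession` and the worker count of 2 are arbitrary choices). The same overhead caveat applies: this only pays off when f() is expensive.

        library(future.apply)  # assumed installed; provides future_sapply()
        
        f <- function(x) dexp(x, 2)      # 'f()' and 'd' as in the question
        d <- runif(10, 1, 5)
        
        plan(multisession, workers = 2)  # start two background R sessions
        dnew4 <- future_sapply(d, f)     # parallel drop-in for sapply()
        plan(sequential)                 # shut the workers down again
        
        all.equal(f(d), dnew4)
        [1] TRUE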
Nairolf

Simple for loop

a <- function(x) dexp(x, 2)
d <- runif(10, 1, 5)
d
dnew <- numeric(length(d))
for (i in seq_along(dnew)) {
    dnew[i] <- a(d[i])
}
dnew

Parallel version

library(doParallel)                          # also attaches 'parallel'
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores, type = "FORK")   # "FORK" is Unix-only; use "PSOCK" on Windows
dnew <- unlist(clusterApply(cl = cl, x = d, fun = a))  # clusterApply() returns a list
stopCluster(cl)
dnew
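
On Linux/macOS, `parallel::mclapply()` gives the same result with less setup (a sketch; it relies on forking the current R process, so on Windows it only runs serially with `mc.cores = 1`):

library(parallel)

a <- function(x) dexp(x, 2)   # same 'a()' and 'd' as above
d <- runif(10, 1, 5)

# mclapply() forks workers and returns a list, hence unlist()
dnew_mc <- unlist(mclapply(d, a, mc.cores = 2))
all.equal(a(d), dnew_mc)
[1] TRUE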

Take a look at this blog post: https://www.r-bloggers.com/lets-be-faster-and-more-parallel-in-r-with-doparallel-package/

Hope it helps!

paoloeusebi
  • 1
    This seems more like a comment than an answer. As an answer, it should contain at least an adaptation for the OP's data/code. Links-only tend to go stale when the link dies (although admittedly r-bloggers is fairly good at longevity so far). – r2evans Oct 29 '18 at 22:08