I am trying to use %dopar% to speed up my for loop by parallelizing over multiple cores. However, I am unable to store the values that are returned. Here is a small reproducible example.

Using %dopar%

library(doParallel)  # also attaches foreach and parallel
cl <- parallel::makeForkCluster(4)
doParallel::registerDoParallel(cl)
junk_parallel = seq(0,100000,1)
system.time(foreach(i=seq(0,10000,1))%dopar%{
  junk_parallel[i] = sqrt(i)})
stopCluster(cl)

Output:

user  system elapsed 
  2.536   0.148   2.690 
> junk_parallel[9]
[1] 8

Using %do%

library(doParallel)  # also attaches foreach and parallel
cl <- parallel::makeForkCluster(4)
doParallel::registerDoParallel(cl)
junk_parallel = seq(0,100000,1)
system.time(foreach(i=seq(0,10000,1))%do%{
  junk_parallel[i] = sqrt(i)}) 
stopCluster(cl)

Output:

 user  system elapsed 
  2.172   0.004   2.174 
> junk_parallel[9]
[1] 3 

Why is %dopar% unable to assign the right value? When should I use %dopar% vs. %do%?

Thanks in advance,

honeybadger
  • Obviously you haven't read the introductory vignettes. You should read them: https://cran.r-project.org/web/packages/foreach/vignettes/foreach.pdf and https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf – Roland Aug 21 '19 at 06:51
  • @Roland: I have read it and could not find an answer in the vignette. For the sake of posterity, another detailed answer on how to store values using %dopar% is https://stackoverflow.com/questions/19791609/saving-multiple-outputs-of-foreach-dopar-loop – honeybadger Aug 21 '19 at 06:54
  • You may have read, but you didn't understand. These vignettes don't show any loops with side effects (like assignment into objects outside the loop). `foreach` is much more similar to `lapply` than to a `for` loop. – Roland Aug 21 '19 at 06:56
  • @Roland: Thanks, I will keep that in mind. – honeybadger Aug 21 '19 at 06:58
  • You can have a look at https://privefl.github.io/blog/a-guide-to-parallelism-in-r/ to learn more about foreach and common issues with it (including yours). – F. Privé Aug 21 '19 at 07:35

1 Answer

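The computation in a parallel loop runs in its own instance: each iteration is evaluated in a separate worker process. The assignment therefore changes the worker's copy of junk_parallel, not the one in your main session, and the change is lost when the worker finishes. Instead, have the loop body return the value and let foreach collect the results. Try this: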

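library(foreach)
cl <- parallel::makeForkCluster(4)
doParallel::registerDoParallel(cl)
# Return the value from the loop body; foreach collects the results into a list
junk_parallel <- foreach(i = seq(0, 10000, 1)) %dopar% {
  sqrt(i)
}
parallel::stopCluster(cl)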
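
By default foreach returns a list. If you would rather get a plain numeric vector back, a minimal sketch along the same lines is below. The names `xs` and `roots`, the 2-worker cluster, and the use of `parallel::makeCluster` (a PSOCK cluster, which also runs on Windows) are my assumptions for illustration, not part of the original code; the sequence starts at 1 so that position i holds sqrt(i).

```r
library(foreach)

# Assumption: a 2-worker PSOCK cluster instead of makeForkCluster(4),
# so the sketch is not limited to Unix-like systems.
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)

# Hypothetical input: start at 1 so that roots[i] corresponds to sqrt(i)
xs <- seq(1, 10000, 1)

# The loop body returns a value; .combine = c concatenates the
# per-iteration results into a single numeric vector, in input order.
roots <- foreach(i = xs, .combine = c) %dopar% {
  sqrt(i)
}

parallel::stopCluster(cl)

roots[9]  # 3
```

For work as cheap as sqrt(i), the per-task overhead of %dopar% will likely outweigh any gain, which would be consistent with the timings in the question; it tends to pay off when each iteration does substantially more work.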
thc