6

Why is foreach() with %dopar% slower than a plain for loop? A small example:

library(parallel)
library(foreach)
library(doParallel)
registerDoParallel(cores = detectCores())

I <- 10^3L

for.loop <- function(I) {
  out <- double(I)
  for (i in seq_len(I))
    out[i] <- sqrt(i)
  out
}

foreach.do <- function(I) {
  out <- foreach(i = seq_len(I), .combine=c) %do%
    sqrt(i)
  out
}

foreach.dopar <- function(I) {
  out <- foreach(i = seq_len(I), .combine=c) %dopar%
    sqrt(i)
  out
}

identical(for.loop(I), foreach.do(I)) && identical(foreach.do(I), foreach.dopar(I))
## [1] TRUE
library(rbenchmark)
benchmark(for.loop(I), foreach.do(I), foreach.dopar(I))
##               test replications elapsed relative user.self sys.self user.child sys.child
## 1      for.loop(I)          100   0.696    1.000     0.690    0.000        0.0     0.000
## 2    foreach.do(I)          100 121.096  173.989   119.463    0.056        0.0     0.000
## 3 foreach.dopar(I)          100 120.297  172.841   111.214    6.400        3.5     6.734

Some additional info:

sessionInfo()
## R version 3.0.0 (2013-04-03)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=ru_RU.UTF-8       LC_NUMERIC=C               LC_TIME=ru_RU.UTF-8       
##  [4] LC_COLLATE=ru_RU.UTF-8     LC_MONETARY=ru_RU.UTF-8    LC_MESSAGES=ru_RU.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=ru_RU.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] doMC_1.3.0       rbenchmark_1.0.0 doParallel_1.0.1 iterators_1.0.6  foreach_1.4.0    plyr_1.8        
## 
## loaded via a namespace (and not attached):
## [1] codetools_0.2-8 compiler_3.0.0  tools_3.0.0

getDoParWorkers()
## [1] 4
Artem Klevtsov
    For small tasks, the overhead of setting up threads will dominate, especially compared to a vectorised function on a single thread. In your implementations above there will be a lot of function call and memory overhead. Parallel processing works best with intensive CPU bound activities. – James Jun 06 '13 at 13:53
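The comment above about a vectorised function on a single thread can be checked directly. Below is a minimal sketch (the name `vec.sqrt` is hypothetical, not from the question): a single vectorised `sqrt()` call does the same work in one function call, with no per-iteration dispatch at all, so it beats both the loop and the foreach variants for a task this small.

```r
# for.loop() as defined in the question, for comparison
for.loop <- function(I) {
  out <- double(I)
  for (i in seq_len(I)) out[i] <- sqrt(i)
  out
}

# hypothetical vectorised alternative: one call over the whole vector
vec.sqrt <- function(I) sqrt(seq_len(I))

I <- 10^3L
identical(for.loop(I), vec.sqrt(I))
## [1] TRUE
```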

1 Answer

6

The doParallel vignette specifically mentions, and illustrates with examples, that parallel execution can sometimes be slower than sequential execution, because of the overhead of dispatching the tasks and combining the results from the separate parallel processes.

Reference: http://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf

Page 3:

With small tasks, the overhead of scheduling the task and returning the result can be greater than the time to execute the task itself, resulting in poor performance.

Running the example from the vignette, I found that for a sufficiently large task the parallel version needed only about 50% of the time of the sequential code.
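One way around the per-element overhead is to give each worker one large chunk instead of one element at a time. The sketch below is not from the vignette; the helper name `foreach.chunked` and the chunking via `split()`/`cut()` are my own illustration. Each worker then makes a single vectorised `sqrt()` call, so the scheduling and combining overhead is paid once per worker rather than once per element.

```r
library(foreach)
library(doParallel)
registerDoParallel(cores = 2)

# Hypothetical helper: split 1..I into one chunk per worker, then run
# the vectorised sqrt() once per chunk under %dopar%.
foreach.chunked <- function(I, workers = getDoParWorkers()) {
  chunks <- split(seq_len(I), cut(seq_len(I), workers, labels = FALSE))
  foreach(idx = chunks, .combine = c) %dopar% sqrt(idx)
}

identical(foreach.chunked(10^3L), sqrt(seq_len(10^3L)))
## [1] TRUE
```

With only a handful of tasks to schedule, the parallel overhead becomes negligible, and for genuinely CPU-bound work per chunk this pattern is where %dopar% starts to pay off.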

PascalVKooten