
I have a problem using the foreach package in R. When I run this code:

tmp=proc.time()
x<-for(i in 1:1000){sqrt(i)} 
x
proc.time()-tmp

and this code:

tmp=proc.time()
x<- foreach(i=1:1000) %dopar% sqrt(i)
x
proc.time()-tmp

the R console reports for the parallel version:

   user  system elapsed 
  0.464   0.776   0.705  

and for the normal loop:

   user  system elapsed 
  0.001   0.000   0.001 

So the normal loop runs faster. Is that normal?

Thanks for your help.

Andrie
Samy Jelassi
    There are several problems with your comparison. 1) In the "normal" code, you don't create an object. 2) Your parallel code will not run in parallel unless you configure a parallel backend. You do not show this step, so I am assuming your parallel code runs in serial. – Andrie Jun 09 '15 at 16:36
  • Thank you for your answer ! How can I configure a parallel backend? – Samy Jelassi Jun 09 '15 at 16:39

2 Answers


Parallel processing won't speed up a trivial operation like sqrt(x). Ideally you use it for more expensive operations, or you hand each worker one large chunk of work, e.g. (note the argument is `.combine`, with a leading dot):

x <- foreach(i = 0:9, .combine = 'c') %dopar% sqrt(seq(i*10000000, (i+1)*10000000 - 1))
x

It takes more time to dispatch each iteration to a worker process than it does to compute sqrt() itself. If you look at the processors in your system monitor/task manager, you'll see that only one is busy, regardless of the backend you set up.
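To see that per-iteration overhead directly, here is a small sketch. It uses the sequential %do% operator, so no backend is needed; even without any process switching, the foreach bookkeeping alone makes it far slower than the vectorized call:

```r
library(foreach)

# One foreach iteration per sqrt() call: overhead dominates the work.
t_foreach <- system.time(x1 <- foreach(i = 1:1000, .combine = 'c') %do% sqrt(i))

# The vectorized equivalent does the same work in a single call.
t_vector <- system.time(x2 <- sqrt(1:1000))

identical(x1, x2)  # TRUE: same result, very different cost
```

On a typical machine the foreach version is orders of magnitude slower, even before any parallel dispatch is involved.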

Edit: It seems that you have no parallel backend set up for your foreach loop, so it will default to sequential mode anyway. An easy way to set up the parallel backend is

library(doParallel)
ncores <- detectCores()
clust <- makeCluster(ncores - 2)  # leave a couple of cores free for other work
registerDoParallel(clust)
# do stuff here
# ...
stopCluster(clust)

Depending on your system, you may need to do more outside of R in order to get the backend set up.
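Putting the two pieces together, here is a hedged end-to-end sketch. The worker count of 2 and the chunk size of one million are arbitrary choices so it runs on any machine; each worker receives one large chunk, so the communication cost is amortized over many operations:

```r
library(foreach)
library(doParallel)

clust <- makeCluster(2)  # two workers; adjust for your machine
registerDoParallel(clust)

# One big chunk per worker instead of one sqrt() call per iteration.
n <- 1e6
x <- foreach(i = 0:1, .combine = 'c') %dopar% sqrt(seq(i*n + 1, (i + 1)*n))

stopCluster(clust)

length(x)  # 2000000, computed half on each worker
```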

Max Candocia
  • Thank you for your answer! I tried to set up the parallel backend as you told me, but the running time is still higher than for the "normal" code. The R console reports for parallel computing: user system elapsed 0.471 0.032 0.585. It is true that it is better than before! – Samy Jelassi Jun 09 '15 at 16:48
  • If you check the first part of my answer, you'll see that you will not gain efficiency from running in parallel when your operation is `sqrt(x)`. It takes time for the computer to switch between calculating different parts of the loop. The example you gave is somewhat trivial (it would make the most sense to just call `sqrt(1:1000)`), but in general you want to reserve parallel computing for long tasks with a reasonable number of iterations. – Max Candocia Jun 09 '15 at 16:56
  • @SamyJelassi Maybe this is normal. It means that your job is not slow enough to benefit from parallelization. – agstudy Jun 09 '15 at 16:58
  • Yes, you are right! Thank you for your help! I found out how to make parallelization more efficient! – Samy Jelassi Jun 09 '15 at 16:59

Here is some test code you can use to set up a parallel experiment on Windows:

library(foreach)
library(doParallel)

cl <- makePSOCKcluster(2)
registerDoParallel(cl)

system.time({
  x <- foreach(i=1:1000) %do% Sys.sleep(0.001)
})

system.time({
  x <- foreach(i=1:1000) %dopar% Sys.sleep(0.001)
})

stopCluster(cl)

You should find that the parallel implementation runs in roughly half the time of the serial version:

> system.time({
+   x <- foreach(i=1:1000) %do% Sys.sleep(0.001)
+ })
   user  system elapsed 
   0.08    0.00   12.55 
> 
> system.time({
+   x <- foreach(i=1:1000) %dopar% Sys.sleep(0.001)
+ })
   user  system elapsed 
   0.23    0.00    6.09 

Note that parallel computing is not a silver bullet. There is a fixed startup cost as well as a communication cost; see Amdahl's law.

In general, it is only worth doing parallel computing if your task is taking a long time to run.
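Amdahl's law puts a ceiling on the achievable speedup: if a fraction p of the runtime can be parallelized over n workers, the best case is 1 / ((1 - p) + p/n). A quick sketch of the arithmetic:

```r
# Amdahl's law: theoretical speedup with n workers when a fraction p
# of the total runtime is parallelizable.
amdahl <- function(p, n) 1 / ((1 - p) + p / n)

amdahl(0.5, 2)     # 1.333...: half the work parallel, 2 workers
amdahl(0.95, 4)    # ~3.48
amdahl(0.95, Inf)  # ~20: even infinite workers can't beat 1/(1-p)
```

This is why a cheap task like sqrt(i) cannot profit: the serial fraction (dispatch and communication) swamps the parallelizable work.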

Andrie