
I am trying to set up parallel computing in R for a large simulation, but I noticed that there is no improvement in running time.

I tried a simple example:

library(foreach)
library(doParallel)

stime<-system.time(for (i in 1:10000) rnorm(10000))[3]
print(stime)
10.823

cl<-makeCluster(2)
registerDoParallel(cores=2)
stime<-system.time(ls<-foreach(s = 1:10000) %dopar% rnorm(10000))[3]
stopCluster(cl)
print(stime)
29.526

The elapsed time is more than twice what it was in the original case without parallel computing.

Obviously I am doing something wrong but I cannot figure out what it is.

  • There's overhead involved with parallel computations. This result is to be expected. – Frank Oct 27 '14 at 16:37
  • you mean it takes time to initiate the clusters? or what do you mean? – upabove Oct 27 '14 at 16:38
  • yes but I'm only measuring the task itself not the time it takes to initialize the clusters. when should I expect to get a greater benefit? – upabove Oct 27 '14 at 16:42
  • Even just the coordination after cluster initialization, e.g. combining everything back into a single array, takes time and overhead. Moreover, depending on how you're doing parallel computing, you also incur overhead via context switching, etc. – Livius Oct 27 '14 at 16:55
  • You need to run a `profile` (I think that's the function name) to see how much time is spent setting up the data to pass to each node vs. the time it takes a node to process data. In general (a dangerous way to start a sentence :-) ), parallelism only helps when you send relatively little data but do a ton of processing on said data. – Carl Witthoft Oct 27 '14 at 17:05
  • See also: http://stackoverflow.com/q/7180377/892313 In general, there is overhead with running parallel tasks, and if the tasks are small and quick, that may overwhelm any savings by doing them in parallel. – Brian Diggs Oct 27 '14 at 18:41
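
For the profiling suggestion a few comments up, the base R tool is Rprof() together with summaryRprof(). A minimal sketch (the output file name here is arbitrary):

Rprof("rnorm_profile.out")          # start collecting profiling samples
for (i in 1:10000) rnorm(10000)     # the code being investigated
Rprof(NULL)                         # stop profiling
summaryRprof("rnorm_profile.out")   # time spent per function

Profiling the sequential loop shows where the computation itself spends its time; comparing that against the parallel elapsed time gives a rough sense of how much is overhead.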

1 Answer


Performing many tiny tasks in parallel can be very inefficient. The standard solution is to use chunking:

ls <- foreach(s=1:2) %dopar% {
  for (i in 1:5000) rnorm(10000)
}

Instead of executing 10,000 tiny tasks in parallel, this loop executes two larger tasks, and runs almost twice as fast as the sequential version on my Linux machine.
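
If you don't want to hard-code the chunk size, the same effect can be had with the idiv iterator from the iterators package (which foreach already depends on); it splits a total count into a given number of chunks. A sketch, assuming the two-worker backend registered above:

library(iterators)

# idiv(10000, chunks=2) yields two chunk sizes (5000 and 5000),
# so each worker gets one big task instead of thousands of tiny ones
ls <- foreach(nt = idiv(10000, chunks = 2)) %dopar% {
  for (i in seq_len(nt)) rnorm(10000)
}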

Also note that your "foreach" example is actually sending a lot of data from the workers to the master. My "foreach" example throws that data away just like your sequential example, so I think it's a better comparison.

If you need to return a large amount of data then a fair comparison would be:

ls <- lapply(rep(10000, 10000), rnorm)

versus:

ls <- foreach(s=1:2, .combine='c') %dopar% {
  lapply(rep(10000, 5000), rnorm)  
}

On my Linux machine the times are 8.6 seconds versus 7.0 seconds. That's not impressive due to the large communication to computation ratio, but it would have been much worse if I hadn't used chunking.
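
To reproduce that comparison, a timing harness along these lines should work (a sketch assuming a two-worker doParallel backend; the exact numbers will vary by machine, and note that the full result is roughly 800 MB of doubles):

library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

# sequential: 10,000 calls to rnorm(10000), keeping the results
seq_time <- system.time(ls1 <- lapply(rep(10000, 10000), rnorm))[3]

# chunked parallel: two big tasks, results combined with 'c'
par_time <- system.time(
  ls2 <- foreach(s = 1:2, .combine = 'c') %dopar% {
    lapply(rep(10000, 5000), rnorm)
  }
)[3]

stopCluster(cl)
print(c(sequential = unname(seq_time), parallel = unname(par_time)))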

Steve Weston