
I am using R's parallel package to do parallel computation on my laptop:

> library(parallel)
> x = matrix(rep(1,2000), nrow=2)
> cl <- makeCluster(getOption("cl.cores", 8))
> system.time(replicate(5000, parApply(cl, x, 1, paste, collapse="-")))
   user  system elapsed 
  7.950   0.966  13.562 
> stopCluster(cl)
> system.time(replicate(5000, apply(x, 1, paste, collapse="-")))
   user  system elapsed 
  8.357   0.001   8.355

Did I make a mistake here? The only thing I am not sure about is how to use makeCluster.

Update: to reduce the overhead of parallelization, I used a much bigger matrix x and removed replicate from the benchmark; still, the difference between parallel and serial is marginal, and sometimes parallel is slower.

RNA
  • parallel isn't always faster. It takes time to create a cluster and put it all back together. If the operation is a bunch of very small jobs parallel processing may be slower. – Tyler Rinker Apr 12 '13 at 03:17
  • Remember [Amdahl's law](http://en.wikipedia.org/wiki/Amdahl%27s_law). – Michael Hoffman Apr 12 '13 at 03:33
  • @TylerRinker: as I mentioned in the update, parallelization makes no difference here even for a single time-consuming job. I just want to make sure that I didn't make a mistake here and, if not, what factors could explain it. – RNA Apr 12 '13 at 03:51
  • @RNA: your update _increases_ the cost of running in parallel because you're increasing the amount of data sent to each node. Running on multiple CPUs is only beneficial if the problem is CPU-intensive. – Joshua Ulrich Apr 12 '13 at 11:21
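
To illustrate the point made in the comments, here is a minimal sketch of a benchmark where each task is CPU-bound and almost no data is sent to the workers. The function slow_task, the task count of 8, and the worker count of 2 are hypothetical choices for illustration, not taken from the question; the actual timings depend on the machine, so none are shown.

```r
library(parallel)

# Hypothetical CPU-heavy task: the cost is computation, not data transfer.
# Only a single integer is sent to the worker and a single number comes back.
slow_task <- function(i) {
  s <- 0
  for (k in seq_len(2e6)) s <- s + sqrt(k)
  s
}

n_workers <- 2  # assumption: the machine has at least 2 cores
cl <- makeCluster(n_workers)

# Same 8 tasks, run serially and then spread across the workers.
serial_time   <- system.time(res_serial   <- lapply(1:8, slow_task))
parallel_time <- system.time(res_parallel <- parLapply(cl, 1:8, slow_task))

stopCluster(cl)

print(serial_time["elapsed"])
print(parallel_time["elapsed"])
```

With work shaped like this, the parallel run would usually be expected to finish sooner on a multi-core machine. In the question's benchmark, by contrast, x has only 2 rows, so parApply over rows has at most 2 pieces of work to hand out per call, and each piece is a cheap paste, so the communication cost is likely to dominate.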
