
What is the difference between doParallel and doMC in R when used with the foreach function? doParallel supports Windows and Unix-like systems, while doMC supports Unix-like systems only. In other words, why can't doParallel replace doMC directly? Thank you.

Update: doParallel is built on parallel, which is essentially a merger of multicore and snow and automatically uses the appropriate tool for your system. As a result, doParallel works across multiple platforms, so it can be used in place of doMC.

ref: http://michaeljkoontz.weebly.com/uploads/1/9/9/4/19940979/parallel.pdf

BTW, what is the difference between registerDoParallel(cores=3) and

cl <- makeCluster(3)
registerDoParallel(cl)

It seems registerDoParallel(cores=3) stops the cluster automatically, while the second form does not and needs an explicit stopCluster(cl).

ref: http://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf

Zhilong Jia
  • just so that the two get linked http://stackoverflow.com/questions/28829300/doparallel-cluster-vs-cores?noredirect=1&lq=1 – Tony Jan 30 '17 at 17:12
  • Possible duplicate of [doParallel, cluster vs cores](https://stackoverflow.com/questions/28829300/doparallel-cluster-vs-cores) – Jim G. Dec 20 '17 at 12:01

1 Answer

The doParallel package is a merger of doSNOW and doMC, much as parallel is a merger of snow and multicore. But although doParallel has all the features of doMC, I was told by Rich Calaway of Revolution Analytics that they wanted to keep doMC around because it was more efficient in certain circumstances, even though doMC now uses parallel just like doParallel. I haven't personally run any benchmarks to determine if and when there is a significant difference.

I tend to use doMC on a Linux or Mac OS X computer, doParallel on a Windows computer, and doMPI on a Linux cluster, but doParallel does work on all of those platforms.


As for the different registration methods, if you execute:

registerDoParallel(cores=3)

on a Windows machine, it will create a cluster object implicitly for later use with clusterApplyLB, whereas on Linux and Mac OS X, no cluster object is created or used. The number of cores is simply remembered and used as the value of the mc.cores argument later when calling mclapply.
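A minimal sketch of the `cores` form, assuming the foreach and doParallel packages are installed. The loop body and the variable names `squares` and `n_workers` are illustrative only. `stopImplicitCluster()` is called at the end as a tidy-up step; on Linux and Mac OS X, where no cluster object is created, it should be harmless.

```r
library(foreach)
library(doParallel)

# Register 3 workers without creating a cluster object ourselves.
registerDoParallel(cores = 3)

# Number of workers foreach reports for the registered backend.
n_workers <- getDoParWorkers()

# Square the numbers 1..6 in parallel and combine the results into a vector.
squares <- foreach(i = 1:6, .combine = c) %dopar% {
  i^2
}

# Release the implicit cluster, if one was created (the Windows case).
stopImplicitCluster()
```

The same code runs unchanged on either platform; only the mechanism behind `%dopar%` differs.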

If you execute:

cl <- makeCluster(3)
registerDoParallel(cl)

then the registered cluster object will be used with clusterApplyLB regardless of the platform. You are correct that in this case, it is your responsibility to shut down the cluster object since you created it, whereas the implicit cluster object is shut down automatically.
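One way to make sure an explicitly created cluster is always shut down, sketched under the assumption that foreach and doParallel are installed. Wrapping the loop in `tryCatch(..., finally = ...)` is a choice made for this example rather than something doParallel requires, and `roots` is an illustrative name.

```r
library(foreach)
library(doParallel)

# Create the cluster ourselves, so we are responsible for stopping it.
cl <- makeCluster(3)
registerDoParallel(cl)

# Run the loop; the finally clause stops the cluster even if the loop fails.
roots <- tryCatch(
  foreach(i = 1:4, .combine = c) %dopar% sqrt(i),
  finally = stopCluster(cl)
)

# Re-register the sequential backend so that later %dopar% calls
# do not try to use the stopped cluster.
registerDoSEQ()
```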

Steve Weston
  • Is there any documentation about "the certain circumstances" for `doMC` that Rich Calaway of Revolution Analytics mentioned? In addition, is there any performance difference between the `cores` and `makeCluster` approaches? I developed an R package, [cogena](https://github.com/zhilongjia/cogena), where parallelism was originally handled with `doMC`. I changed it to `doParallel` a few hours ago to support Windows. It's a little complex with respect to `NAMESPACE` and `import` when implemented with a mix of `doMC` and `doParallel`. Thank you. – Zhilong Jia Mar 12 '15 at 09:09
  • @Zhilong In an R package, I think you should let the end user register whatever backend works best on their hardware. That makes your code simpler and more flexible. That was the original intent of separating out the backend and is the way that caret and plyr do it, for example. – Steve Weston Mar 12 '15 at 12:27
  • I get your idea. Thank you. If there is a performance difference on typical machines, I will do as you suggested. If not, I would prefer to keep my package easy to use for now. I emailed the maintainer of `doParallel` and will keep you updated. – Zhilong Jia Mar 12 '15 at 14:18
  • Here is the reply from Rich of Revolution Analytics (sorry, it's a little long for a comment on SO, so I'm pasting just the conclusion): "I recommend doParallel for most new users, because it is supported on both Linux and Windows and uses the built-in parallel package. Further, I recommend using the fork cluster feature of parallel and the snow-like interface to avoid zombie processes on Linux." – Zhilong Jia Mar 13 '15 at 16:04
  • Typo: the option is called `cores` not `ncores`. It wouldn't let me edit your answer with less than 6 characters. – James Hirschorn Oct 04 '16 at 20:46
  • @JamesHirschorn Thanks. I fixed it in my answer. – Steve Weston Oct 04 '16 at 20:50
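
The suggestion in the comments above, that a package should leave backend registration to the end user, could look roughly like the following. This is a sketch only: `apply_each` is a hypothetical helper, not a function from cogena, caret, or plyr, and falling back to `%do%` when nothing is registered is one possible design, not the only one.

```r
library(foreach)

# Hypothetical package function: it never registers a backend itself.
apply_each <- function(xs, f) {
  # Use the user's registered backend if there is one; otherwise run
  # sequentially with %do%, which avoids the "no parallel backend
  # registered" warning that %dopar% would emit.
  `%op%` <- if (getDoParRegistered()) `%dopar%` else `%do%`
  foreach(x = xs, .combine = c) %op% f(x)
}

# With no backend registered this runs sequentially.
doubled <- apply_each(1:3, function(x) x * 2)
```

A user who wants parallel execution would call, for example, `registerDoParallel(cores=3)` before calling `apply_each`; the function itself stays the same whichever backend is registered.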