
I just tested an elastic net with and without a parallel backend. The call is:

library(caret)  # provides train() and trainControl(); method="enet" also requires the elasticnet package

enetGrid <- data.frame(.lambda = 0, .fraction = c(0.005))
ctrl <- trainControl(method = "repeatedcv", repeats = 5)
enetTune <- train(x, y, method = "enet", tuneGrid = enetGrid, trControl = ctrl, preProc = NULL)

I ran it once without a parallel backend registered (and got the warning message from %dopar% when the train call finished), and then again with one registered for 7 cores (of 8). The first run took 529 seconds, the second 313. But the first peaked at 3.3GB of memory (as reported by the Sun cluster system), while the second peaked at 22.9GB. I've got 30GB of RAM, and the task only gets more complicated from here.
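
For reference, the backend for the parallel run was presumably registered along these lines (a sketch; the exact call isn't shown above, only the core count):

library(doParallel)
registerDoParallel(cores = 7)  # assumed: 7 of the machine's 8 cores; %dopar% inside train() then uses 7 workers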

Questions: 1) Is this a general property of parallel computation? I thought the workers shared memory. 2) Is there a way around this while still using enet inside train? If doParallel is the problem, are there other backends I could use with %dopar%? (No, right?)

Because I am interested in whether this is the expected result, this is closely related to, but not the same as, the question below. I'd be fine closing this and merging my question into that one (or marking that one as a duplicate and pointing here, since this has more detail) if that's the consensus:

Extremely high memory consumption of new doParallel package

Ari B. Friedman
  • Since you mention "the Sun cluster system", are you running on a cluster? If so, are you (or can you) use multiple nodes? The big advantage of clusters is that your memory size increases along with the number of CPUs, which can allow problems to scale, unlike running on a single multicore machine. – Steve Weston Oct 06 '13 at 14:53
  • The node I'm on has 30GB allocated but 256GB physical RAM, so I can ask for more memory. But the problem has the potential to get much bigger, and I'd like it to scale reasonably. – Ari B. Friedman Oct 07 '13 at 00:48
  • 1
    You could try to use library(Rdsm), which uses shared memory (but is not supported by caret as far as I know), or perhaps rredis with the foreach and doRedis packages to avoid having to copy your data across all your threads, which can be used with caret, http://stackoverflow.com/questions/16615498/error-occurring-in-caret-when-running-on-a-cluster – Tom Wenseleers Aug 25 '15 at 19:33

2 Answers

In multithreaded programs, threads share lots of memory. It's primarily the stack that isn't shared between threads. But, to quote Dirk Eddelbuettel, "R is, and will remain, single-threaded", so R parallel packages use processes rather than threads, and so there is much less opportunity to share memory.

However, memory is shared between the processes that are forked by mclapply, as long as the processes don't modify it (modifying a page triggers a copy of that memory region in the operating system). That is one reason the memory footprint can be smaller when using the "multicore" API rather than the "snow" API with parallel/doParallel.
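
As an illustrative sketch of that difference (not part of the original answer; the object size and worker counts are made up), compare a forked mclapply with a PSOCK "snow" cluster, which must serialize the data to every worker:

library(parallel)

x <- matrix(rnorm(5e7), ncol = 100)   # ~400MB object in the master process

## Forked workers (Linux/Mac OS X only) read the parent's copy of x;
## pages are only duplicated if something writes to them (copy-on-write).
res1 <- mclapply(1:7, function(i) sum(x[, i]), mc.cores = 7)

## A PSOCK ("snow") cluster has no shared pages: x must be serialized
## and sent to each of the 7 workers, i.e. up to 7 extra copies.
cl <- makePSOCKcluster(7)
clusterExport(cl, "x")
res2 <- parLapply(cl, 1:7, function(i) sum(x[, i]))
stopCluster(cl)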

In other words, using:

registerDoParallel(7)

may be much more memory efficient than using:

cl <- makeCluster(7)
registerDoParallel(cl)

since the former will cause %dopar% to use mclapply on Linux and Mac OS X, while the latter uses clusterApplyLB.

However, the "snow" API allows you to use multiple machines, and that means that your memory size increases with the number of CPUs. This is a great advantage since it can allow programs to scale. Some programs even get super-linear speedup when running in parallel on a cluster since they have access to more memory.

So to answer your second question, I'd say to use the "multicore" API with doParallel if you only have a single machine and are using Linux or Mac OS X, but use the "snow" API with multiple machines if you're using a cluster. I don't think there is any way to use shared memory packages such as Rdsm with the caret package.

Steve Weston
  • And Steve, what would you suggest under the constraints of Windows 64-bit? I'm running into this problem of memory consumption as I split the task out to multiple processes, each one requiring a copy of a rather large `data.table`. – Matt Weller Dec 27 '13 at 23:03
  • 1
    @MattWeller There aren't many choices, since I don't think data.table supports the use of memory mapping. The bigmemory package does, but that may be difficult to use on Windows since there isn't a binary distribution available on CRAN. You can try running in parallel on multiple machines, but that's difficult on Windows since you need to install an MPI distribution such as DeinoMPI or MPICH. – Steve Weston Dec 28 '13 at 19:44
  • I managed to get things working ok using your suggestion on another thread regarding `iterators`. This means the `data.table` gets split and passed to the worker processes. Thanks for the other tips. – Matt Weller Dec 28 '13 at 20:20
  • It appears that CRAN does not have a Windows binary available, but for me under R 3.1.1 this works to install bigmemory on Windows: install.packages(c("BH","biglm")); install.packages("bigmemory", repos="http://R-Forge.R-project.org"); install.packages("bigmemory.sri", repos="http://R-Forge.R-project.org"); install.packages("biganalytics", repos="http://R-Forge.R-project.org"); install.packages("bigalgebra", repos="http://R-Forge.R-project.org"); library(bigmemory) – Tom Wenseleers Aug 25 '15 at 16:25
  • Or well this is a more recent version which also works on Windows: install.packages(c("BH","biglm")); library(devtools); devtools::install_github('kaneplusplus/bigmemory'); library(bigmemory); devtools::install_github('tomwenseleers/Rdsm'); library(Rdsm) – Tom Wenseleers Aug 25 '15 at 19:28
  • (library(Rdsm) can parallelize using shared memory) – Tom Wenseleers Aug 25 '15 at 19:29
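
A rough sketch of the iterator approach Matt Weller describes above (the data and chunking scheme are made up; isplitRows comes from the itertools package):

library(doParallel)
library(itertools)
library(data.table)

registerDoParallel(4)
dt <- data.table(id = 1:1e6, value = rnorm(1e6))

## foreach pulls one chunk of rows at a time from the iterator, so each
## worker receives only its own slice of the table, not a full copy.
res <- foreach(chunk = isplitRows(dt, chunks = 4), .combine = rbind,
               .packages = "data.table") %dopar% {
  chunk[, .(total = sum(value), n = .N)]
}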

There is a minimum number of characters for an answer; otherwise I would simply have typed: 1) Yes. 2) No... er, maybe. There are packages that use a "shared memory" model for parallel computation, but R's more thoroughly tested packages don't use it.
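
To illustrate the shared-memory model the slides below cover (a sketch only, with made-up sizes; bigmemory is one such package, Rdsm another):

library(bigmemory)
library(parallel)

## A shared big.matrix lives outside R's heap; workers on the same
## machine attach it via a small descriptor instead of receiving a copy.
x <- big.matrix(nrow = 1e6, ncol = 10, type = "double", init = 0)
x[, 1] <- rnorm(1e6)
desc <- describe(x)            # descriptor object, cheap to send

cl <- makePSOCKcluster(4)
clusterExport(cl, "desc")
res <- parLapply(cl, 1:4, function(i) {
  library(bigmemory)
  m <- attach.big.matrix(desc) # same underlying memory, no copy
  sum(m[, i])
})
stopCluster(cl)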

http://www.stat.berkeley.edu/scf/paciorek-parallelWorkshop.pdf

http://heather.cs.ucdavis.edu/~matloff/158/PLN/ParProcBook.pdf

http://heather.cs.ucdavis.edu/Rdsm/BARUGSlides.pdf

IRTFM