I just tested an elastic net with and without a parallel backend. The call is:
library(caret)  # provides train() and trainControl()
enetGrid <- data.frame(.lambda = 0, .fraction = c(.005))
ctrl <- trainControl(method = "repeatedcv", repeats = 5)
enetTune <- train(x, y, method = "enet", tuneGrid = enetGrid, trControl = ctrl, preProc = NULL)
I ran it without a parallel backend registered (and got the warning message from %dopar% when the train call finished), and then again with one registered for 7 of 8 cores. The first run took 529 seconds and the second 313. But the first used at most 3.3 GB of memory (as reported by the Sun cluster system), while the second used 22.9 GB. I have 30 GB of RAM, and the task only gets more complicated from here.
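For reference, here is a minimal sketch of how such a backend can be registered, assuming doParallel with a socket cluster created by makeCluster (the exact setup is not shown above). The worker count n_workers and the toy foreach loop are placeholders for illustration only; the real run used 7 workers and the train call.

```r
library(doParallel)  # also attaches foreach and parallel

n_workers <- 2                 # assumption for this sketch; the run above used 7
cl <- makeCluster(n_workers)   # socket cluster: each worker is a separate R process
registerDoParallel(cl)         # %dopar% (and therefore train) now uses these workers

# Toy stand-in for the train() call: each iteration runs on a worker
timing <- system.time(
  res <- foreach(i = 1:4, .combine = c) %dopar% sum(rnorm(1e5))
)

workers_seen <- getDoParWorkers()  # number of workers %dopar% will use

stopCluster(cl)   # shut the worker processes down
registerDoSEQ()   # fall back to sequential so later %dopar% calls still work
```

With the real workload, the train call would replace the foreach loop, and timing["elapsed"] would give the wall-clock seconds for the run.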
Questions:
1) Is this a general property of parallel computation? I thought the worker processes shared memory...
2) Is there a way around this while still using enet inside train? If doParallel is the problem, are there other architectures that I could use with %dopar%? I assume not, right?
Because I am interested in whether this is the expected result, this is closely related to, but not exactly the same as, this question. I'd be fine with closing this and merging my question into that one (or marking that one as a duplicate and pointing it to this one, since this has more detail) if that's the consensus: