22

I've been searching for a simple working example of using ddply() in parallel. I've installed the "foreach" package, but when I call ddply(.parallel = TRUE) I get the warning "No parallel backend registered".

Can someone provide a simple working example of using ddply in parallel?

Suraj

4 Answers

18

Here's a simple working example:

> df <- data.frame(val=1:10, ind=c(rep(2, 5), rep(3, 5)))
> library(plyr)
> library(doSNOW)
> registerDoSNOW(makeCluster(2, type = "SOCK"))
> system.time(print(ddply(df, .(ind), function(x) { Sys.sleep(2); sum(x) }, .parallel=FALSE)))
  ind V1
1   2 25
2   3 55
   user  system elapsed 
   0.00    0.00    4.01 
> system.time(print(ddply(df, .(ind), function(x) { Sys.sleep(2); sum(x) }, .parallel=TRUE)))
  ind V1
1   2 25
2   3 55
   user  system elapsed 
   0.02    0.00    2.02 
Shane
6

Have you registered a parallel backend for foreach?

You may need to read up on use of foreach before you use it with plyr.
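As a minimal sketch of what "registering a backend" means, assuming the doSNOW package (the same one used in the working example above) is installed: foreach only runs `%dopar%` loops in parallel once a backend has been registered, and it provides helper functions to inspect the current registration. The variable names here are illustrative only.

```r
library(foreach)
library(doSNOW)

# Start two local worker processes and register them as the foreach backend.
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# Inspect the registration that ddply(.parallel = TRUE) would rely on.
registered <- getDoParRegistered()  # is any backend registered?
backend    <- getDoParName()        # name of the registered backend
workers    <- getDoParWorkers()     # number of workers it will use

# A trivial parallel loop to confirm the backend actually works.
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2

# Shut the workers down and fall back to the sequential backend.
stopCluster(cl)
registerDoSEQ()
```

If the registration check comes back FALSE, the "No parallel backend registered" warning from plyr is the expected result.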

Dirk Eddelbuettel
  • I looked at the function list but couldn't figure it out. I also looked at the vignette but could not find the terms "backend" nor "register" =) I was hoping someone could just post a simple, complete working example. – Suraj Jul 21 '11 at 17:39
  • @SFun28 - Have you checked out `vignette("gettingstartedMC")`? – joran Jul 21 '11 at 17:40
  • Added "on Windows" to my title...seems like it's possible on 32-bit with the experimental multicore package, but not available on 64-bit. – Suraj Jul 21 '11 at 17:57
  • The `foreach` package supports several backends for parallelisation; you may try `doSMP` on Windows as well as the older / simpler `snow` package via `doSNOW`. This is all non-trivial, but documented in `foreach` and related packages. – Dirk Eddelbuettel Jul 21 '11 at 18:08
  • Adding to Dirk's comments, I've enjoyed using the snowfall package (if you're into that sort of thing). – Roman Luštrik Jul 21 '11 at 22:12
  • Roman, this is getting a little old. You don't need to parrot 'snowfall' each time someone asks for 'parallel with R' especially as snowfall, nice as it is, does not have a 'foreach' adapter -- at least AFAIK. – Dirk Eddelbuettel Jul 21 '11 at 22:16
  • I agree with @Roman. Snowfall on Windows is just so easy to use. Almost no dealing with any initialization/registration! – Henrik Jul 22 '11 at 13:05
  • All good and well, Henrik, but please show us how snowfall can help with 'ddply in parallel'. – Dirk Eddelbuettel Jul 22 '11 at 13:08
  • And there comes the problem. As far as I know, no chance at all. – Henrik Jul 22 '11 at 13:12
  • Per this post it seems like there might be a fix in the next version of plyr. I've emailed Hadley to get the scoop: http://stackoverflow.com/questions/5588914/domc-vs-dosnow-vs-dosmp-vs-dompi-why-arent-the-various-parallel-backends-for-f – Suraj Jul 22 '11 at 13:57
6

A. I've been communicating with Hadley and there are no plans in the immediate future to fix this bug. The fix itself can be attempted by anyone. Here are some tips I received from Hadley:

"It's relatively easy at the simplest level - you just need to pass a .export argument to foreach. Ideally, plyr would figure out what to export automatically, but in the mean time, modifying .parallel to take a list of arguments to foreach (instead of just T/F) would be a big step. Start with llply, and if you can get that working, it's fairly trivial to get all the other functions working too."
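To illustrate the `.export` argument Hadley mentions, here is a small sketch using foreach directly rather than plyr, assuming doSNOW is installed. The names `offset` and `add_offset` are hypothetical. The point is that foreach ships the variables it sees in the loop body to the workers, but a variable used only inside a called function may need to be listed explicitly.

```r
library(foreach)
library(doSNOW)

cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# Hypothetical global that is only referenced inside add_offset(),
# so it does not appear in the loop body itself.
offset <- 100
add_offset <- function(x) x + offset

# add_offset appears in the loop body and is exported automatically;
# offset is named in .export so the workers can find it too.
res <- foreach(i = 1:3, .combine = c, .export = "offset") %dopar% add_offset(i)

stopCluster(cl)
registerDoSEQ()
```

With a fork-based backend this is usually unnecessary, since the workers inherit the parent session's variables; with socket clusters such as snow, each worker starts with an empty workspace.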

B. I highly recommend snow and doSNOW to get parallel foreach to work on Windows. The other parallel backends either (1) don't support Windows, (2) don't work on 64-bit Windows, or (3) are supposed to work on Windows but are too buggy. snow/doSNOW was the only solution that worked "out of the box".

C. Good luck!

Suraj
4

In Unix environments, you can do this using the doMC package and its function registerDoMC():

> library(plyr)
> library(doMC)
> registerDoMC()
> example <- ddply(..., .parallel=TRUE)
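For a complete version of the same idea, here is a sketch that reuses the small data frame from the first answer. It assumes a Unix-like system with plyr and doMC installed; the choice of two cores and the `total` column name are arbitrary.

```r
library(plyr)
library(doMC)

# Register a fork-based backend with two worker processes (Unix only).
registerDoMC(cores = 2)

df <- data.frame(val = 1:10, ind = c(rep(2, 5), rep(3, 5)))

# Sum val within each ind group, processing the groups in parallel.
res <- ddply(df, .(ind), function(d) data.frame(total = sum(d$val)),
             .parallel = TRUE)
```

Some plyr versions print harmless foreach warnings in parallel mode; the result should match what the same call returns with `.parallel = FALSE`.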
user592419