
The following (simplified) script runs fine on the master node of a Unix cluster (4 virtual cores).

```r
library(foreach)
library(doParallel)

nc <- detectCores()
cl <- makeCluster(nc)
registerDoParallel(cl)

foreach(i = 1:nrow(data_frame_1),
        .packages = c("package_1", "package_2"),
        .export = c("variable_1", "variable_2")) %dopar% {
  row_temp <- data_frame_1[i, ]
  some_function(argument_1 = row_temp,   # placeholder for the real function
                argument_2 = variable_1,
                argument_3 = variable_2)
}

stopCluster(cl)
```

I would like to take advantage of the 16 nodes in the cluster (16 * 4 virtual cores in total).

I guess all I need to do is change the parallel backend specified by `makeCluster`. But how should I do that? The documentation is not very clear.

Based on this quite old (2013) post http://www.r-bloggers.com/the-wonders-of-foreach/ it seems that I should change the default cluster type (sock or MPI, but which one?). Would that work on Unix?

EDIT

From this vignette by the authors of foreach:

> By default, doParallel uses multicore functionality on Unix-like systems and snow functionality on Windows. Note that the multicore functionality only runs tasks on a single computer, not a cluster of computers. However, you can use the snow functionality to execute on a cluster, using Unix-like operating systems, Windows, or even a combination.

What does "you can use the snow functionality" mean? How should I do that?

Antoine
  • I am not using a for loop... – Antoine Apr 22 '16 at 12:47
  • I would recommend downloading Revolution R Open, which has improved statistical libraries and multi-core support. [Check my answer on this post for more information about RRO](http://stackoverflow.com/questions/31900708/how-can-i-get-r-to-use-more-cpu-usage/33996564#33996564). This is not directly related to your question, but it will speed up any mathematical calculations and packages that can use multiple cores. – Bas Apr 22 '16 at 12:49

2 Answers


The parallel package is a merger of the multicore and snow packages, but if you want to run on multiple nodes, you have to make use of the "snow functionality" in parallel (that is, the part of parallel that was derived from snow). Practically speaking, that means you need to call `makeCluster` with the "type" argument set to either "PSOCK", "SOCK", "MPI" or "NWS", because those are the only cluster types in the current version of parallel that support execution on multiple nodes. If you're using a cluster that is managed by knowledgeable HPC sysadmins, you should use "MPI"; otherwise it may be easier to use "PSOCK" (or "SOCK" if you have a particular reason to use the "snow" package).

If you choose to create an "MPI" cluster, you should execute the script via R using the mpirun command with the "-n 1" option, and the first argument to makeCluster set to the number of workers that should be spawned. (If you don't know what that means, you may not want to use this approach.)
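For example, a minimal sketch of the MPI route (the script filename and the worker count of 16 are placeholders; this assumes the Rmpi package and an MPI implementation such as Open MPI are installed):

```r
# Launch with:  mpirun -n 1 R --slave -f script.R
# so that only one copy of the script runs and it spawns the workers itself.
library(doParallel)

cl <- makeCluster(16, type = "MPI")  # 16 MPI workers spawned across the nodes
registerDoParallel(cl)

# ... foreach(...) %dopar% { ... } as in the question ...

stopCluster(cl)
```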

If you choose to create a "PSOCK" or "SOCK" cluster, the first argument to makeCluster must be a vector of hostnames, and makeCluster will start workers on those nodes via the "ssh" command when makeCluster is executed. That means you must have ssh daemons running on all of the specified hosts.
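A hedged sketch of the PSOCK route (the hostnames are hypothetical; each host must accept passwordless ssh from the master):

```r
library(doParallel)

# Hypothetical node names; adjust to your cluster
hosts <- c("node01", "node02", "node03", "node04")

# One worker per core: repeat each hostname once per virtual core (4 here)
cl <- makeCluster(rep(hosts, each = 4), type = "PSOCK")
registerDoParallel(cl)

# ... foreach(...) %dopar% { ... } as in the question ...

stopCluster(cl)
```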

I've written much more on this subject elsewhere, but hopefully this will help you get started.

Steve Weston
  • It was a pain to install the `Rmpi` package, but I finally succeeded. When I follow the recommendation from the answer below and create the parallel backend with `makeCluster(16, type = "MPI")`, the R session is terminated and I get a segfault error (in the Linux shell). So it seems that the only way to make it work is to use `mpirun`, as you said, but that seems complicated – Antoine Apr 29 '16 at 09:02
  • @Antoine If R seg faults, I would guess that Rmpi isn't correctly installed. Did you try using a PSOCK cluster with a vector of hostnames, or did you run into ssh problems? – Steve Weston Apr 29 '16 at 13:17
  • 1
    @Antoine You *have* to execute your R script via mpirun if you want to use multiple nodes using Rmpi. If you execute `makeCluster(16, type = "MPI")` from your R script without using mpirun, all 16 workers will start on the local machine. I suppose that could even be the cause of the seg fault. – Steve Weston Apr 29 '16 at 13:26
  • hum I see. thanks much for the feedback I will try to learn how to use `mpirun` – Antoine Apr 29 '16 at 14:06
  • 3
    @Antoine If you're using Open MPI (which I recommend for Rmpi), you can use a command such as `mpirun -n 1 -H "localhost,n1,n2" R --slave -f script.R` where "n1" and "n2" are legal hostnames. This causes mpirun to execute your script only on "localhost", but enables Rmpi to spawn workers on hosts "n1" and "n2" as well. Start with some simple tests, such as using `makeCluster(2, type="MPI")`. – Steve Weston Apr 29 '16 at 14:26
  • thank you very much this is very helpful! Let me try, I'll keep you posted – Antoine Apr 29 '16 at 14:34

Here's a partial answer that may point you in the right direction.

> Based on this quite old (2013) post http://www.r-bloggers.com/the-wonders-of-foreach/ it seems that I should change the default type (fork to MPI, but why? Would that work on Unix?)

fork is a way of spawning background processes on POSIX systems. On a single node with n cores, you can spawn n processes in parallel and do work. This doesn't work across multiple machines, as they don't share memory; you need a way to get data between them.

MPI is a portable way to communicate between clusters of nodes. An MPI cluster can work across nodes.

> What does you can use the snow functionality mean? How should I do that?

snow is a separate package. To make a 16-worker MPI cluster with snow, do `cl <- makeCluster(16, type = "MPI")`, but you need to be running R in the right environment, as described in Steve Weston's answer and in his answer to a similar question here. (Once you get it running, you may also need to modify your loop to use 4 cores on each node.)
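To use all 16 * 4 virtual cores, here is one hedged sketch (the worker counts are assumptions based on the cluster described in the question):

```r
library(foreach)
library(doParallel)

# Option 1: one MPI worker per virtual core (16 nodes x 4 cores = 64),
# so the existing foreach loop needs no changes
cl <- makeCluster(64, type = "MPI")
registerDoParallel(cl)
# ... foreach(...) %dopar% { ... } ...
stopCluster(cl)

# Option 2: keep 16 node-level workers and parallelize within each task,
# e.g. with parallel::mclapply(..., mc.cores = 4) inside the loop body
```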

jaimedash
  • When I execute the command `makeCluster(16, type = "MPI")`, the R session is terminated and I get a segfault error (in the Linux shell). Do you know how to fix that? – Antoine Apr 29 '16 at 09:03