
I need to run thousands* of models on 15 machines (each with 4 cores), all running Windows. I started learning the parallel, snow and snowfall packages and read a bunch of intros, but they mainly focus on setting up the master. There is only a little information on how to set up the worker (slave) nodes on Windows, and it is often contradictory: some say that a SOCK cluster is practically the easiest way to go, others claim that SOCK cluster setup is complicated on Windows (sshd setup) and that the best way to go is MPI.

So, what is the easiest way to set up the slave nodes on Windows? MPI, PVM, SOCK or NWS? My possibly naive ideas, listed by priority, were:

  1. To use all 4 cores on the slave nodes (required).
  2. Ideally, I need only R with some packages and a slave R script or R function that would listen on some port and wait for tasks from the master.
  3. Ideally, nodes can be added/removed dynamically from the cluster.
  4. Ideally, the slaves would connect to the master - so I wouldn't have to list all the slaves' IPs in the master's configuration.

Only 1 is 100% required; 2-4 are "would be good". Is that too naive a request?

I am sorry, but I have not been able to figure this out from the available docs and tutorials. I would be grateful if you could point me to the right source.


* Note that each of those thousands of models will take at least 7 minutes, so there won't be a big communication overhead.
  • R will handle workers. You just need to export all the functions and packages to them using ready-made tools (in snowfall it's `sfExport` and `sfLibrary`). – Roman Luštrik Mar 24 '14 at 13:48
  • @RomanLuštrik, *"R will handle workers"* - great to know, but the question was *how* - *how* shall I set them up? Which cluster type are you talking about? – Tomas Mar 24 '14 at 14:26
  • Check out this for some example code, and the comments at the bottom about MPI vs. SOCK: http://www.ics.uci.edu/~vqnguyen/talks/ParallelComputingSeminaR.pdf – jpd527 Mar 24 '14 at 15:48
  • Oh, sorry, forgot to include. I use snow on `SOCK`. – Roman Luštrik Mar 24 '14 at 17:19
  • I don't think that the two answers you cited are contradictory. With one machine a SOCK cluster is pretty easy because ssh isn't used in that case. With multiple machines an MPI cluster is easier unless you are a Windows ssh expert. – Steve Weston Mar 25 '14 at 18:25

1 Answer


It's a shame how complex all these APIs (like parallel/snow/snowfall) are to work with - lots of docs, but not what you actually need... I have found an API which is very simple and goes straight to the ideas I sketched above! It is Redis and the doRedis R package (as recommended here). And finally, a very simple tutorial exists! I just modified it a bit and got this:

The workers need only R, the doRedis package, and this script:

require(doRedis)
redisWorker('jobs', '10.0.0.7') # 'jobs' = queue name, 10.0.0.7 = IP of the master
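
To satisfy requirement 1 (all 4 cores on each slave), you can simply start the worker script above in 4 separate R processes. If I read the doRedis docs right, there is also a `startLocalWorkers()` helper that spawns several worker processes from one R session - a minimal sketch, reusing the queue name and IP from my setup above:

require(doRedis)
# spawn 4 worker processes on this machine, one per core,
# all pulling tasks from the 'jobs' queue on the master
startLocalWorkers(n=4, queue='jobs', host='10.0.0.7')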

The master needs a running Redis server (I installed the experimental Windows binaries), and this R code:

require(doRedis)
registerDoRedis('jobs')     # register the 'jobs' queue as the foreach backend
foreach(j=1:10,.combine=sum,.multicombine=TRUE) %dopar%
    ... # whatever you need to run
removeQueue('jobs')         # clean up the queue when done
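
To make that concrete, here is a hypothetical filled-in version of the loop - it fits one lm() per task on a bootstrap resample of the built-in cars data set (your own model code goes where mine is):

require(doRedis)
registerDoRedis('jobs')
slopes <- foreach(j=1:1000, .combine=c) %dopar% {
    boot <- cars[sample(nrow(cars), replace=TRUE), ]  # bootstrap resample
    coef(lm(dist ~ speed, data=boot))[['speed']]      # return the slope estimate
}
removeQueue('jobs')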

Adding/removing workers is fully dynamic, there is no need to specify IPs on the master, you get automatic "load balancing", it is simple, and there is no need for tons of docs! This solution fulfills all the requirements and even more - as stated in ?registerDoRedis:

The doRedis parallel back end tolerates faults among the worker processes and automatically resubmits failed tasks.

I don't know how complex this would be to achieve with parallel/snow/snowfall over SOCK/MPI/PVM/NWS, or whether it would be possible at all, but I guess it would be very complex...
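
For comparison, this is roughly what the snow/parallel SOCK route would look like (a sketch, assuming password-less ssh from the master to every slave - which is exactly the painful part on Windows; the hostnames are made up):

require(parallel)
hosts <- rep(c('10.0.0.11', '10.0.0.12'), each=4)   # 4 R processes per slave machine
cl <- makePSOCKcluster(hosts)                       # master contacts each slave over ssh
res <- parSapply(cl, 1:8, function(i) i^2)          # run tasks on the cluster
stopCluster(cl)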

The only disadvantages of using Redis that I found:

  • doRedis indeed seems to be a really easy solution; however, I doubt that it supports multicore on the slaves. I tried to combine doRedis and doSNOW but failed. Do you have any clues how to get the slaves running on multicore? – Jens Mar 11 '15 at 14:48