
I've gotten accustomed to running R jobs on a cluster with 32 cores per node. I am now on a cluster with 16 cores per node, and I'd like to maintain (or improve) the performance I had been getting by using more than one node at a time.

As can be seen from my dummy shell script and dummy function (below), parallelization on a single node is really easy. Is it similarly easy to extend this to multiple nodes? If so, how would I modify my scripts?

R script:

library(plyr)
library(doMC)
registerDoMC(16)  # one worker per core on a 16-core node

dothisfunctionmanytimes <- function(d) {
  print(paste("my favorite number is", d$x, "and my favorite letter is", d$y))
}
# name the columns so that d$x and d$y exist inside the function
d <- expand.grid(x = 1:1000, y = letters)
d_ply(.data = d, .fun = dothisfunctionmanytimes, .parallel = TRUE)

Shell script:

#!/bin/sh
#PBS -N runR
#PBS -q normal
#PBS -l nodes=1:ppn=32
#PBS -l walltime=5:00:00
#PBS -j oe
#PBS -V
#PBS -M email
#PBS -m abe

. /etc/profile.d/modules.sh
module load R

#R_LIBS=/home/diag/opt/R/local/lib
R_LIBS_USER=${HOME}/R/x86_64-unknown-linux-gnu-library/3.0
OMP_NUM_THREADS=1

export R_LIBS R_LIBS_USER OMP_NUM_THREADS

cd $PBS_O_WORKDIR
R CMD BATCH script.R

(The shell script is submitted with `qsub script.sh`.)

generic_user
  • Have a look at http://stackoverflow.com/questions/17899756/initializing-mpi-cluster-with-snowfall-r . In general though, `snow` is probably the way forward, and MPI may be the best way within that for your purposes. – Nick Kennedy Aug 12 '15 at 20:36
  • You could also use the [BatchJobs package](https://cran.r-project.org/web/packages/BatchJobs/index.html) which would allow you to split into multiple torque jobs. – Lars Kotthoff Aug 12 '15 at 23:36
  • do you have a working answer to this question ? – ClementWalter Jan 25 '16 at 09:17
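Following the snow/MPI suggestion in the comments, a minimal multi-node sketch might look like the following. This is an untested sketch, not a confirmed solution: it assumes the `Rmpi`, `snow`, and `doSNOW` packages are installed and that the cluster's MPI module is loaded.

```r
# Multi-node sketch: doSNOW over an MPI cluster, replacing doMC.
# Assumes Rmpi, snow, and doSNOW are available on the cluster.
library(plyr)
library(Rmpi)
library(snow)
library(doSNOW)

# One worker per MPI slot, reserving one slot for the master process
cl <- makeMPIcluster(mpi.universe.size() - 1)
registerDoSNOW(cl)

dothisfunctionmanytimes <- function(d) {
  print(paste("my favorite number is", d$x, "and my favorite letter is", d$y))
}
d <- expand.grid(x = 1:1000, y = letters)
d_ply(.data = d, .fun = dothisfunctionmanytimes, .parallel = TRUE)

stopCluster(cl)
mpi.quit()
```

On the shell side, the job would need to request multiple nodes (e.g. `#PBS -l nodes=2:ppn=16`) and launch R through MPI, e.g. `mpirun -np 1 R CMD BATCH script.R`, so that `Rmpi` can spawn the remaining workers across nodes. The exact launch incantation varies by MPI implementation and site configuration.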

0 Answers