My Problem:
I have an R script myscript.R that uses a configuration file, e.g. config.xml. What is the best way to submit such a script to a job scheduler (e.g., using qsub)?
I would like to be able to use the script and file in the same way that I would use, e.g., a C or Fortran executable, which is embedded in a bash script.
How I currently use FORTRAN:
Here is the approach that I use with a compiled Fortran executable fex. I wrap it in a bash script like the following, which I will call fscript.sh:
#!/bin/bash
mpirun [arguments] "fex" -f $1
The above fscript.sh can be sent to a cluster with instructions to read the config file like this:
qsub [arguments] fscript.sh 1 config.xml
How I currently use R in an analogous way:
To run R in an analogous way, I am using a bash script rscript.sh:
#!/bin/bash
CONFIG=$1
env CONFIG="$CONFIG" R --vanilla < myscript.R
This can be submitted from the command line, e.g.
qsub [arguments] rscript.sh config.xml
where myscript.R contains something like
library(XML)
# Path to the config file, set by rscript.sh through the CONFIG environment variable
config_file <- Sys.getenv("CONFIG")
config <- xmlToList(xmlParse(config_file))
myfunction(config)
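To show the mechanism I am relying on in isolation, here is a minimal sketch of how the config path travels from the wrapper to a child process through the environment. The helper name run_with_config is hypothetical, and `sh -c` stands in for `R --vanilla < myscript.R` so that the snippet runs without R installed.

```shell
#!/bin/bash
# Hypothetical helper: run a command with CONFIG set in that command's
# environment only (the parent shell's environment is left unchanged).
run_with_config() {
  local config_file="$1"
  shift
  env CONFIG="$config_file" "$@"
}

# Stand-in for R: a child shell that reports the value it received.
run_with_config config.xml sh -c 'echo "CONFIG=$CONFIG"'
# prints: CONFIG=config.xml
```

Note that env expects NAME=value pairs before the command; a bare file name in that position would be treated as the command to run.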
My Questions
- Would Rscript or compiler provide a more robust approach than my current use of bash?
- Under which conditions would one be more appropriate than the other (What are the pros and cons)?
- How would I pass a configuration file in either case?
What I have done so far
In addition to coming up with the bash script rscript.sh described above, I have read through tutorials and some documentation for Rscript and compiler, but it is not clear to me in which contexts one would be preferred over the other. It is also not clear to me what the best way is to pass a configuration file in either context.
This question is related to others, e.g., "What are the ways to create an executable from R program" and "Does an R compiler exist?". However, I do not think that it is essential to use compiled code.