
I am writing a protocol for a reproducible analysis using an in-house package "MyPKG". Each user will supply their own input files; apart from the inputs, the analyses should be run under the same conditions, so that we can infer that different results are due to different input files.

MyPKG is under development, so library(MyPKG) will load whichever was the last version that the user compiled in their local library. It will also load any dependencies found in their local libraries.

But I want everyone to use a specific version (MyPKG_3.14) for this analysis while still allowing development of more recent versions. If I understand correctly, "R --vanilla" will load the same dependencies for everyone.

Once we are done, we will save the working environment as a VM to maintain a stable reproducible environment. So a temporary (6 month) solution will suffice.

I have come up with two potential solutions, but am not sure if either is sufficient.

  1. ask the server admin to install MyPKG_3.14 into the default R path and then provide the following code in the protocol:

    R --vanilla
    library(MyPKG)
    ....
    

    or

  2. compile MyPKG_3.14 in a specific library, e.g. lib.loc = "/home/share/lib/R/MyPKG_3.14", and then provide

    R --vanilla
    library(MyPKG, lib.loc = "/home/share/lib/R/MyPKG_3.14")
    

  • Are both of these approaches sufficient to ensure that everyone is running the same version?
  • Is one preferable to the other?
  • Are there other unforeseen issues that may arise?
  • Is there a preferred option for standardising the multiple analyses?
  • Should I include a test of the output of sessionInfo()?
  • Would it be better to create a single account on the server for everyone to use?
David LeBauer
  • Great question. I am working on this issue as part of a larger project and plan to release a library that will generate provenance traces after any execution. It would be easy to compare two traces and see whether the difference is just new data or new libraries, and one could adjust as needed if different results show up. Email me for more details. – Maiasaura Sep 20 '12 at 19:25
  • @Maiasaura I don't see your email. Does it start with kram? – David LeBauer Sep 20 '12 at 19:30
  • Would it suffice to roll out a standard `.Rprofile.site` file to all users which includes a line like `install.packages('mypathto/MyPKG_3.14'); library(MyPKG_3.14)` ? – Carl Witthoft Sep 20 '12 at 19:36
  • @CarlWitthoft [I doubt it](http://stackoverflow.com/a/11531086/967840) because R reads Rprofile when it installs a package which tells it to install a package which causes it to read Rprofile ... – GSee Sep 20 '12 at 20:52

1 Answer


Couple of points:

  • Use system-wide installations of packages. For example, the Debian / Ubuntu binary for R (including the CRAN ports) will try to use /usr/local/lib/R/site-library, which users can also install into if they are added to the group owning the directory. That way everybody gets the same version.
  • Use system-wide configuration, e.g. prefer $R_HOME/etc/ over the dotfiles below ~/. For the same reason, the Debian / Ubuntu package offers softlinks in /etc/R/
  • Use R's facilities to query its packages (e.g. installed.packages()) to report packages and versions.
  • Use, where available, OS-level facilities to query OS release and version. This, however, is less standardized.

Regarding the last point my box at home says

> edd@max:~$ lsb_release -a | tail -4
> Distributor ID: Ubuntu
> Description:    Ubuntu 12.04.1 LTS
> Release:        12.04
> Codename:       precise
> edd@max:~$ 

which is a start.
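
For the third point (querying R for its packages), a minimal sketch could look like the following. The helper names `check_pkg_version` and `report_packages` are hypothetical and not part of MyPKG or base R. The version string "3.14" and the library path are taken from the question as assumptions. The runnable demonstration uses `stats`, which ships with R, since MyPKG is not publicly available.

```r
# Hypothetical helper: stop unless the installed version of `pkg`
# (optionally looked up in a specific library) matches `required` exactly.
check_pkg_version <- function(pkg, required, lib.loc = NULL) {
  found <- utils::packageVersion(pkg, lib.loc = lib.loc)
  if (found != package_version(required)) {
    stop(sprintf("%s %s found, but %s is required",
                 pkg, as.character(found), required), call. = FALSE)
  }
  invisible(found)
}

# Hypothetical helper: report name, version and library path of the
# installed packages, so that two users' setups can be compared.
report_packages <- function(lib.loc = NULL) {
  ip <- utils::installed.packages(lib.loc = lib.loc)
  data.frame(Package = ip[, "Package"],
             Version = ip[, "Version"],
             LibPath = ip[, "LibPath"],
             row.names = NULL,
             stringsAsFactors = FALSE)
}

# In the protocol this might be called as (names assumed from the question):
# check_pkg_version("MyPKG", "3.14", lib.loc = "/home/share/lib/R/MyPKG_3.14")

# Runnable demonstration with a package that ships with R; base packages
# carry the version of R itself.
check_pkg_version("stats", as.character(getRversion()))
head(report_packages())
```

Placing such a check at the top of the analysis script makes a version mismatch fail early instead of silently producing different results.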

Dirk Eddelbuettel
  • +1 very helpful advice you gave me recently to remove "~/R/x86_64-pc-linux-gnu-library/2.15" – GSee Sep 20 '12 at 20:57