I am writing a protocol for a reproducible analysis using an in-house package "MyPKG". Each user will supply their own input files; other than the inputs, the analyses should be run under the same conditions. (e.g. so that we can infer that different results are due to different input files).
MyPKG is under development, so library(MyPKG)
will load whichever was the last version that the user compiled in their local library. It will also load any dependencies found in their local libraries.
But I want everyone to use a specific version (MyPKG_3.14) for this analysis while still allowing development of more recent versions. If I understand correctly, "R --vanilla" will load the same dependencies for everyone.
Once we are done, we will save the working environment as a VM to maintain a stable reproducible environment. So a temporary (6 month) solution will suffice.
I have come up with two potential solutions, but am not sure if either is sufficient.
ask the server admin to install MyPKG_3.14 into the default R path and then provide the following code in the protocol:
R --vanilla library(MyPKG) ....
or
compile MyPKG_3.14 in a specific library, e.g. lib.loc = "/home/share/lib/R/MyPKG_3.14", and then provide
R --vanilla library(MyPKG)
- Are both of these approaches sufficient to ensure that everyone is running the same version?
- Is one preferable to the other?
- Are there other unforseen issues that may arise?
- Is there a preferred option for standardising the multiple analyses?
- Should I include a test of the output of
SessionInfo()
? - Would it be better to create a single account on the server for everyone to use?