
I'm currently evaluating different solutions to improve reproducibility in my R/Python scientific workflow: data with reproducible analyses (plots, statistics) and the resulting paper.

As you may know, two major Linux flavours offer solutions for this: Nix and Guix.

In Nix, the commonly described way to develop with R is to use rWrapper and rPackages, for example:

pkgs.rWrapper.override{ packages = with pkgs.rPackages; [tidyverse rmarkdown]; };

My problem is (not so...) simple: like Python, R is well known to be a nightmare in terms of reproducibility, even in the medium term. For fun, try running two-year-old ggplot2 code with a recent version of R...

To provide a flake that produces the same result from the same data for a scientific paper, I would like to pin, in the derivation, the version of R and the versions of the R packages used to compute the analyses and plots.

{
  description = "Generate R result from simulation";

  inputs = {
    nixpkgs.url = "nixpkgs/nixos-20.09";
    utils.url = "github:numtide/flake-utils";
  };

  # Only inputs declared above may be listed here (mach-nix was not declared).
  outputs = { self, nixpkgs, utils }: utils.lib.eachDefaultSystem (system:
    let
      pkgs = nixpkgs.legacyPackages.${system};
      # R together with the packages needed to render the report
      REnv = pkgs.rWrapper.override { packages = with pkgs.rPackages; [ tidyverse rmarkdown ]; };

      buildRScripts = pkgs.stdenv.mkDerivation {
        name = "myscript";
        src = self;
        nativeBuildInputs = [ REnv ];
        dontBuild = true;
        buildInputs = [ pkgs.pandoc pkgs.unzip ];

        installPhase = ''
          mkdir -p $out
          # Render from the unpacked source tree and write the output into $out
          ${REnv}/bin/Rscript -e 'rmarkdown::render("test.Rmd", output_dir = Sys.getenv("out"))'
        '';
      };
    in {
      packages.buildRScripts = buildRScripts;
      defaultPackage = self.packages.${system}.buildRScripts;
    });
}

For example, how could I specify more precisely that I want to compile my test.Rmd using only tidyverse 1.3.1 with R 4.1.0, even in 5 years?
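To make the question concrete, here is a minimal, untested sketch of a guard that at least refuses to evaluate when the pinned nixpkgs does not ship the expected versions. It does not select a version by itself. The file name default.nix and the names wantedR and wantedTidyverse are hypothetical; lib.getVersion reads the version from the derivation (or from its name).

```nix
# default.nix (hypothetical): fail early if nixpkgs provides other versions
let
  pkgs = import <nixpkgs> { };
  inherit (pkgs) lib;
  # Versions the paper was written against (taken from the question above)
  wantedR = "4.1.0";
  wantedTidyverse = "1.3.1";
in
# Each assert aborts evaluation when the condition is false
assert lib.getVersion pkgs.R == wantedR;
assert lib.getVersion pkgs.rPackages.tidyverse == wantedTidyverse;
pkgs.rWrapper.override {
  packages = with pkgs.rPackages; [ tidyverse rmarkdown ];
}
```

Such a guard only documents and checks the expectation; getting those exact versions still depends on which nixpkgs revision is used.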

I found that Guix shows the different available packages/versions of R and tidyverse:

The versions needed by tidyverse 1.3.1 are clearly presented:

With rPackages in Nix, I'm looking for a way to achieve something similar, i.e. a way to refer explicitly to a version of R or of R packages in a derivation, but I haven't found one.

With rPackages, the Nix developers already provide a great foundation, but perhaps we need more...

How could we collectively achieve better reproducibility for R packages with Nix? I'm interested in any ideas.

Perhaps we could fetch package sources directly from the CRAN archive and compile them? For example, with tidyverse:
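A minimal, untested sketch of that idea, assuming the rPackages derivations accept overrideAttrs and that the tarball sits at the usual CRAN Archive path. The name tidyverse_1_3_1 is hypothetical, and the hash is a deliberate placeholder (lib.fakeSha256) that must be replaced by the real one reported on the first build.

```nix
let
  pkgs = import <nixpkgs> { };

  # Hypothetical name: tidyverse rebuilt from an archived CRAN tarball
  tidyverse_1_3_1 = pkgs.rPackages.tidyverse.overrideAttrs (old: {
    name = "r-tidyverse-1.3.1";
    src = pkgs.fetchurl {
      url = "https://cran.r-project.org/src/contrib/Archive/tidyverse/tidyverse_1.3.1.tar.gz";
      # Placeholder hash: the first build fails and prints the expected value
      sha256 = pkgs.lib.fakeSha256;
    };
  });
in
# R environment using the overridden package instead of the nixpkgs one
pkgs.rWrapper.override {
  packages = [ tidyverse_1_3_1 pkgs.rPackages.rmarkdown ];
}
```

Note that only the tidyverse source is swapped here: its dependencies and R itself still come from whatever nixpkgs revision is in use, so this covers only part of the problem.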

PS: I know that Nix and Guix are both partners of https://archive.softwareheritage.org/, a great way to archive and retrieve CRAN packages:

PS: answers could also be added to https://nixos.wiki/wiki/R

Update 1

After a discussion with some great people on the Nix Discord, I understand that Nix doesn't need explicit versions, because flake.nix + flake.lock store a hash (see nix flake metadata) that ties my build and downloads to one specific nixpkgs commit.
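As a minimal sketch of that pinning, the nixpkgs input can also name a revision explicitly instead of a branch. The `<commit-sha>` below is a placeholder, not a real revision; flake.lock then records that revision together with its content hash.

```nix
{
  description = "Generate R result from simulation";

  inputs = {
    # Placeholder: replace <commit-sha> with the nixpkgs commit to pin
    nixpkgs.url = "github:NixOS/nixpkgs/<commit-sha>";
    utils.url = "github:numtide/flake-utils";
  };

  # Outputs left empty here: only the pinning of inputs is illustrated
  outputs = { self, nixpkgs, utils }: { };
}
```

Running nix flake metadata afterwards shows the locked revision of each input.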

But that doesn't solve:

  • the problem of the tar.gz sources needed by the packages that rPackages declares at this specific commit. I suppose Software Heritage will help on this point?
  • the common problem of incompatibility between an R version and the package versions built for it. For example, you write code with R 3.0.0 and tidyverse 1.2.3; you then update R because some other package needs an update that only works with dependencies available for R 3.2.0, but, ahem, tidyverse 1.2.3 doesn't exist for R 3.2.0... Pinning versions and keeping access to old tar.gz files solves part of this problem, I suppose.

How do we define something like this using Nix?

Update 2

It seems someone built an unofficial index to help people search for old versions of a package. Example with tidyverse: https://lazamar.co.uk/nix-versions/?channel=nixpkgs-unstable&package=r-tidyverse

Thanks @dram for the link and the discussion on this.

reyman64
  • I believe I understand your goal, ("For example, how could i define more precisly that i want to use, to compile my test.Rmd, only the tidyverse 1.3.1 with R 4.1.0 ? Even in 5 years") but there are a lot of 'moving parts' to take into consideration. The best approach I've found is to containerize everything (the code, the inputs/outputs, the software versions - everything) with e.g. docker / https://github.com/ThinkR-open/devindocker. I don't use nix/guix, but before you start developing a solution, I think it would be good to define the advantages your flake/nix approach would provide. – jared_mamrot Jun 02 '21 at 03:28
  • what about [renv](https://rstudio.github.io/renv/)? – captcoma Jun 02 '21 at 08:33
  • @captcoma Yes, renv is interesting, but it doesn't solve every problem; see the Caveats here: https://cran.r-project.org/web/packages/renv/vignettes/renv.html. To take the salient example from the docs: the rmarkdown package needs the pandoc system package installed in order to work, and pandoc is not bundled with rmarkdown. So even if you have the R package, you also need all the linked system packages in the right versions (pandoc x.x for rmarkdown x.x). – reyman64 Jun 05 '21 at 13:15
  • @jared_mamrot Docker is one path, but Docker is one big black box with a lot of internal dependencies. As a regular Docker user, I see a lot of caveats linked to the build file... Guix and Nix solve that in a better way with a declarative package manager. – reyman64 Jun 05 '21 at 13:18
  • I second that you should definitely use `renv`. You should also declare your system dependencies in `nix` (or whatever). This also solves the pandoc problem. – nicola Jun 21 '21 at 11:40
