
I am trying to implement a reducer for Hadoop Streaming using R. However, I need a way to access libraries that do not ship with base R, such as dplyr. Based on my research, there seem to be two approaches:

(1) In the reducer code, install the required libraries to a temporary folder and they will be disposed when the session is done, like this:

.libPaths(c(.libPaths(), temp <- tempdir()))
install.packages("dplyr", lib=temp, repos='http://cran.us.r-project.org')
library(dplyr)
...

However, this approach adds significant overhead that grows with the number of libraries you install, so most of the time is wasted on installation (a sophisticated library like dplyr has many dependencies, which take minutes to install in a vanilla R session).

So it sounds like I need to install the libraries beforehand, which leads to approach 2.

(2) My cluster is fairly big, so I have to use a tool like Ansible to roll this out, and I would prefer a single Linux shell command that installs the library. I have seen R CMD INSTALL ... before, but as far as I can tell it only installs packages from a source file, whereas install.packages() in the R console picks the mirror, downloads the source, and installs it in one step.

Can anyone show me a single shell command that installs an R package non-interactively? (Sorry for so much background. If anyone thinks I am not following the right philosophy, feel free to leave a comment on how R packages should be managed across a cluster.)

B.Mr.W.

2 Answers


tl;dr

Rscript -e 'install.packages("drat", repos="https://cloud.r-project.org")'
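One caveat when scripting the one-liner with a tool like Ansible: install.packages() reports a failed installation as a warning, so Rscript can still exit with status 0. The following is a minimal sketch, not a definitive recipe, that adds an explicit check so the shell sees a non-zero exit status on failure. The function names r_install_expr and install_r_package are hypothetical, chosen for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: print the R expression that installs one package and
# then verifies it. install.packages() signals failures as warnings, so the
# requireNamespace() check is what turns a failed install into exit status 1.
r_install_expr() {
    pkg=$1
    repo=${2:-https://cloud.r-project.org}
    printf 'install.packages("%s", repos="%s"); if (!requireNamespace("%s", quietly=TRUE)) quit(save="no", status=1)' "$pkg" "$repo" "$pkg"
}

# Hypothetical helper: run the expression non-interactively.
# The caller can branch on the exit status.
install_r_package() {
    Rscript -e "$(r_install_expr "$@")"
}
```

After sourcing these functions, something like `install_r_package drat || echo "drat failed to install"` gives the calling script a status it can act on.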

You mentioned you are trying to install dplyr into a custom library location on disk. Be aware that the dplyr package does not support that; you can read more in dplyr#4641.


Moreover, if you are installing a private package published in an internal CRAN-like repository (created by drat or tools::write_PACKAGES), you can pass several repositories in the repos argument and have dependencies resolved from CRAN automatically.

Rscript -e 'install.packages("priv.pkg", repos=c("cran.priv","https://cloud.r-project.org"))'

This is a very handy feature of R repositories, although for production use I would recommend caching packages from CRAN locally and installing from that cache, so you are never surprised by breaking changes in your dependencies. For quality information about handling R in production, I suggest the talk by Wit Jakuczun at WhyR2019, How to make R great for machine learning in (not only) Enterprise: slides, video.
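As a rough sketch of the local-cache idea, assuming a repository directory at /srv/r-repo (a hypothetical path) and source packages only, the shell helpers below print the R code for two steps: snapshotting a package with its recursive dependencies into a CRAN-like layout, and installing from that snapshot alone. The helper names snapshot_expr and install_from_snapshot_expr are made up for illustration; the R functions they emit (available.packages, tools::package_dependencies, download.packages, tools::write_PACKAGES) are standard ones.

```shell
#!/bin/sh
# Hypothetical location of the local CRAN-like repository (an assumption).
LOCAL_REPO=${LOCAL_REPO:-/srv/r-repo}

# Print R code that downloads one package plus its recursive dependencies
# as source tarballs and writes the PACKAGES index next to them.
snapshot_expr() {
    pkg=$1
    cat <<EOF
cran <- "https://cloud.r-project.org"
dest <- file.path("$LOCAL_REPO", "src", "contrib")
dir.create(dest, recursive = TRUE, showWarnings = FALSE)
db <- available.packages(repos = cran)
deps <- tools::package_dependencies("$pkg", db = db, recursive = TRUE)[["$pkg"]]
pkgs <- intersect(c("$pkg", deps), rownames(db))  # drop base packages not on CRAN
download.packages(pkgs, destdir = dest, repos = cran, type = "source")
tools::write_PACKAGES(dest, type = "source")
EOF
}

# Print R code that installs a package using only the local snapshot.
install_from_snapshot_expr() {
    printf 'install.packages("%s", repos="file://%s")' "$1" "$LOCAL_REPO"
}
```

One way to use it: run `snapshot_expr dplyr > snapshot.R && Rscript snapshot.R` once on a machine with internet access, distribute the directory, then run `Rscript -e "$(install_from_snapshot_expr dplyr)"` on each node so every machine installs the same versions.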

jangorecki
  • I don't understand your point -- `install.r` comes with littler and we tend to just install a softlink in `/usr/local/bin`. Then it is just `install.r drat`. And for what it is worth, we use `install.r` and `install2.r` (which has command-line option support) a lot in rocker for just this. See eg [this Dockerfile](https://github.com/rocker-org/hadleyverse/blob/master/Dockerfile). – Dirk Eddelbuettel Sep 03 '15 at 22:10
  • It's always better not to have to install new software just to do basic stuff like installing packages. – jcubic Apr 08 '19 at 14:09
  • This is a really great way when pushing the packages to 100+ computers. Really straightforward and easy to script / automate. – Bastion Feb 11 '20 at 22:36
  • @Bastion If you do such a deployment then it is best to cache this package (or the package tree, if it has dependencies) and install from a local CRAN-like repo. Otherwise you may eventually end up with different versions installed on different machines. You can use [`mirror.packages`](https://github.com/Rdatatable/data.table/blob/master/.ci/ci.R) for that. – jangorecki Feb 12 '20 at 12:29
  • On GNU/Linux machines, you need to run it with `sudo`: `sudo Rscript -e 'install.packages("drat", repos="https://cloud.r-project.org")'` – woody70 May 09 '21 at 07:25

You may find littler useful. It is a command-line front-end / variant of R (which uses the R-embedding interface).

I use the install.r script all the time to install packages from the shell. There is a second variant, install2.r, with more command-line argument parsing, but it has an added dependency.
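For a mixed cluster where littler may be present on some nodes and absent on others, a provisioning script could pick whichever installer is available. The sketch below is only an illustration under that assumption: the function name installer_cmd is hypothetical, and it prints the command rather than running it, so the result can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a command that installs the given
# packages. Uses install.r from littler when it is on PATH, otherwise falls
# back to the plain Rscript one-liner.
installer_cmd() {
    if command -v install.r >/dev/null 2>&1; then
        printf 'install.r %s\n' "$*"
    else
        pkgs=
        for p in "$@"; do
            # Build a comma-separated list of quoted package names.
            pkgs="${pkgs:+$pkgs, }\"$p\""
        done
        printf "Rscript -e 'install.packages(c(%s), repos=\"https://cloud.r-project.org\")'\n" "$pkgs"
    fi
}
```

Calling `installer_cmd drat data.table` shows what would be executed on the current machine; piping the output to `sh` runs it.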

Dirk Eddelbuettel
  • The links to the install.r scripts are dead. – Radek Liska May 16 '19 at 08:51
  • I had omitted a directory by mistake. Now fixed. – Dirk Eddelbuettel May 16 '19 at 14:25
  • The second variant still points to the first variant, could you please change that, too? I'd edit it on my own, but I'm not allowed to do such small edits (?). – Radek Liska May 17 '19 at 20:14
  • Done, added the missing `2` for `install2.r`. I also realized that the links may have broken when I changed the "pure source repo" to "repo for a CRAN package" which required the `inst/examples` directory to have `examples/` installed (R details). Thanks for checking. – Dirk Eddelbuettel May 17 '19 at 20:19