R package development best practices: using system() command?

Question

I'm developing a new R package to release to CRAN and would like to invoke the system() command directly within its source code. For example, I would like to use the gzip utility directly within my R package:

write.csv(mydat, "mydat.csv")
system("gzip mydat.csv", wait=FALSE)

Even more importantly, I would like to leverage other existing command-line utilities directly within my R package. And by command-line utilities, I mean actual large command-line software programs that are not trivial to rewrite in R.

So my question is: What are some best practices for specifying the usage of external (not R) command-line libraries during the development of an R package?

For example, the Imports and Depends fields in an R package DESCRIPTION file are only good for specifying dependencies on existing R packages. It would be a nuisance for users to have to manually install some existing non-R command-line program by using a package manager (e.g., brew), and this would go against the best practice of keeping work self-contained within the RStudio IDE. Besides, there is no guarantee that such a roundabout approach would work reproducibly, given the difficulty of reliably locating the full path to the command-line executable, coordinating with the RStudio IDE, etc.

Likewise, using tools such as https://cran.r-project.org/web/packages/ssh.utils/index.html will only serve basic command-line needs within the R environment, and hence does not apply to the needs of using large command-line software programs.

Note: The R package that I'm developing is not for personal use. It is intended for public release to CRAN and, hence, should comply with their checks. However, I could not find any specification from CRAN regarding the use of the system() command, particularly in the context of leveraging actual large command-line software programs that are not trivial to rewrite in R.

In the `DESCRIPTION` file of your package, you can specify the `SystemRequirements` section with the needed dependencies. You should then write a configure script that tests whether the system has them. However, I don't think it's a good practice to call `system`. If you rely on an external library, a cleaner way should be to wrap it so it can be called directly from R. — nicola, Apr 17 '16 at 20:48
Spot on. And that is what we do in `x13binary`: ensure the presence of the tool we require (and provide). — Dirk Eddelbuettel, Apr 17 '16 at 20:58
@nicola When you say: "However, I don't think it's a good practice to call system. If you rely on an external library, a cleaner way should be to wrap it so it can be called directly from R." ... how do you wrap code from an existing library (assuming it's not an R library) without calling `system()`? — warship, Apr 17 '16 at 21:28
You write C (or C++) code using R/C interfaces like `.Call` or `.External` or the `Rcpp` package. See this link http://adv-r.had.co.nz/C-interface.html to get started on how to call C from R. See for instance the `rgeos`, `rgdal` or other packages that do exactly this: wrap an external library so it can be called from R. — nicola, Apr 17 '16 at 22:25
@nicola: Good comment, but no need to refer to external documentation as Rcpp has plenty of that itself, and no need to look at difficult/obscure packages like `rgeos` or `rgdal` which do not use Rcpp. Plenty of recent, simple, well-written packages using Rcpp from the likes of Jeroen, Hadley, Oliver, Bob Rudis, ... — Dirk Eddelbuettel, Apr 17 '16 at 22:46
@DirkEddelbuettel You are absolutely right. I'm more used to the "bare" C interface, and those packages were the first that came to mind. I guess hundreds of libraries are wrapped in R using `Rcpp`, so OP will surely be able to accomplish their goal using just the `Rcpp` tools and docs (especially if they have some experience in C++). However, I'm under the impression that OP is hoping for a copy/paste kind of solution that requires basically no work. I guess they didn't go over your answer with the needed care. — nicola, Apr 18 '16 at 07:31

score 0 · Answer 1 · edited May 23 '17 at 10:28

I would like to use the gzip utility directly within my R package

That is a code smell. Your package then needs to determine by means of configure (or similar) if such programs exist. So why bother? In this example, and on my box:

edd@don:~$ grep  GZIP /etc/R/Renviron
R_GZIPCMD=${R_GZIPCMD-'/bin/gzip -n'}
edd@don:~$

You have access to it via most file-saving commands such as saveRDS(), the gzcon() and gzfile() functions and so on. See this older answer of mine.
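
As a minimal sketch of that point (an illustration, not code from the original answer): the CSV from the question can be compressed from within R by writing through a `gzfile()` connection, so no external `gzip` call is needed. The data frame `mydat` and the temporary path are placeholders.

```r
# Placeholder data standing in for the asker's `mydat`
mydat <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Write to a temporary location rather than the working directory
path <- file.path(tempdir(), "mydat.csv.gz")

# Open a gzip-compressed text connection and write the CSV through it
con <- gzfile(path, open = "wt")
write.csv(mydat, con, row.names = FALSE)
close(con)

# Read the compressed file back through the same kind of connection
roundtrip <- read.csv(gzfile(path))
```

Because compression happens inside R, this behaves the same whether or not a `gzip` binary is installed on the user's machine.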

For truly external programs you can rely on system(). See Christoph's seasonal package relying on our underlying x13binary binary package.
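
When an external program really is required, one possible pattern is sketched below. It assumes the program is expected on the user's PATH and is declared in `SystemRequirements`. The helper names `find_tool()` and `run_tool()` are hypothetical and not part of any package mentioned here; `Sys.which()` and `system2()` are base R.

```r
# Hypothetical helper: locate an executable on the PATH or fail with a clear message
find_tool <- function(name) {
  path <- unname(Sys.which(name))
  if (is.na(path) || !nzchar(path)) {
    stop(sprintf("External program '%s' was not found on the PATH.", name),
         call. = FALSE)
  }
  path
}

# Hypothetical helper: run the program and return its output as a character vector
run_tool <- function(name, args = character()) {
  exe <- find_tool(name)
  # stdout = TRUE captures output; a non-zero exit code is attached as attribute "status"
  out <- suppressWarnings(system2(exe, args = args, stdout = TRUE, stderr = TRUE))
  status <- attr(out, "status")
  if (!is.null(status) && status != 0L) {
    stop(sprintf("'%s' exited with status %d.", name, status), call. = FALSE)
  }
  out
}
```

A call such as `run_tool("gzip", "--version")` would then either return the program's output or stop with an informative error, instead of failing silently the way a bare `system()` call with `wait=FALSE` can.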

answered Apr 17 '16 at 20:26

Dirk Eddelbuettel


The `x13binary` package is an R package. I'm explicitly referring to command-line software packages (not necessarily written in R) that are NOT available in the R ecosystem (e.g., RStudio IDE). – warship Apr 17 '16 at 20:40
Please pay a bit more attention to detail. The [x13binary](https://cloud.r-project.org/web/packages/x13binary/index.html) _package_ supplies (via custom installer code) a _binary_ irrespective of operating system and platform to be accessed from R via `system()` --- exactly what you ask for here -- but doing it _carefully_ and _portably_ via R wrapper code. All of this may well be a little harder than you currently think it is. – Dirk Eddelbuettel Apr 17 '16 at 20:57
