
First let me explain that I come from the Python world, where I can do what I want like this in the shell:

$ export PYTHONPATH=~/myroot
$ mkdir -p ~/myroot/mypkg
$ touch ~/myroot/mypkg/__init__.py # this is the one bit of "magic" for Python
$ echo 'hello = "world"' > ~/myroot/mypkg/mymodule.py

Then in Python:

>>> import mypkg.mymodule
>>> mypkg.mymodule.hello
'world'

What I did there was create a package that other users can easily extend. I can check ~/myroot/mypkg into source control, and other users can later add modules to it using just a text editor. Now I want to do the equivalent thing in R. Here's what I have so far:

$ export R_LIBS=~/myR # already this is bad: it makes install.packages() put things here!
$ mkdir -p ~/myR
$ echo 'hello = "world"' > /tmp/mycode.R

Now in R:

> package.skeleton(name="mypkg", code_files="/tmp/mycode.R")

Now back to the shell:

$ R CMD build mypkg
$ R CMD INSTALL mypkg

Now back to R:

> library(mypkg)
> hello
[1] "world"

So that works. But now how do my colleagues add new modules to this package? I want them to be able to just go in and add a new R file, but it seems like we then have to redo the entire package building process, which is tedious. And it seems like we will then end up checking in many generated files. But most importantly, where did the code go? Once I did CMD INSTALL, R knew what my code was, but it did not put the literal text (from mycode.R) anywhere under $R_LIBS (did it "compile" the code? I'm not sure).
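As far as I can tell, one way to see what R CMD INSTALL actually produced (assuming the package landed under $R_LIBS as above) is to list the installed directory from R; the code seems to be stored in a lazy-load database (R/mypkg.rdb plus an index) rather than as the original .R file:

> find.package("mypkg")                               # where the installed package lives
> list.files(find.package("mypkg"), recursive=TRUE)   # expect R/mypkg.rdb etc., not mycode.R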

Previously we would just source() our "modules", but this is not very good because it reloads the code every time, so indirect (transitive) dependencies end up reloading code that is already loaded.

My question is, how do people manage simple, in-house, collaboratively edited, source-controlled, non-binary, non-compiled, shared code in R?

I'm using R 3.1.1 on Linux. If the solution works on Windows too that would be nice.

John Zwinck
  • What was wrong with "source()"ing? You would just source once per session right? I don't understand why you would do that more than once. R doesn't expect packages to be dynamic. I mean you can add shared folders to the `.libPaths()` so you all have the same packages. But the concept of everyone being able to add files at any time to a shared package isn't really an R paradigm. – MrFlick Jul 31 '14 at 03:27
  • `source()` reloads all the code every time. This is massively inefficient when you have a bunch of R files with interdependencies, like A->B, B->C, A->C will load C twice. And loading R files can be very slow, e.g. if they use setRefClass (we have one such file which takes several seconds to load, just for one file with dozens of classes). – John Zwinck Jul 31 '14 at 03:33
  • I still don't understand why you are calling it more than once. But you could create a loader script you could source. Then after you source a file, you can set an `option()` value to say it's been loaded and then check for that value so you don't load again. A lot like how you might use C++ header files. – MrFlick Jul 31 '14 at 03:35
  • A depends on B and C directly. B depends on C. This means A depends on C twice, so if we use `source()` everywhere, C will be loaded twice. I am starting to think like you say, that we may have to implement our own sourcing function which remembers the names of modules already sourced to avoid multiple loading. It is just surprising that R does not have anything like this already (every other language I have used does have, except Bash). – John Zwinck Jul 31 '14 at 03:42
  • 1
    Use the environment argument to source, so that everything is loaded into custom environments. The code being sourced can check if its environment already exists and, if so, note reload all of its code. – Thomas Jul 31 '14 at 05:56
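A minimal sketch of the environment-based guard Thomas describes above (the name load_module and the file layout are just my own illustration): load each file into its own environment via sys.source(), and skip any file whose environment already exists.

load_module <- function(name, path = ".") {
  # guarded source(): each module gets its own environment, created at most once
  envname <- paste0(".module_", name)
  if (!exists(envname, envir = globalenv())) {
    env <- new.env(parent = globalenv())
    sys.source(file.path(path, paste0(name, ".r")), envir = env)  # load definitions into env
    assign(envname, env, envir = globalenv())                     # remember we loaded it
  }
  invisible(get(envname, envir = globalenv()))
}

# usage (hypothetical files): utils <- load_module("utils"); utils$some_function(...)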

1 Answer


It seems R has nothing like Python's import statement, so I made my own. Just put the code below in a file like import.r and source() it from your $R_PROFILE (or ~/.Rprofile).

# this is sort of like Python's import statement, and lets us avoid redundant sourcing

.imports <- c("import") # module names imported so far, to avoid redundant imports (never import ourselves)

.importScriptPath <- function() {
  # returns the path of the executing script
  # see http://stackoverflow.com/questions/1815606

  # this will only work if the caller was loaded with source()
  filePath <- sys.frame(2)$ofile

  # if the caller was not loaded with source(), use the main script path
  if (length(filePath) == 0) {
    argv <- commandArgs(trailingOnly = FALSE)
    filePath <- substring(argv[grep("--file=", argv)], 8)
  }

  return (dirname(filePath))
}

import <- function(module) {
  # locates the given module (character or token), calls source() on it, and does nothing on subsequent calls

  module <- as.character(substitute(module)) # support import(foo) not only import("foo")

  if (module %in% .imports) {
    return(invisible())
  }

  moduleFilename <- paste0(gsub("\\.", "/", module), ".r") # allow import(foo.bar) as import("foo/bar")
  importPaths <- c(.importScriptPath()) # add more search paths here as desired

  for (importPath in importPaths) {
    modulePath <- file.path(importPath, moduleFilename)
    if (file.exists(modulePath)) {
      source(modulePath)
      .imports <<- append(.imports, module) # <<- updates the global variable so we skip it next time
      return(invisible())
    }
  }

  # last chance: try to load module as a standard library
  suppressPackageStartupMessages(library(module, character.only = TRUE))
  .imports <<- append(.imports, module)
  return(invisible())
}
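
Usage then looks like this (the module names here are hypothetical): call import() instead of source() and each file is loaded at most once per session.

# in A.r:
import(B)
import(C)

# in B.r:
import(C)   # C.r is sourced only the first time, regardless of how many files import it

# anything not found next to the importing script falls through to library():
import(parallel)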
John Zwinck