22

In reorganising my code base I’d like to clean up my code sharing mechanism. So far I’m using source for lots of small, largely self-contained modules of functionality.

However, this approach suffers from a number of problems, among them

  • the lack of tests for circularity (accidental circular source chains),
  • complex syntax required to properly specify include paths (chdir=TRUE argument, hard-coded paths),
  • potential of name clashes (when redefining objects).

Ideally I’d like to get something akin to the Python module mechanism. The R package mechanism would be overkill here: I do not want to generate nested path hierarchies, multiple files with tons of metadata and manually build the package just to get a small, self-contained, reusable code module.

For now I’m using a code snippet which allows me to solve the first two problems mentioned above. The syntax for inclusion is like this:

import(functional)
import(io)
import(strings)

… and a module is defined as a simple source file which resides in the local path. The definition of import is straightforward but I cannot solve the third point: I want to import the module into a separate namespace but from what I see the namespace lookup mechanism is pretty hard-wired to packages. True, I could override `::` or getExportedValue and maybe asNamespace and isNamespace but that feels very dirty and has the potential of breaking other packages.
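The circularity check mentioned above can be illustrated with a rough sketch. This is not the snippet the question refers to: `import_guarded`, `.loading`, `.loaded` and `module_dir` are hypothetical names chosen for illustration, and modules are assumed to be plain `.R` files in a single directory.

```r
# Hypothetical sketch of an import with a circularity guard.
.loading <- new.env()  # modules whose source file is currently being evaluated
.loaded  <- new.env()  # cache of module environments that finished loading

import_guarded <- function(module, path = ".") {
  # A module that is requested again while still loading means a cycle
  if (exists(module, envir = .loading, inherits = FALSE))
    stop("circular import detected: ", module)
  # Already loaded: reuse the cached environment
  if (exists(module, envir = .loaded, inherits = FALSE))
    return(invisible(get(module, envir = .loaded)))
  assign(module, TRUE, envir = .loading)
  on.exit(rm(list = module, envir = .loading), add = TRUE)
  # Parent is the environment where import_guarded was defined,
  # so module files can themselves call import_guarded()
  env <- new.env(parent = parent.env(environment()))
  sys.source(file.path(path, paste0(module, ".R")), envir = env)
  assign(module, env, envir = .loaded)
  invisible(env)
}

# Two throwaway modules that import each other
module_dir <- tempfile("modules")
dir.create(module_dir)
writeLines('import_guarded("b", module_dir)', file.path(module_dir, "a.R"))
writeLines('import_guarded("a", module_dir)', file.path(module_dir, "b.R"))

result <- tryCatch(import_guarded("a", module_dir), error = conditionMessage)
print(result)
```

The guard only tracks modules by name, so it does not address the name clash problem, which is the part the question is actually about.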

Konrad Rudolph
  • Can you expand on why adding each file's contents to a separate environment on the search path (as shown in the examples of `?sys.source`) is insufficient? – Joshua Ulrich Apr 03 '13 at 14:37
  • @Joshua That’s actually what I’m doing at the moment (my example was simplified) – I thought that having a way of explicitly qualifying the namespace was nice though. Of course I can do the same with `get` and `assign` but the syntax of `::` is quite a bit nicer. – Konrad Rudolph Apr 03 '13 at 14:43
  • I was confused because your `import` function doesn't do that. If you put each file's content in a separate environment on the search path, you can access a specific environment with the `$` operator (e.g. `strings$concatenate()`). – Joshua Ulrich Apr 03 '13 at 14:48
  • Related to: [Attaching a temporary namespace to the search path](http://stackoverflow.com/q/15620404/271616). – Joshua Ulrich Apr 03 '13 at 15:43
  • I think this needs more answers scolding the OP for wanting to avoid the overhead of creating a package ;=) – Josh O'Brien Apr 04 '13 at 16:00

6 Answers

17

Here's a function that completely automates package creation, compilation, and reloading. As others have noted, the utility functions package.skeleton() and devtools::load_all() already get you almost all the way there. This just combines their functionality, using package.skeleton() to create the source directory in a temp directory that gets cleaned up when load_all() is done processing it.

All you need to do is point to the source files from which you want to read in functions, and give the package a name: import() does the rest for you.

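import <- function(srcFiles, pkgName) {
    require(devtools)
    # Build the package skeleton inside the session's temp directory ...
    dd <- tempdir()
    # ... and delete it again once load_all() has finished
    on.exit(unlink(file.path(dd, pkgName), recursive=TRUE))
    package.skeleton(name=pkgName, path = dd, code_files=srcFiles)
    # Load the package from source without installing it
    load_all(file.path(dd, pkgName))
}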

## Create a couple of example source files
cat("bar <- function() {print('Hello World')}", file="bar.R")
cat("baz <- function() {print('Goodbye, cruel world.')}", file="baz.R")

## Try it out
import(srcFiles=c("bar.R", "baz.R"), pkgName="foo")

## Check that it worked
head(search())
# [1] ".GlobalEnv"        "package:foo"       "package:devtools"
# [4] "package:stats"     "package:graphics"  "package:grDevices"
bar()
# [1] "Hello World"
foo::baz()
# [1] "Goodbye, cruel world."
Josh O'Brien
  • @hadley -- Could you please elaborate? What would be the equivalent of my `import(c("bar.R", "baz.R"), pkgName="foo")` using `devtools::create()`? – Josh O'Brien Apr 04 '13 at 14:11
  • `create(path); file.copy(srcFiles, file.path(path, "R"))` - not a big improvement, just avoids creating files that you never use. But for this scenario, you don't even need `create`. – hadley Apr 04 '13 at 19:05
  • @hadley, the nice thing about the function you call clumsy is that it doesn't sign a GPL-3 license on my behalf. – GSee Apr 04 '13 at 19:11
  • Hmm, this is actually almost perfect for me – with one blemish: it persistently installs the package on the system. I’d like this to be limited to the current session so that a restart of R can no longer load the ad-hoc created package without re- `import` -ing it. I cannot find an “uninstall” function in devtools. Is there a good way to achieve this? Furthermore, how do I make sure this action is performed? Simply stuffing it into `.Last` seems to be insufficient since `.Last` can be overwritten elsewhere. – Konrad Rudolph Apr 05 '13 at 09:21
  • @Gsee I didn't think about that - what would be a better default license? (Perhaps better to continue discussion at http://github.com/hadley/devtools/issues ?) – hadley Apr 05 '13 at 11:03
  • @KonradRudolph -- Do you mean that a package **foo** gets installed in one of your libraries (as it would with `install.packages("foo")`)? I ask because that *doesn't* happen when I run the code above. (If you confirm that this is nonetheless happening on your system, by first removing the package manually or using `remove.packages()` and then rerunning my code, etc..., I've got some code that should fix it.) – Josh O'Brien Apr 05 '13 at 16:05
  • @JoshO'Brien Ah, no. I just assumed since I didn’t find anything in the documentation saying otherwise. I’m gonna try it now. /EDIT: Works as expected. Perfect! – Konrad Rudolph Apr 05 '13 at 16:08
  • @GSee Created issue regarding default license: https://github.com/hadley/devtools/issues/282 – Brian Diggs Apr 05 '13 at 20:57
  • @JoshO'Brien @hadley what if I created the package with `import(srcFiles=c("bar.R", "baz.R"), pkgName="foo")` and then I want to add one more file w/o having to add them all again, i.e., bax.R.... if I do `import(srcFiles=c("bax.R"), pkgName="foo")` it erases bar.R and baz.R because it re-imports it. Is there a solution to potentially update the package one or few files at a time? – Dnaiel Aug 26 '16 at 02:41
15

Konrad, in all seriousness, the answer to the demand

to get a small, self-contained, reusable code module

is to create a package. That gospel has been repeated numerous times here on SO, and in other places. You can in fact create minimal packages with minimal fuss.

Also, after running

 setwd("/tmp")
 package.skeleton("konrad")

and removing the one temporary file, I am left with

 edd@max:/tmp$ tree konrad/
 konrad/
 ├── DESCRIPTION
 ├── man
 │   └── konrad-package.Rd
 └── NAMESPACE

 1 directory, 3 files
 edd@max:/tmp$ 

Is that really that onerous?

Dirk Eddelbuettel
  • I find this unsatisfactory; the required effort to create, build and install a package for a two-function file is orders of magnitude greater than writing said file, and sourcing it. I’m not denying their versatility or usefulness, just that they are appropriate in *every* situation. – Konrad Rudolph Apr 03 '13 at 14:13
  • @KonradRudolph But if you just have a two-function file then you don't need to worry about circularity and it shouldn't be too hard to avoid collisions? But in all seriousness it really *is* incredibly simple to turn something into a package. Nobody is saying that you need to go through the work of making it capable of passing all of the CRAN checks (which would take work). But if you get the basic package structure you can use the `load_all` function from the devtools package to get essentially all of the functionality you're asking for without needing to explicitly install the package. – Dason Apr 03 '13 at 14:27
  • "What @Dason said" and it really is that the price of admission for NAMESPACEs is to create a package. – Dirk Eddelbuettel Apr 03 '13 at 14:29
  • @Dason Well the files are highly modular; the two functions might use quite a lot of other functionality from elsewhere. The package approach is very, very cumbersome for an agile development approach: every single change requires rebuilding and reinstalling (which in turn requires context switching away from your editor). Now, `load_all` does indeed sound nice. I still feel that the self-contained files that other module systems make available are superior (because they simply make less work). This simply *screams* of over-engineering. – Konrad Rudolph Apr 03 '13 at 14:31
  • @KonradRudolph devtools really does streamline the process and you don't need to rebuild and reinstall every time you make a change to your functions if you use `load_all`. I think this might be what you're looking for. – Dason Apr 03 '13 at 15:48
  • Really, make packages. This massive overhead you repeatedly worry about is a ONE-OFF per package, you then keep all related R files in that package, and load_all is your import function. The advantage of keeping related R files in a single package IS worth the overhead, because at some point in your project you'll have code specific for that project, and reusable code that can work in other projects. How do you organise that? Simple, two packages. All the other advantages of devtools then become open to you (roxygen for example). – Spacedman Apr 04 '13 at 08:26
  • devtools helps a lot. It reduces the packaging effort from X to X/5, but X/5 in R is still significant. In sensible interpreted languages X equals zero! The definitive evidence that packages in R are overly cumbersome is that `source()` is /even used at all by anyone/. Compare Python. Who ever uses `execfile()` to import function definitions in Python? No one. You just write one file and `import` it; there are your namespaces. The "price of admission" is zero. Can you imagine writing an introductory R textbook that /never mentions `source()`/? – crowding Apr 04 '13 at 13:49
  • Note that the python price goes up to £0.01 when you start organising code in folders and have to create `__init__.py` files. But yes, R is not a sensible interpreted language. – Spacedman Apr 04 '13 at 16:16
  • Don't feed the trolls by trying to outtroll them. – Dirk Eddelbuettel Apr 04 '13 at 16:20
  • @KonradRudolph Agile development is fine with packages. Just `source` (or pipe from your editor) the edited functions into the R session. Then do `assignInNamespace(....)` to push the copy(ies) in your workspace into the package NAMESPACE. Once you are done being agile for the day you can rebuild and install the package with the new updates. – Gavin Simpson Apr 04 '13 at 16:43
13

A package is just a convention for where to store files (R files in R/, docs in man/, compiled code in src/, data in data/): if you have more than a handful of files, you're best sticking with established convention. In other words, using a package is easier than not using a package, because you don't need to think: you can just take advantage of existing conventions and every R user will understand what's going on.

All a minimal package really needs is a DESCRIPTION file, which says what the package does, who can use it (the license), and who to contact if there are problems (the maintainer). This is a bit of an overhead, but it's not major. Once you've written that, you just fill in the additional directories as you need them - no need for the clumsy package.skeleton().
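As a rough sketch of how little that is, the following hand-rolls such a layout with base R only. The package name, field values and file names are placeholders made up for illustration; the final `load_all()` call is left commented out because it assumes devtools is installed.

```r
# Sketch: create the smallest package layout by hand (all values are placeholders)
pkg_dir <- file.path(tempfile("pkgs"), "strings")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)

# DESCRIPTION: what the package does, its license, and whom to contact
write.dcf(
  data.frame(
    Package     = "strings",
    Version     = "0.0.1",
    Title       = "String Helpers",
    Description = "Small string utilities shared between projects.",
    License     = "GPL-2",
    Maintainer  = "Your Name <you@example.com>",
    stringsAsFactors = FALSE
  ),
  file = file.path(pkg_dir, "DESCRIPTION")
)

# Code goes in R/, one or more files
writeLines(
  "concatenate <- function(...) paste0(...)",
  file.path(pkg_dir, "R", "concatenate.R")
)

# With devtools installed, load it into the session without installing:
# devtools::load_all(pkg_dir)
```

Other directories such as man/ or data/ can be added later in the same way, whenever they are actually needed.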

That said, the built-in tools for working with packages are cumbersome - you have to re-build/re-install the package, restart R and reload the package. That's where devtools::load_all() and RStudio's build & reload come in - they use the same specification for a package, but provide easier ways to update a package from source. You can of course use the code snippets provided by the other answers, but why not use well tested code that's used by hundreds (well, tens at least) of R developers?

hadley
  • Another advantage to this method is simple inclusion of documentation and automatic `NAMESPACE` generation, including `imports`, etc through the use of `roxygen2`. And then of course unit tests, etc, using `testthat`. – rmflight Apr 04 '13 at 13:21
  • Wholly endorse this. I use the R package structure to organize all my projects for this reason. I'd note that the R package provides a good way to organize data, documentation, and even pubs associated with the project if need be too. (e.g. https://github.com/cboettig/) devtools makes this workflow really simple. – cboettig Apr 04 '13 at 14:42
8

My comment to the OP's question wasn't quite right, but I think this re-write of the import function does the trick. foo.R and bar.R are files in the current working directory that contain a single function (baz) that prints the output shown below.

import <- function (module) {
  module <- as.character(substitute(module))
  # Search path handling omitted for simplicity.
  filename <- paste(module, 'R', sep = '.')
  # get the imports environment if it is already on the search path
  if ("imports" %in% search())
    imports <- as.environment(match("imports",search()))
  # otherwise create it and attach it to the search path
  else
    imports <- attach(NULL, name="imports")
  # module already imported: nothing to do
  if (module %in% ls("imports"))
    return(invisible())
  # create a new environment (imports as parent)
  env <- new.env(parent=imports)
  # source file into env
  sys.source(filename, env)
  # ...and assign env to imports as "module name"
  assign(module, env, imports)
}
setwd(".")
import(foo)
import(bar)
foo$baz()
# [1] "Hello World"
bar$baz()
# [1] "Buh Bye"

Note that baz() by itself won't be found, but the OP seemed to want the explicitness of :: anyway.

Joshua Ulrich
6

I'm wholly sympathetic with @Dirk's answer. The small overhead involved in making a minimal package seems worth conforming to a "standard way".

However, one thing that came to mind is source's local argument, letting you source into an environment, which you could use like a namespace, e.g.

assign(module, new.env(parent=baseenv()), envir=topenv())
source(filename, local=get(module, topenv()), chdir = TRUE)
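A self-contained sketch of the same idea, with the two placeholders filled in: the module file is a throwaway temp file and `strings` is just an example name for the environment that plays the namespace role.

```r
# Write a small throwaway module file
module_file <- tempfile(fileext = ".R")
writeLines(
  c("greet <- function() 'Hello World'",
    "helper <- 42"),
  module_file
)

# Source it into its own environment instead of the global one.
# Note: with parent = baseenv(), module code only sees base R,
# not stats, utils or anything else attached to the search path.
strings <- new.env(parent = baseenv())
source(module_file, local = strings)

# Everything the file defined now lives in `strings`
print(sort(ls(strings)))
print(strings$greet())
```

The `$` access works out of the box; the rest of this answer is only about getting the `::` syntax on top of that.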

To access these imported environments with a simple syntax, give these import environments a new class (say, 'import'), and make :: generic, defaulting to getExportedValue when pkg doesn't exist.

import <- function (module) {
    module <- as.character(substitute(module))
    # Search path handling omitted for simplicity.
    filename <- paste(module, 'R', sep = '.')

    e <- new.env(parent=baseenv())
    class(e) <- 'import'
    assign(module, e, envir=topenv())
    source(filename, local=get(module, topenv()), chdir = TRUE)
}

'::.import' <- function(env, obj) get(as.character(substitute(obj)), env)
'::' <- function(pkg, name) {
    pkg <- as.character(substitute(pkg))
    name <- as.character(substitute(name))
    if (exists(pkg)) UseMethod('::')
    else getExportedValue(pkg, name)
}

Update

Below is a safer option that would prevent errors in the case that a loaded package contains an object with the same name as a package being accessed with ::.

'::' <- function(pkg, name) {
    pkg.chr <- as.character(substitute(pkg))
    name.chr <- as.character(substitute(name))
    if (exists(pkg.chr)) {
        if (inherits(pkg, 'import'))
            return(get(name.chr, pkg))
    }
    getExportedValue(pkg.chr, name.chr)
}

This would give the correct result, say, if you loaded data.table, and subsequently tried to access one of its objects with ::.

Matthew Plourde
  • I think we really disagree about what “small overhead” constitutes – as I’ve said in another comment the overhead is about *an order of magnitude* greater for small modules, and punishes small, numerous packages in favour of big, monolithic ones. It simply supports a different development methodology. That said, I’ll try fooling around with `devtools` and see if this makes small packages less of a hassle. Finally, concerning your answer: your approach puts everything in an *environment* but not a *namespace*. – Konrad Rudolph Apr 03 '13 at 14:41
  • @KonradRudolph: From the [R Internals](http://cran.r-project.org/doc/manuals/R-ints.html) manual, [Section 1.2.2: Namespaces](http://cran.r-project.org/doc/manuals/R-ints.html#Namespaces), "Namespaces are environments associated with packages...". I.e., you can't have a namespace without a package. – Joshua Ulrich Apr 03 '13 at 14:46
  • @Joshua Okay but that’s begging the question. As mentioned in the comment further up, what I want is a way of explicitly qualifying a namespace/environment with a nice syntax. True, R couples that to packages but (as far as I can see) only by convention. You could write your own version of `::` which circumvents this. – Konrad Rudolph Apr 03 '13 at 14:48
  • @KonradRudolph so why not do that? Make `::` generic and give your namespace-environments a new class. It still might feel 'dirty' to you, but that should avoid breaking other packages. – Matthew Plourde Apr 03 '13 at 15:04
  • @MatthewPlourde Well that sounds like an answer … – Konrad Rudolph Apr 03 '13 at 15:05
  • What is wrong with `sys.source` here over `source`. The main feature of `sys.source` is to evaluate the `.R` file *in* the specified environment. – Gavin Simpson Apr 04 '13 at 16:50
  • @KonradRudolph If you have an environment, then you can do `env$foo()`, so isn't that the syntax you are looking for - no need to fiddle with `::`? – Gavin Simpson Apr 04 '13 at 16:51
  • @GavinSimpson thanks, you're absolutely right, `sys.source` would be fine. I'm in the habit of using `source`, since this function can do everything `sys.source` can and offers more options. I assumed OP had considered `$`... – Matthew Plourde Apr 04 '13 at 17:51
  • @GavinSimpson not that it's a big deal, but I can't see such a quibble warranting a downvote. – Matthew Plourde Apr 04 '13 at 18:29
  • @GavinSimpson one merit of overriding `::` is that one wouldn't have to update the code if an `import`ed module later becomes a standard package. – Matthew Plourde Apr 04 '13 at 20:43
  • @MatthewPlourde By the way; I didn't down vote and most certainly wouldn't have for `source` vs `sys.source`. – Gavin Simpson Apr 04 '13 at 22:54
6

I’ve implemented a comprehensive solution and published it as a package, ‘box’.

Internally, ‘box’ modules use an approach similar to packages; that is, the code is loaded inside a dedicated namespace environment, and selected symbols are then exported into a module environment which is returned to the user, and optionally attached. The main difference to packages is that modules are more lightweight and easier to write (each R file is its own module), and can be nested.

Usage of the package is described in detail on its website.

Konrad Rudolph
  • I love this! With a big OOP background (C# mainly) and as an R noob trying to create a fairly large enterprise Shiny app - I simply cannot understand how I'm supposed to keep all my files in one single `R` folder - let alone somehow avoid naming conflicts in all of them! I feel I'm doing it wrong, but all these type of questions get knocked down. – Ctrl-Zed Jul 13 '22 at 23:26