
OK, we're all familiar with the double-colon operator in R. Whenever I'm about to write a function, I use require(<pkgname>), but I've always wondered about using :: instead. Using require in custom functions is better practice than library: if you give it the name of a package that doesn't exist, require issues a warning and returns FALSE, whereas library throws an error.
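
A minimal sketch of that difference. It uses a deliberately nonexistent package name (`notARealPackage` is hypothetical and is assumed not to be installed):

```r
## `notARealPackage` is a hypothetical name that should not be installed.
pkg <- "notARealPackage"

## require(): warns and returns FALSE (invisibly), so the caller can react.
ok <- suppressWarnings(require(pkg, character.only = TRUE))

## library(): throws an error, caught here only so we can inspect it.
lib_result <- tryCatch(
  library(pkg, character.only = TRUE),
  error = function(e) "error"
)

print(ok)          # FALSE
print(lib_result)  # "error"
```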

On the other hand, the :: operator gets the variable from the package, while require loads the whole package (at least I hope so), so speed differences were the first thing that came to mind. :: must be faster than require.
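
For what it's worth, :: does not skip loading. It loads the package's namespace if that namespace isn't loaded yet, but it does not attach the package to the search path, which is the extra step require performs. The quick check below uses stats4, a package that ships with R, as a stand-in for foreign. That substitution is an assumption made so the sketch runs without extra installs, and it also assumes stats4 is not already attached:

```r
## stats4 ships with R but is not attached in a default session;
## it stands in for foreign here.
f <- stats4::mle                          # `::` loads the namespace if needed...

loaded_after_colon   <- "stats4" %in% loadedNamespaces()
attached_after_colon <- "package:stats4" %in% search()

suppressPackageStartupMessages(require(stats4))  # ...require() also attaches it
attached_after_require <- "package:stats4" %in% search()

print(c(loaded_after_colon, attached_after_colon, attached_after_require))
```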

So I did some analysis to check that. I wrote two simple functions that load the read.systat function from the foreign package, with require and :: respectively, and then use it to import the Iris.syd dataset that ships with foreign. I replicated each function 1000 times (a shamelessly arbitrary number) and... crunched some numbers.

Strangely (or not), I found significant differences in user CPU and elapsed time, but no significant difference in system CPU. And an even stranger conclusion: :: is actually slower! The documentation for :: is very terse, yet just by looking at the source it's obvious that :: should perform better!

require

#!/usr/local/bin/r

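## with require: attach foreign (warns and returns FALSE if it is missing)
fn1 <- function() {
  require(foreign)
  ## Iris.syd is assumed to be in the working directory
  read.systat("Iris.syd", to.data.frame=TRUE)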
}

## n runs; each output row holds user, system and elapsed seconds (plus child times)
n <- 1e3

sink("require.txt")
print(t(replicate(n, system.time(fn1()))))
sink()

double colon

#!/usr/local/bin/r

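## with :: (look read.systat up in foreign's namespace, loading it if needed)
fn2 <- function() {
  ## Iris.syd is assumed to be in the working directory
  foreign::read.systat("Iris.syd", to.data.frame=TRUE)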
}

## n runs; each output row holds user, system and elapsed seconds (plus child times)
n <- 1e3


sink("double_colon.txt")
print(t(replicate(n, system.time(fn2()))))
sink()
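
One thing that can muddy a comparison like this: the first call pays for loading foreign, while every later call, whether through require or ::, only finds a package that is already loaded (and, for require, already attached). The sketch below moves that one-off cost out of the timed loops. It again uses stats4 as an assumed stand-in for foreign so that no data file is needed, and via_require and via_colon are hypothetical names. Timings vary by machine and R version, so no numbers are shown:

```r
## Sketch: time only the per-call overhead, after the one-off load.
## stats4 stands in for foreign; via_require/via_colon are hypothetical names.
n <- 1e3

## Pay the namespace-loading cost once, outside the timed loops.
invisible(requireNamespace("stats4", quietly = TRUE))

via_require <- function() {
  suppressPackageStartupMessages(require(stats4))  # near no-op once attached
  mle                                              # found via the search path
}
via_colon <- function() stats4::mle                # namespace lookup each call

invisible(via_require())  # warm-up: attach stats4 before timing

t_require <- system.time(for (i in seq_len(n)) via_require())
t_colon   <- system.time(for (i in seq_len(n)) via_colon())

## user / system / elapsed seconds for n calls of each variant
print(rbind(require = t_require, colon = t_colon)[, 1:3])
```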

Grab the CSV data here. Some stats:

user CPU:     W = 475366    p-value = 0.04738  MRr =  975.866    MRc = 1025.134
system CPU:   W = 503312.5  p-value = 0.7305   MRr = 1003.8125   MRc =  997.1875
elapsed time: W = 403299.5  p-value < 2.2e-16  MRr =  903.7995   MRc = 1097.2005

MRr is the mean rank for require, and MRc is the same for ::. I must have done something wrong here. It just doesn't make any sense... Execution time for :: should be way faster!!! I may have screwed something up; don't rule that option out...
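
The figures above are consistent with R's wilcox.test (a Wilcoxon rank-sum test). Its W is the rank sum of the first sample minus n1(n1 + 1)/2, so with n1 = 1000 it equals 1000 * MRr - 500500. For example, 1000 * 975.866 - 500500 = 475366 for user CPU. Because MRr < MRc for user CPU and elapsed time, the require timings tended to rank lower, that is, to be the shorter ones. Here is a sketch of the relation using made-up toy timings, not the question's data:

```r
## Toy timings in seconds (made-up numbers, not the question's CSV data)
t_req <- c(0.010, 0.012, 0.011, 0.015, 0.009)
t_col <- c(0.013, 0.014, 0.0125, 0.016, 0.018)

n1  <- length(t_req)
r   <- rank(c(t_req, t_col))     # joint ranks across both samples
MRr <- mean(r[seq_len(n1)])      # mean rank of the require timings
MRc <- mean(r[-seq_len(n1)])     # mean rank of the :: timings
W   <- n1 * MRr - n1 * (n1 + 1) / 2

res <- wilcox.test(t_req, t_col)
print(c(W_by_hand = W, W_wilcox = unname(res$statistic), MRr = MRr, MRc = MRc))

## The same relation applied to the question's user-CPU row
W_user <- 1000 * 975.866 - 1000 * 1001 / 2
```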

OK... I wasted my time only to find that there is some difference, and I carried out a completely useless analysis. So, back to the question:

"Why should one prefer require over :: when writing a function?"

=)

aL3xa
  • Is this for standalone functions or functions in a package? – hadley Dec 07 '10 at 02:06
  • Also, you would normally require() once at the top of your script, not once in every function call. – hadley Dec 07 '10 at 02:08
  • It's for standalone functions. I'm developing a web application, and since RApache starts a new R session upon each HTTP request, I'm trying to avoid unnecessary server load. This example is inappropriate - once you import a file, the job's done, but in an interactive web app with a bunch of AJAX calls, this may be quite inefficient. – aL3xa Dec 07 '10 at 03:29

2 Answers


"Why should one prefer require over :: when writing a function?"

I usually prefer require due to the nice TRUE/FALSE return value that lets me deal with the possibility of the package not being available up front before getting into the code. Crash as early as possible instead of halfway through your analysis.
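
Here is a minimal sketch of that fail-fast pattern built on require's logical return value. check_pkgs and notARealPackage are hypothetical names:

```r
## Sketch: check every dependency up front and stop before any real work.
## check_pkgs() is a hypothetical helper, not from any package.
check_pkgs <- function(pkgs) {
  ok <- vapply(
    pkgs,
    function(p) suppressWarnings(require(p, character.only = TRUE, quietly = TRUE)),
    logical(1)
  )
  if (!all(ok)) {
    stop("missing package(s): ", paste(pkgs[!ok], collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

check_pkgs("stats")    # attached by default, so this passes silently
err <- tryCatch(check_pkgs(c("stats", "notARealPackage")),
                error = conditionMessage)
print(err)             # names the missing package
```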

I only use :: when I need to make sure I am using the correct version of a function, not a version from some other package that is masking the name.
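
A minimal illustration of masking. A global mean() stands in for a clashing function from another package. That is an assumption made so the sketch needs only base R:

```r
## A user-defined mean() masks base::mean for ordinary lookups.
mean <- function(x, ...) "not the mean you wanted"

masked <- mean(c(1, 2, 3))        # finds the masking definition first
real   <- base::mean(c(1, 2, 3))  # :: goes straight to base's namespace

rm(mean)                          # remove the mask again
print(c(masked = masked, real = real))
```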

On the other hand, the :: operator gets the variable from the package, while require loads the whole package (at least I hope so), so speed differences were the first thing that came to mind. :: must be faster than require.

I think you may be ignoring the effects of lazy loading, which is used by the foreign package according to the first page of its manual. Essentially, packages that use lazy loading defer the loading of objects, such as functions, until the objects are called upon for the first time. So your argument that ":: must be faster than require" is not necessarily true, as foreign is not loading all of its contents into memory when you attach it with require. For full details on lazy loading, see Prof. Ripley's article in R News, Volume 4, Issue 2.
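
Lazy loading relies on R's promise mechanism: each object's name is bound to a promise, and the real object is fetched only on first access. delayedAssign exposes the same idea in miniature. The sketch below is an analogy, not foreign's actual loading code, and big_object and counter are hypothetical names:

```r
## Analogy only: a promise is evaluated on first access, then cached.
counter <- new.env()
counter$loads <- 0

delayedAssign("big_object", {
  counter$loads <- counter$loads + 1   # records when the value is really built
  seq_len(1e6)
})

before       <- counter$loads          # 0: nothing has been evaluated yet
n_elems      <- length(big_object)     # first access forces the promise
after_first  <- counter$loads          # 1
invisible(length(big_object))          # later access reuses the cached value
after_second <- counter$loads          # still 1

print(c(before, after_first, after_second))
```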

Sharpie
  • You are so right... and this may be a package-dependent problem. Oh, and thanks for the reference. – aL3xa Dec 07 '10 at 11:28

Since the time to load a package is almost always small compared to the time you spend trying to figure out what the code you wrote six months ago was about, in this case coding for clarity is the most important thing.

For scripts, having a call to require or library at the start lets you know which packages you need straight away.
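
For example, a script header along these lines lists the dependencies in one place. Because library() errors on a missing package, the script also stops before any analysis runs. The package list is only illustrative:

```r
## Top of a script: all dependencies declared in one place.
## stats4 and tools ship with R; substitute your own package names.
deps <- c("stats4", "tools")
for (pkg in deps) library(pkg, character.only = TRUE)
```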

Similarly, calling require (or a wrapper like requirePackage in Hmisc or try_require in ggplot2) at the start of a function is the most unambiguous way of showing that you need to use that package.

:: should be reserved for cases when you have naming conflicts between packages – compare, e.g.,

Hmisc::is.discrete

and

plyr::is.discrete
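
To check whether a name is ambiguous in the first place, base R's find() lists every search-path entry that defines it. Here is a small sketch, with a global median() standing in for a clashing package function (an assumption that keeps it to base R):

```r
## A global median() deliberately masks stats::median.
median <- function(x, ...) NULL

where <- find("median")   # every search-path entry defining the name
print(where)              # ".GlobalEnv" comes first, so it wins

rm(median)
```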
Richie Cotton