
I have some R code:

time.read = system.time(df <- data.frame(fread(f)))
print(class(time.read))
#[1] "proc_time"
print(class(df))
#[1] "data.frame"

Somehow when this is executed, in the main R environment/scope:

  • time.read has a value
  • df exists and contains the correct data.frame

I thought variables created inside a function were not available outside of the function's scope? How does this work? And why after running the following does y not exist in the main R environment?

fx <- function(z){return(1)}
out = fx(y <- 300)
print(out)
#[1] 1
print(y)
#Error in print(y) : object 'y' not found

Thanks!

Gavin Simpson

1 Answer


Great question! R does something peculiar with its function arguments, which causes a lot of confusion but is also very useful.

When you pass an argument into a function in R, it doesn’t get evaluated until it’s actually used inside the function. Before that, the argument just sits around in a special container called a promise. Promises hold an expression and the environment in which they are supposed to be evaluated – for arguments, that’s the caller’s environment.
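To watch a promise being evaluated lazily, here is a minimal sketch. The names `events`, `note` and `lazy_demo` are made up for this illustration; the only point is the order in which things happen.

```r
# Hypothetical helpers for illustration only.
events <- character()
note <- function(msg) events <<- c(events, msg)  # append to the log above

lazy_demo <- function(arg) {
    note("function body started")
    arg  # first use of the argument: the promise is evaluated here
    note("function body finished")
    invisible(NULL)
}

# The braces form the argument expression; it runs only when `arg` is used.
lazy_demo({
    note("argument evaluated")
    42
})

print(events)
```

The log shows the argument being evaluated in the middle of the function body, not before the call.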

But as soon as you use the argument inside the function, its value is computed. This is how system.time works. Simplified:

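system.time = function (expr) {
    before = proc.time()   # record the clock before touching the argument
    expr                   # first use: forces evaluation in the caller's environment
    proc.time() - before   # elapsed time since `before`
}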

In other words, the function simply records the time before looking at its argument. Then it looks at its argument and thus causes its evaluation, and then it records the time elapsed. But remember that the evaluation of the argument happens in the caller’s scope, so in your case the target of the assignment (df) is also visible in the parent scope.
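You can check this with a small stand-in for system.time. This is only a sketch: `time_it`, `elapsed` and `total` are names invented here, and the real system.time does more (such as error handling).

```r
# `time_it` is a hypothetical stand-in for system.time.
time_it <- function(expr) {
    before <- proc.time()
    expr                  # forces the promise; it is evaluated in the caller's environment
    proc.time() - before
}

# The assignment is part of the argument expression, so `total` ends up
# in the environment time_it was called from, not inside time_it.
elapsed <- time_it(total <- sum(1:1000))
print(total)
# [1] 500500
```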

In your second example, your function fx never looks at its argument, so it never gets evaluated. You can easily change that, forcing the evaluation of its argument, simply by using it:

fx <- function(z) {
    z
    return(1)
}

In fact, R has a special function, force, for this purpose:

fx <- function(z) {
    force(z)
    return(1)
}

But force is merely a convenience that makes the intent explicit; its definition simply returns its argument:

force = function (x) x
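To see the difference side by side, here is a short sketch. The function and variable names (`ignores_arg`, `forces_arg`, `skipped_value`, `forced_value`) are made up, and it assumes neither variable exists before the calls.

```r
# Hypothetical functions for comparison.
ignores_arg <- function(z) 1                  # never touches z
forces_arg  <- function(z) { force(z); 1 }    # evaluates z explicitly

ignores_arg(skipped_value <- 300)
print(exists("skipped_value", inherits = FALSE))
# [1] FALSE   (the assignment never ran)

forces_arg(forced_value <- 300)
print(exists("forced_value", inherits = FALSE))
# [1] TRUE    (the assignment ran in the calling environment)
```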

The fact that R doesn’t evaluate its arguments immediately is useful because you can also retrieve the unevaluated form inside the function. This is known as non-standard evaluation, and it’s sometimes used to evaluate the expression in a different scope (using the eval function with its argument envir specified), or to retrieve information about the unevaluated expression.
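As a rough sketch of evaluating in a different scope, the function below captures its argument and evaluates it with a data frame's columns in scope. `eval_in` and `people` are names invented for this example, not part of base R.

```r
# `eval_in` is a hypothetical helper for illustration.
eval_in <- function(expr, data) {
    captured <- substitute(expr)   # the unevaluated expression
    # Look names up in `data` first, then fall back to the caller's environment.
    eval(captured, envir = data, enclos = parent.frame())
}

people <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

# `weight` is not a variable in the calling environment; it is found in `people`.
result <- eval_in(mean(weight), people)
print(result)
# [1] 60
```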

Many functions use this, most prominently plot, which guesses default axis labels based on the plotted variables/expressions:

x = seq(0, 2 * pi, length.out = 100)
plot(x, sin(x))

Now the axis labels are x and sin(x). The plot function knows this because inside it, it can look at the unevaluated expressions of its function arguments:

xlabel = deparse(substitute(x))
ylabel = deparse(substitute(y))

substitute retrieves the unevaluated expression. deparse converts it into a string representation.
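The same trick works in your own functions. A minimal sketch, where `label_of` is a made-up name:

```r
# `label_of` is a hypothetical helper: it returns the text of its argument
# expression without ever evaluating it.
label_of <- function(x) deparse(substitute(x))

# `angle` does not need to exist, because the argument is never evaluated.
lab <- label_of(sin(angle))
print(lab)
# [1] "sin(angle)"
```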

Konrad Rudolph
  • Good answer, except that `system.time` doesn't actually use `eval` (at least not in my version of R). It evaluates the promise expression directly, without even using `force`, as in your second version of `fx`. – mrip Dec 03 '14 at 12:50
  • @mrip Yeah, I’ve half a mind to rewrite my answer, because my simplification actually makes it *more* complicated. In R, *every* function argument is a promise, and non-standard evaluation is simply a special case of this where the user didn’t evaluate the function argument (immediately), but instead `substitute`d it. – Konrad Rudolph Dec 03 '14 at 13:01
  • @mrip Scratch that, I’ve gone ahead and completely rewritten the answer. What do you think? It’s more correct now, but is it still understandable? – Konrad Rudolph Dec 03 '14 at 13:25
  • Looks great to me. Understandable, and accurate. Better than before IMO. – mrip Dec 03 '14 at 14:53