
I am writing some functions for repeated tasks, and I am trying to minimize the number of times I load the data. Basically I have one function that takes some information and makes a plot. Then I have a second function that loops through and outputs multiple plots to a .pdf. Both functions contain the following line of code:

if(load.dat) load("myworkspace.RData")

where load.dat is a logical and the data I need is stored in myworkspace.RData. When I am calling the wrapper function that loops through and outputs multiple plots I do not want to reload the workspace in every call to the inner function. I thought I could just load the workspace once in the wrapper function, then the inner function could access that data, but I got an error stating otherwise.

So my understanding was when a function cannot find the variable in its local environment (created when the function gets called), the function will look to the parent environment for the variable.

I assumed the parent environment to the inner function call would be the outer function call. Obviously this is not true:

func1 <- function(...){
  print(var1)
}

func2 <- function(...){
  var1 <- "hello"
  func1(...)
}

> func2()
Error in print(var1) : object 'var1' not found

After reading numerous questions, the language manual, and this really helpful blog post, I came up with the following:

var1 <- "hello"
save(list="var1",file="test.RData")
rm(var1)

func3 <- function(...){
  attach("test.RData")
  func1(...)
  detach("file:test.RData")
}

> func3()
[1] "hello"

Is there a better way to do this? Why doesn't func1 look for undefined variables in the local environment created by func2, when it was func2 that called func1?

Note: I did not know how to name this question. If anyone has better suggestions I will change it and edit this line out.

dayne
  • Lexical scoping means the function will look for undefined symbols in its parent environment, which is not necessarily the calling environment. Check this also: https://github.com/hadley/devtools/wiki/Environments – Ferdinand.kraft Aug 20 '13 at 15:23
  • @Ferdinand.kraft Thanks for the link. I will work through that this afternoon. – dayne Aug 20 '13 at 15:27
  • If your data is in form of dataframes, you could use package `data.table`, and pass your tables as an argument to `func1` inside `func3`. This package works by reference and does not make unwanted copies of your data. – Ferdinand.kraft Aug 20 '13 at 15:31
  • Not quite sure why it isn't seeing `var1`, but note that `print(parent.frame()$var1)` works fine. – Richie Cotton Aug 20 '13 at 15:47
  • @RichieCotton, `func1` looks for `var1` in its enclosing environment, i.e., where it belongs, which happens to be `R_GlobalEnv`. The call to `parent.frame()` inside `func1` inside `func2` returns the evaluation environment of `func2`, where `var1` belongs. (boy what a mess :-) – Ferdinand.kraft Aug 20 '13 at 17:56
  • @dayne, I think you should be more specific in your question. What does your data look like? – Ferdinand.kraft Aug 20 '13 at 19:37
  • @Ferdinand.kraft My data is two different data frames. I knew this question was kind of vague, but I am trying to better understand the environment definitions in R. Really I just do not understand why `func1` cannot see the environments created by `func2` or `func3`. Is there any way to get around this? Or to make functions look to other function environments? – dayne Aug 20 '13 at 19:43
  • @dayne, it is intentional that `func1` cannot see those environments. When you type `func1 <- function...` in the console, you are creating an object of type closure which has an environment property, equal to `R_GlobalEnv`. This is where R will look for symbols not resolved in the evaluation of `func1`'s body. The *evaluation* environment created during the execution of `func2` or `func3` is *irrelevant* WRT symbol lookup. A workaround is to use `parent.frame()$var1`, as Richie pointed out above, but it is very ugly. – Ferdinand.kraft Aug 20 '13 at 19:52
  • @Ferdinand.kraft If you want to post that as an answer I will accept it. Thanks for your continued attention today! – dayne Aug 20 '13 at 20:27
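
For reference, the `parent.frame()` workaround mentioned in the comments could look roughly like the sketch below. `func1_dynamic` is a hypothetical variant of `func1` from the question, not something the commenters posted; it asks for `var1` in the caller's evaluation frame rather than relying on normal lexical lookup.

```r
# func1_dynamic is a hypothetical variant of func1 from the question.
func1_dynamic <- function(...) {
  # Start the lookup in the caller's evaluation frame,
  # not in this function's enclosing environment.
  get("var1", envir = parent.frame())
}

func2 <- function(...) {
  var1 <- "hello"        # local to this call of func2
  func1_dynamic(...)
}

result <- func2()
print(result)
```

This ties the inner function to whatever its caller happens to define, which is probably why it was called ugly above; passing the value as an argument is usually easier to follow.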

2 Answers


To illustrate lexical scoping, consider the following:

First let's create a sandbox environment, only to avoid the oh-so-common R_GlobalEnv:

sandbox <- new.env()

Now we put two functions inside it: f, which looks for a variable named x; and g, which defines a local x and calls f:

sandbox$f <- function()
{
    value <- if(exists("x")) x else "not found."
    cat("This is function f looking for symbol x:", value, "\n")
}

sandbox$g <- function()
{
    x <- 123
    cat("This is function g. ")
    f()
}

Technicality: entering function definitions in the console causes them to have the enclosing environment set to R_GlobalEnv, so we manually force the enclosures of f and g to match the environment where they "belong":

environment(sandbox$f) <- sandbox
environment(sandbox$g) <- sandbox

Calling g. The local variable x=123 is not found by f:

> sandbox$g()
This is function g. This is function f looking for symbol x: not found. 

Now we create an x in the global environment and call g. The function f will look for x first in sandbox, and then in the parent of sandbox, which happens to be R_GlobalEnv:

> x <- 456
> sandbox$g()
This is function g. This is function f looking for symbol x: 456 

Just to check that f looks for x first in its enclosure, we can put an x there and call g:

> sandbox$x <- 789
> sandbox$g()
This is function g. This is function f looking for symbol x: 789 

Conclusion: symbol lookup in R follows the chain of enclosing environments, not the evaluation frames created during execution of nested function calls.
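
As a small complement, here is a sketch that puts the two kinds of environment side by side for a single call. The names `where_am_i`, `caller` and `secret` are made up for this illustration. Inside a function, `parent.env(environment())` gives the enclosing environment (the one used for symbol lookup), while `parent.frame()` gives the evaluation frame of the caller.

```r
# Hypothetical names (where_am_i, caller, secret), used only for illustration.
where_am_i <- function() {
  list(
    enclosing = parent.env(environment()),  # where the function was defined
    calling   = parent.frame()              # frame of whoever called it
  )
}

caller <- function() {
  secret <- 123                 # exists only in this call's evaluation frame
  envs <- where_am_i()
  list(
    # inherits = FALSE: check that one environment only, not its parents
    in_enclosure = exists("secret", envir = envs$enclosing, inherits = FALSE),
    in_caller    = exists("secret", envir = envs$calling, inherits = FALSE)
  )
}

res <- caller()
print(res)
```

Assuming nothing else named `secret` is defined where the functions are created, only the calling frame reports the variable, and that frame is not part of the lookup chain.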

EDIT: Just adding a link to this very interesting answer from Martin Morgan on the related subject of parent.frame() vs parent.env()

Ferdinand.kraft
  • This is the best illustration I have seen. Thank you so much! I was not really understanding the difference in environments and frames. – dayne Aug 21 '13 at 12:54

You could use closures:

f2 <- function(...){
  f1 <- function(...){
    print(var1)
  }
  var1 <- "hello"
  f1(...)
}
f2()
Karl Forner
  • Right, but I need to be able to use the inner function as a stand-alone function. I did not want to have to redefine the inner function every time I call the outer function (not to mention duplicate a bunch of code). – dayne Aug 20 '13 at 16:24
  • Then the cleanest setting in my opinion: put all your data in a list (my_data), then give it as argument to your function. Inside the function you may use with(my_data, { } ) to avoid extra typing. – Karl Forner Aug 21 '13 at 08:04
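
A rough sketch of that last suggestion, applied to the original problem of loading the workspace only once. Everything here is an assumption made for illustration: the function names (`load_once`, `plot_one`, `plot_all`), the data frames `df1` and `df2`, and the temporary files standing in for myworkspace.RData. The workspace is loaded into its own environment with `load(..., envir = )`, turned into a list, and that list is passed to the inner function, which stays usable on its own.

```r
# All names here (load_once, plot_one, plot_all, df1, df2) are hypothetical.

# Read the saved workspace once, into its own environment, and return a list.
load_once <- function(path) {
  e <- new.env()
  load(path, envir = e)
  as.list(e)
}

# Inner function: works stand-alone because the data arrives as an argument.
plot_one <- function(my_data, column) {
  # with() lets the body refer to df1 and df2 without the my_data$ prefix
  with(my_data, plot(df1[[column]], df2[[column]], main = column))
  invisible(column)
}

# Wrapper: loads once, then hands the same list to every inner call.
plot_all <- function(path, pdf_file) {
  my_data <- load_once(path)
  pdf(pdf_file)
  on.exit(dev.off())            # close the device even if a plot fails
  for (column in names(my_data$df1)) plot_one(my_data, column)
  invisible(pdf_file)
}

# Stand-in data so the sketch runs end to end.
df1 <- data.frame(a = 1:5, b = 6:10)
df2 <- data.frame(a = (1:5)^2, b = (6:10)^2)
workspace <- tempfile(fileext = ".RData")
save(df1, df2, file = workspace)
out <- plot_all(workspace, tempfile(fileext = ".pdf"))
```

Nothing in this version depends on scoping rules: the inner function sees only what it is given.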