4

I've never quite gotten my head around nesting functions and passing arguments by reference. My strategy is typically to do something like get('variabletopassbyreference') inside the child function to accomplish this.

Until now, I have been passing global variables to the function and this worked fine. Today I tried to create local variables inside a function and then pass those to a nested function within that function, and it failed. I'm unable to get `get` to work. I also tried tinkering with `pos` and `inherits`, but to no avail.

I cannot find an exact answer on the net. If I could get this construct to work then that's my preference because I have a bunch of other functions that I've coded up in similar fashion. If I shouldn't be doing this at all and should be doing something else, then that information would be appreciated as well.

An example is below -

test1 <- function(a1,b1) {

  # cat(ls()) # a1 b1
  # cat(ls(pos = 1)) # c test1 test2

  testvalue <- get('c') * get(a1, inherits = TRUE) * get(b1)

  testvalue

}

test2 <- function() {

  a = 1
  b <- 2
  # cat(ls()) # a b
  test1('a','b')

}

c = 3
test2()

I get the following error -

Error in get(a1, inherits = TRUE) : object 'a' not found 

More generic example -

a = 0

test1 <- function(a1,b1) {

  # cat(ls()) # a1 b1
  # cat(ls(pos = 1)) # c test1 test2

  testvalue <- get('c') * a1 * b1

  assign(x = 'a', value = 2.5)
  assign(x = 'a', value = 3.5, envir = parent.frame())
  assign(x = 'a', value = 4.5, envir = .GlobalEnv)
  cat(a)
  cat(' - value of a local within test1\n')
  testvalue

} 

test2 <- function() {

  a = 1
  b <- 2
  # cat(ls()) # a b

  cat(a)
  cat(' - value of a local within test2 before test1 called\n')
  test1(a1 = a, b1 = b)
  cat(a)
  cat(' - value of a local within test2 after test1 called\n')

}
cat(a)
cat(' - value of a global before test 2 \n')
c = 3
test2()

cat(a)
cat(' - value of a global after test 2 \n')
TheComeOnMan
  • Why are you interested in passing by reference? – flodel May 04 '14 at 18:32
  • I assume passing by value creates a copy in memory. That is a luxury I can't afford when working with really large datasets. Rarely, I need to modify the object within the function itself too. – TheComeOnMan May 04 '14 at 19:07
  • That's not the case. Copies are made only when you attempt to modify the arguments. It is discussed in detail here: http://stackoverflow.com/questions/15759117/what-exactly-is-copy-on-modify-semantics-in-r-and-where-is-the-canonical-source – flodel May 04 '14 at 19:08
  • That's a lot of text, I'm going to read that but before I undertake that journey - so in the above example, if I also did something like `assign(a1, 3)` then it would create a copy but otherwise just keep referring to the original `a`? I could then use the `envir` argument in `assign` to modify the global variable itself. – TheComeOnMan May 04 '14 at 19:20
  • I'm afraid `assign` goes in the same basket as `get`. It's a terrible approach. Are you now trying to modify objects outside the function's scope? If that's the case, make your example more general so I can help provide an alternative. Maybe you will be interested in the `proto` package which offers a simple object oriented (pass-by-reference) design. – flodel May 04 '14 at 19:29
  • I would be fine with using `proto`, but if I can manage without using extra packages then I'd prefer that. Also, I added another example, using `parent.frame`, which I just learned about from G. Grothendieck. I think the argument passing is okay now (except for the `cc` instead of `c` and your passing `cc` itself as an argument). You said `assign` is not good practice either, so how do I deal with this here? – TheComeOnMan May 04 '14 at 19:49
  • Are you using the data.table package? – Dason May 04 '14 at 20:57
  • @Dason - even with the `data.table` I remember having some case where it created a copy even when using typical data.table syntax which is why I switched to the `get` mentality. I can't remember why now. Do you know if there is any such case? – TheComeOnMan May 04 '14 at 22:22
  • @Dason, I just remembered one example. When using a keyed join you have to assign it to something (or return it). I had to use `assign` then. – TheComeOnMan May 04 '14 at 22:49
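
The copy-on-modify behaviour described in the comments above can be checked with a small sketch. The names `read_only`, `modify` and `big` are made up for illustration, and `tracemem()` only works on R builds with memory profiling enabled, hence the guard:

```r
# Hypothetical helpers, base R only.
read_only <- function(v) sum(v)               # reads v, never modifies it
modify    <- function(v) { v[1] <- 99; v }    # modifies its own local copy

big <- c(1, 2, 3)
total   <- read_only(big)
changed <- modify(big)

big      # 1 2 3: the caller's object is untouched
changed  # 99 2 3

# tracemem() reports when an object is duplicated.
if (capabilities("profmem")) {
  tracemem(big)
  invisible(read_only(big))  # no copy is needed just to read
  invisible(modify(big))     # a tracemem message should mark the copy
  untracemem(big)
}
```

So passing a large object to a function that only reads it should not duplicate it; the copy is deferred until something writes to it.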

2 Answers

7

Also, pass the environment that the variables are located in. Note that parent.frame() refers to the environment in the currently running instance of the caller.

test1 <- function(a1, b1, env = parent.frame()) {

  a <- get(a1, env)
  b <- get(b1, env)
  c <- get('c', env)

  testvalue <- c * a * b

  testvalue

}

c <- 3
test2() # test2 as in question
## 6

Here a and b are in env; c is not in env, but it is in an ancestor of env, and get looks through ancestors as well.
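
To see which names sit directly in env and which are only reachable through its ancestors, something like the following sketch may help. `show_lookup`, `caller`, `local_val` and `g` are hypothetical names used only for this illustration:

```r
g <- 10  # defined outside any function

show_lookup <- function(name, env = parent.frame()) {
  force(env)  # env is the frame of whoever called show_lookup()
  c(in_env_only    = exists(name, envir = env, inherits = FALSE),
    with_ancestors = exists(name, envir = env, inherits = TRUE))
}

caller <- function() {
  local_val <- 1
  list(local_val = show_lookup("local_val"),
       g         = show_lookup("g"))
}

res <- caller()
res$local_val  # TRUE TRUE: found directly in caller()'s frame
res$g          # FALSE TRUE: found only by walking up to an ancestor
```

`get` follows the same search as `exists(..., inherits = TRUE)`, which is why `c` is found even though it is not in the caller's frame.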

ADDED Note that R formulas can be used to pass variable names with environments:

test1a <- function(formula) {
    v <- all.vars(formula)
    values <- sapply(v, get, environment(formula))
    prod(values)
}

test2a <- function() {
    a <- 1
    b <- 2
    test1a(~ a + b + c)
}

c <- 3
test2a()
## 6

REVISION: Corrected. Added comment. Added info on formulas.

G. Grothendieck
  • The first part is what I'm looking for. I tried to `search` within the environment of `test2` and saw `.GlobalEnv`, when I do `parent.frame()`, I get ``. Aren't they the same? Why did my code not work? – TheComeOnMan May 04 '14 at 19:15
  • When `parent.frame()` is called in the revised `test1` it refers to the environment in its caller, namely the environment within the current running instance of `test2`, not the global environment. Have added a comment at the top of the answer. – G. Grothendieck May 04 '14 at 21:05
  • Thanks. I have accepted flodel's answer as it is more informative, therefore I could only give you a +1. – TheComeOnMan May 04 '14 at 22:28
3

Since you are asking, this definitely looks like a bad design to me. The recommended approach is to stick to R's way of pass-by-value. And as much as possible, make every function take everything it uses as arguments:

test1 <- function(a1, b1, c1 = 1) {
   testvalue <- c1 * a1 * b1   
   testvalue
}

test2 <- function(cc = 1) {
   a <- 1
   b <- 2
   test1(a1 = a, b1 = b, c1 = cc)
}

cc <- 3
test2(cc = cc)

(I replaced c with cc since it is the name of a function, hence a bad idea to use as variable name.)

A less acceptable but maybe closer approach to what you have is to not pass all arguments to your functions and let R look for them in the enclosing environments (lexical scoping), which here means the global environment where test1 is defined:

test1 <- function(a1, b1) {
   testvalue <- cc * a1 * b1   
   testvalue
}

test2 <- function() {
   a <- 1
   b <- 2
   test1(a, b)
}

cc <- 3
test2()

If for some reason the first approach does not work for you, please explain why so I get a chance to maybe convince you otherwise. It is the recommended way of programming in R.


Following on the discussion and your edit, I'll recommend you look at the proto package as an alternative to get and assign. Essentially, proto objects are environments so it's nothing you can't do with base R but it helps make things a bit cleaner:

test1 <- function(x) {
   testvalue <- x$c * x$a * x$b
   x$a <- 3.5
   testvalue
}

test2 <- function(x) {
   x$a <- 1
   x$b <- 2
   cat(x$a, '\n')
   test1(x)
   cat(x$a, '\n')
}

library(proto)
x <- proto(c = 3)
test2(x)

From a programming point of view, test1 and test2 are functions with side-effects (they modify the object x). Beware that it's a risky practice.
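
For reference, the same idea without any package might look like this in base R, using a plain environment. This is only a sketch; the `*_env` names are made up so they don't clobber the functions above, and `cc` stands in for `c`:

```r
test1_env <- function(x) {
  testvalue <- x$cc * x$a * x$b
  x$a <- 3.5            # environments are modified in place, no copy
  testvalue
}

test2_env <- function(x) {
  x$a <- 1
  x$b <- 2
  test1_env(x)
}

x_env <- new.env()
x_env$cc <- 3
result <- test2_env(x_env)
result   # 6
x_env$a  # 3.5: the change made inside test1_env is visible to the caller
```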

Or maybe a better approach is to make test1 and test2 be methods of a class, then it is acceptable if they modify the instance they are running on:

library(proto)       # must be loaded before proto() is called
x <- proto() # defines a class

x$test1 <- function(.) {
   testvalue <- .$c * .$a * .$b
   .$a <- 3.5
   testvalue
}

x$test2 <- function(.) {
   .$a <- 1
   .$b <- 2
   cat(.$a, '\n')
   .$test1()
   cat(.$a, '\n')
}

y <- x$proto(c = 3)  # an instance of the class
y$test2()

If you are not interested in using a third-party package (proto), then look at R's support for building classes (setClass, setRefClass). I do believe using an object-oriented design is the right approach given your specs.
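
As a rough idea of what the setRefClass route could look like, here is a minimal sketch. The class name `Calc` and the field name `cc` are my own choices for illustration, not something from your code:

```r
library(methods)  # attached by default in interactive R; explicit for scripts

Calc <- setRefClass(
  "Calc",
  fields = list(a = "numeric", b = "numeric", cc = "numeric"),
  methods = list(
    test1 = function() {
      testvalue <- cc * a * b
      a <<- 3.5          # <<- modifies the field on this instance
      testvalue
    },
    test2 = function() {
      a <<- 1
      b <<- 2
      .self$test1()      # .self plays the role of `.` in proto
    }
  )
)

y <- Calc$new(cc = 3)
result <- y$test2()
result  # 6
y$a     # 3.5: the instance was modified in place
```

Reference class objects are passed by reference, so handing `y` to another function does not copy it.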

flodel
  • What about reference classes? Aren't these roughly equivalent to the proto package, and part of standard R? – Paul Hiemstra May 04 '14 at 20:13
  • +1. this looks interesting. I don't get the `.` thing but I suppose that is some syntactical detail. Thanks! However, I still don't understand why my original code didn't work. If you could tell me that then the check mark is yours. – TheComeOnMan May 04 '14 at 20:41
  • `.` is like the `self` variable in other OO programming languages. It refers to the instance. I think you get an error because `test1` is defined in the global environment so `get` will look for `a` inside `test1` first, then inside the global environment, never inside `test2` unless you tell it so. See the difference if `test1` was defined inside `test2` for example, or after adding `envir = parent.frame()` to your `get` calls. – flodel May 04 '14 at 20:55
  • I thought that was the order in which it looked - current environment, parent of that (where current was called from), parent of parent of that (where parent of current was called from), so on so forth. That seems more intuitive to me than what you described ( which I checked and is correct ). – TheComeOnMan May 04 '14 at 22:13
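
The lookup order discussed in the last two comments can be checked with a short sketch. `secret_val`, `global_fun`, `outer_fun` and `inner_fun` are hypothetical names, and the sketch assumes no object called `secret_val` already exists in your workspace:

```r
# Defined at top level: its enclosing environment is where it was defined,
# not the frame of whichever function happens to call it.
global_fun <- function() exists("secret_val", inherits = TRUE)

outer_fun <- function() {
  secret_val <- 1
  # Defined inside outer_fun: its enclosing environment is outer_fun's frame.
  inner_fun <- function() exists("secret_val", inherits = TRUE)
  c(defined_outside = global_fun(), defined_inside = inner_fun())
}

res <- outer_fun()
res  # defined_outside FALSE, defined_inside TRUE
```

Both functions are called from the same place, yet only the one defined inside `outer_fun` sees `secret_val`, because R walks up from where a function was defined rather than from where it was called.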