2

I'm working on a project where there are some global assignments, and I ran into something sort of odd. I was hoping someone could help me with it.

I wrote this toy example to demonstrate the problem:

x <-  1:3 ; x <-  c(1, 2, 5) # this works fine
x <-  1:3 ; x[3] <- 5        # this works fine

x <<- 1:3 ; x <<- c(1, 2, 5) # this works fine
x <<- 1:3 ; x[3] <<- 5       # this does not work
# Error in x[3] <<- 5 : object 'x' not found

same.thing.but.in.a.function = function() {
  x <<- 1:3
  x[3] <<- 5
}
same.thing.but.in.a.function(); x
# works just fine

So, it seems it's not possible to change part of a vector using a global assignment -- unless that assignment is contained within a function. Can anyone explain why this is the case?

2 Answers

4

I figured out the problem.

Basically, in this manifestation of <<- (which is more accurately called the "superassignment operator" rather than the "global assignment operator"), it actually skips checking the global environment when trying to access the variable.

On page 19 of R Language Definition, it states the following:

x <<- data.frame(0, 0, 0) # (I added this so the code can be run)
names(x)[3] <<- "Three"

is equivalent to

x <<- data.frame(0, 0, 0) # (I added this so the code can be run)
`*tmp*` <<- get(x, envir=parent.env(), inherits=TRUE)
names(`*tmp*`)[3] <- "Three"
x <<- `*tmp*`
rm(`*tmp*`)

When I tried to run those lines, the get() line threw an error: parent.env() requires an argument and has no default. I can only assume the documentation was written at a time when parent.env() had a default value for its first argument, and my guess is that the default would have been environment(), which returns the current environment. After that fix it threw another error: x needs to be in quotes. So I fixed that too. Now, when I run the get() line, it throws the same error I encountered originally, but with more detail:

# Error in get("x", envir = parent.env(environment()), inherits = TRUE) :
#   object 'x' not found

This makes sense -- at the top level, environment() returns .GlobalEnv, so parent.env(environment()) skips the global environment entirely and instead returns the most recently attached package environment. Then, since inherits is set to TRUE, the get() function keeps going up the levels, searching through each of the attached package environments before eventually reaching the empty environment, and at that point it has still not found x. Thus the error.
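The lookup described above can be checked directly. This is a minimal sketch meant to be run at the top level of a fresh session; demo_var is a hypothetical name, and the sketch assumes no attached package exports an object with that name.

```r
# Hypothetical variable, defined in the global environment.
demo_var <- 1:3

# The parent of the global environment is the most recently attached
# entry on the search path (which one depends on your session).
first_parent <- parent.env(globalenv())
environmentName(first_parent)

# A lookup that starts at the global environment finds demo_var ...
found_from_global <- exists("demo_var", envir = globalenv(), inherits = TRUE)

# ... but a lookup that starts one level up never visits the global
# environment, so it walks the search path and comes back empty-handed.
found_from_parent <- exists("demo_var", envir = first_parent, inherits = TRUE)

found_from_global  # TRUE
found_from_parent  # FALSE, assuming no attached package defines demo_var
```

The second exists() call mirrors what the get() call in the expansion does when it is run from the top level.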

Since parent.env(environment()) will return .GlobalEnv (or another environment below it) as long as you start inside a local environment, this same problem does not occur when the same lines are run from inside a local environment:*

local({
  x <<- data.frame(0, 0, 0) # (I added this so the code can be run)
  `tmp` <<- get("x", envir=parent.env(environment()), inherits=TRUE)
  names(`tmp`)[3] <- "Three"
  x <<- `tmp`
  rm(`tmp`)
})
x
#   X0 X0.1 Three
# 1  0    0     0

# so, it works properly

In contrast, when <<- is used for a plain assignment, there is no extra subsetting code behind the scenes, so the existing value of x never has to be read. It searches through the parent environments for an existing definition of the variable, and if none is found, the assignment takes place in the global environment. So in that situation, it doesn't run into the problem where the lookup skips the global environment and fails.
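As a small illustration of how a plain <<- behaves, here is a sketch using hypothetical names (demo_plain, demo_val). It assumes no attached package defines demo_val, so the assignment falls through to the global environment.

```r
demo_plain <- function() {
  demo_val <- "local"   # ordinary local binding inside the function
  demo_val <<- "outer"  # search starts in the enclosing environment, not here
  demo_val              # the local binding is untouched
}

result <- demo_plain()

result                                  # "local"
get("demo_val", envir = globalenv())    # "outer"
```

No value is read before the assignment, which is why this form works at the top level while the subsetting form does not.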

* I had to change the variable from *tmp* to tmp because one of the behind-the-scenes operations in the code uses the *tmp* variable and then removes it, so *tmp* disappears in the middle of line 3 and so it throws an error when I then try to access it.

0

If you change to single-arrow assignment, then it works:

x <<- 1:3 ; x[3] <- 5    

BTW - I would suggest these wonderful discussions for better understanding and proper use of <<- operator -

Prem
  • but that won't work inside the function... (and, in practice, `?"<<-"` suggests that it should only be used inside a function anyway). I guess the interesting question is what `[<<-` is supposed to be (it doesn't appear to exist, and as far as I can tell isn't really discussed in the docs). Presumably it becomes a compound operation that calls both `[<-` and `<<-` to yield this different scoping rule. – baptiste Jul 08 '17 at 12:50
  • The thing is, I'm writing functions that do global assignments, but then I also want to be able to troubleshoot the functions by going inside and running each line. Because of that, it is alarming for me for this usage of the global assignment operator only to work when it's in a local environment -- that means it will throw an error _only_ when I'm trying to debug, which is not what I want at all! And as @baptiste said, the local assignment operator `<-` would work while I'm troubleshooting a function, but it wouldn't have the intended global effect when it's actually inside a function. – Roger Netherton Jul 08 '17 at 19:31
  • @Netherton Writing functions that do global assignment is bad practice. The only valid use for <<- that I've encountered is making use of closures, i.e. not assigning into the global environment. – Roland Jul 08 '17 at 22:38
  • @Roland I agree it's bad practice. I'm sort of at a loss of what else to do, though. I want to enclose my code into many sections that each run a function, so that I don't get all my temporary variables in the main workspace. But each section ends up with multiple variables that I want to keep in the global environment. The simplest way I thought of for doing this is just assigning those variables using `<<-`. Do you have any thoughts on a different method? – Roger Netherton Jul 09 '17 at 20:39
  • "multiple variables that I want to keep in the global environment" Why does it have to be the global environment and not a different environment (you can create those, you know) or a list (the usual choice)? It's always a bad idea to clutter the global environment with programmatically created objects. And of course, you should try to follow the functional programming paradigms. Can't give more specific advice without a representative example. – Roland Jul 09 '17 at 21:55
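The two alternatives suggested in the last comment (a list, or a separate environment) might look roughly like this. It is only a sketch: run_section, results and store are hypothetical names, not anything from the original project.

```r
# Option 1: keep temporaries local and return the results as a list.
run_section <- function(values) {
  tmp_sorted <- sort(values)  # temporary; discarded when the function returns
  list(
    smallest = tmp_sorted[1],
    largest  = tmp_sorted[length(tmp_sorted)]
  )
}

results <- run_section(c(3, 1, 2))
results$smallest  # 1
results$largest   # 3

# Option 2: a dedicated environment. Partial updates use ordinary `<-`.
store <- new.env()
store$x <- 1:3
store$x[3] <- 5
store$x  # 1 2 5
```

Because neither option relies on <<-, the same lines should behave the same way whether they run inside a function or are stepped through at the top level while debugging.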