By accident I've encountered strange behaviour of the "[<-" operator. It behaves differently depending on the order of calls and on whether I'm using RStudio or the ordinary RGui. An example makes this clear.

x <- 1:10
"[<-"(x, 1, 111)
x[5] <- 123

As far as I know, the first assignment shouldn't change x (or maybe I'm wrong?), while the second should. And in fact the result of the above operations is

x
[1]   1   2   3   4 123   6   7   8   9  10

However, when we perform these operations in a different order, the result is different and x has changed! Namely:

x <- 1:10
x[5] <- 123
"[<-"(x, 1, 111)
x
[1] 111   2   3   4 123   6   7   8   9  10

But it only happens when I'm using plain R! In RStudio the behaviour is the same in both cases. I've checked it on two machines (one with Fedora, one with Win7) and the situation looks exactly the same. I know the 'functional' version ("[<-"(x, ...)) is probably never used, but I'm very curious why this happens. Could anyone explain that?

==========================

EDIT: OK, so from the comments I gather that the reason is that x <- 1:10 has type 'integer', and after the replacement x[5] <- 123 it is 'double'. But the question remains: why is the behaviour different in RStudio? Restarting the R session doesn't change anything.

BartekCh
  • It might be worth reading http://stackoverflow.com/questions/15178507/function-will-replace-an-element-but-not-append-an-element/15179065#15179065 -- this is not the same issue however – mnel Mar 21 '13 at 22:30
  • It may also be related to https://stat.ethz.ch/pipermail/r-devel/2013-March/066080.html – mnel Mar 21 '13 at 22:34
  • On further reading I think it is related to the `typeof(111)` compared to `typeof(1:10)`. Why it differs between Rstudio and RGui though, I'm not sure. – mnel Mar 21 '13 at 22:41
  • Every combination of (4 combinations of numeric(x) and integer(changing value)) changes values *except* (integer, numeric). That is, `x <- 1:10`; `"[<-"(x, 1, 111)` is the only combination that doesn't replace. – Arun Mar 21 '13 at 22:50
  • What do you mean by `plain old R`? – mnel Mar 21 '13 at 22:53
  • I mean RGui under Windows7 and console version under Fedora. – BartekCh Mar 21 '13 at 23:01
  • This is a wild guess, and I'm not sure how to test it, but in the question mnel linked to, the point was made that if there is a second reference to the object, the replacement will not be done in place but will result in a copy (and thus not modify the original variable). Perhaps RStudio, as part of its GUI, has references to the object. That is possible since it has an object browser. Or some other aspect is triggering the copying mechanism rather than the replace-in-place behavior. – Brian Diggs Mar 21 '13 at 23:12
  • I can't reproduce this, same behavior (order matters) in both, RStudio and RGUI (and Eclipse, which uses Rterm). – Roman Luštrik Mar 21 '13 at 23:12
  • @Arun -- I think that initial comment was a red herring. This isn't a problem with R, so I would be **shocked** (and not in a good way) if it has changed since the Feb 20th version of R-devel that I'm running... – Josh O'Brien Mar 21 '13 at 23:25
  • This is really interesting: `x <- 1:10`; `x`; `x[5] <- 123`; `x`; `"[<-"(x, 1, 111)`. Now, paste these lines 1) one by one and hit enter each time before typing the other.. and 2) paste all at once and hit enter in **Rstudio** and see the difference. – Arun Mar 21 '13 at 23:58
  • @BrianDiggs I think you're on to something. The way to test it would be to replace `/bin/R` with a link to the equivalent shell script in the debug build tree, then fire up R studio. I'll try this if I get a chance. – Matthew Lundberg Mar 22 '13 at 01:25
  • All the same, I vote for "don't do such a screwy thing" as the proper answer. If you dig deep enough, you can find nasty little surprises in almost any software (as evidenced by the number of "...craftily constructed message leads to security hole..." bug reports on the web). – Carl Witthoft Mar 22 '13 at 01:32
  • @CarlWitthoft I've used "[" as a function in the past, for example as a parameter to an apply function. Although I don't immediately see why I might want to pass "[<-" to an apply function, it doesn't seem out of the realm of possibility. – Matthew Lundberg Mar 22 '13 at 01:40
  • @Arun That may have to do with when the object browser takes a reference. – Matthew Lundberg Mar 22 '13 at 01:41
  • @Arun -- As Matt's prob. saying, when you paste in all the commands at once, Rstudio's object browser doesn't have a chance to 'touch' `x` in a way that resets its `named` field to `2` until after the subassignment has taken place. You (or someone else with Rstudio) could test this by pasting in `x <- 1:10; .Internal(inspect(x))` either all at once, or one by one. In the first case, I'd expect to see `[MARK,NAM(1)]` and in the second `[MARK,NAM(2)]`. If so, I think the mystery's basically solved. – Josh O'Brien Mar 22 '13 at 02:37
  • @JoshO'Brien You are correct. `1` if pasted in one go (or in one line, separated by a `;`). `2` if entered on two lines. – Matthew Lundberg Mar 22 '13 at 03:03
  • @MatthewLundberg -- Thanks! If you're in the mood, feel free to edit my answer with that info (perhaps replacing the code I copied from Arun). I'd do it now, but am off to put the little one down for the night ;) – Josh O'Brien Mar 22 '13 at 03:19
  • @MatthewLundberg Thanks again. I ended up condensing your contribution, just to make the whole answer read smoother, but it was still a big help. Cheers. – Josh O'Brien Mar 22 '13 at 06:32
  • MatthewLundberg, JoshO'Brien, thank you very much for the explanation. – Arun Mar 22 '13 at 07:47
  • @MatthewLundberg Fair enough. I guess I should have suggested that non-experts like me :-) avoid getting too tricky. It's certainly very interesting to find out that Rstudio does something slightly different from other GUI interfaces (or maybe I need to test this out with the Mac Rgui.app) – Carl Witthoft Mar 22 '13 at 11:18

1 Answer

Rstudio's behavior

Rstudio's object browser modifies objects it examines in a way that forces copying upon modification. Specifically, the object browser employs at least one R function whose call internally forces evaluation of the object, in the process resetting the value of the object's named field from 1 to 2. From the R-Internals manual:

When an object is about to be altered, the named field is consulted. A value of 2 means that the object must be duplicated before being changed. [...] A value of 1 is used for situations [...] where in principle two copies of `a` exist for the duration of the computation [...] but for no longer, and so some primitive functions can be optimized to avoid a copy in this case.
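
A minimal sketch of why an object with more than one reference has to be duplicated before it is changed. The variable names are illustrative and not taken from the manual. Once a second name refers to the same vector, a modification made through one name must leave the other untouched:

```r
x <- 1:10
y <- x        # a second name now refers to the same vector
x[1] <- 9L    # R must work on a copy here so that y is not affected

x[1]          # 9
y[1]          # 1, y still sees the original values
```

When only one name refers to the vector, no such protection is needed, which is the case the quoted optimization targets.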

To see that the object browser modifies the named field ([NAM()] in the next code block), compare the results of running the following lines. In the first case, both 'lines' are run together, so that Rstudio has no time to 'touch' x before its structure is queried. In the second, each line is pasted in separately, so x is touched before it is examined.

## Pasted in together
x <- 1:10; .Internal(inspect(x))
# @46b47b8 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 1,2,3,4,5,...

## Pasted in with some delay between lines
x <- 1:10
.Internal(inspect(x))
# @42111b8 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,2,3,4,5,... 

Once the named field is set to 2, "[<-"(x, ...) will not modify the original object. Pasting the following into Rstudio all at once modifies x, while pasting it in line-by-line does not (111L is an integer, so no type coercion is involved):

x <- 1:10
"[<-"(x, 1, 111L)

One more consequence of all this is that Rstudio's object browser actually makes some operations slower than they would otherwise be. Again, compare the same two commands first pasted in together, and then one at a time:

## Pasted in together
x <- 1:5e7
system.time(x[1] <- 9L)
#    user  system elapsed 
#       0       0       0 

## Pasted in one at a time
x <- 1:5e7
system.time(x[1] <- 9L)
#    user  system elapsed 
#    0.11    0.04    0.16 
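
Timing is only an indirect signal that a copy was made. A more direct check, sketched here on the assumption that your R build has memory profiling enabled (reported by `capabilities("profmem")`), is `tracemem()`, which prints a message whenever the traced object is duplicated. Whether a message appears depends on the front end and the R version, so no output is shown:

```r
x <- integer(1e6)

## tracemem() is only available when R was built with memory profiling
traced <- capabilities("profmem")
if (traced) invisible(tracemem(x))

x[1] <- 9L    # a "tracemem[...]" line is printed here only if x gets copied

if (traced) untracemem(x)
```

Running this once pasted in together and once line by line gives the same comparison as the timings above, without needing a very large vector.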

Variable behavior of [<- in R

The behavior of [<- w.r.t. modifying a vector X depends on the storage types of X and of the element being assigned into it. This explains R's behavior but not Rstudio's.

In R, when [<- either appends to a vector X, or performs a subassignment that requires that X's type be modified, X is copied and the value that is returned does not overwrite the pre-existing variable X. (To do that you need to do something like X <- "[<-"(X, 2, 100).)

So neither of the following modifies X:

X <- 1:2         ## Note: typeof(X) --> "integer"

## Subassignment that requires that X be coerced to "double" type
"[<-"(X, 2, 100) ## Note: typeof(100) --> "double"
X
# [1] 1 2

## Appending to X
"[<-"(X, 3, 100L)
X
# [1] 1 2
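
The same point can be seen by capturing the return value instead of discarding it. This is a small sketch; `res_coerce` and `res_append` are illustrative names. In both cases the result is a new vector, and X only changes once the result is assigned back:

```r
X <- 1:2

res_coerce <- "[<-"(X, 2, 100)    # needs coercion from integer to double
typeof(res_coerce)                # "double"
typeof(X)                         # "integer", X itself is untouched

res_append <- "[<-"(X, 3, 100L)   # needs a longer vector
length(res_append)                # 3
length(X)                         # 2

X <- "[<-"(X, 2, 100)             # assigning the result back updates X
X
```

Assigning the result back is also roughly what the usual `X[2] <- 100` syntax does, which is why that form behaves the same in every front end.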

Whenever possible, though, R does allow the [<- function to modify X directly by reference (i.e. without making a copy). "Possible" here includes cases in which a sub-assignment doesn't require that X's type be modified.

So all of the following modify X:

X <- c(0i, 0i, 0i, 0i)
"[<-"(X, 1, TRUE)
"[<-"(X, 2, 20L)
"[<-"(X, 3, 3.14)
"[<-"(X, 4, 5+5i)
X
# [1]  1.00+0i 20.00+0i  3.14+0i  5.00+5i
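
Whether a sub-assignment needs a type change follows R's usual coercion order (logical, integer, double, complex). As a rough illustration, the hypothetical helper `result_type()` below assigns values of each type into a fresh integer vector and reports the type of what comes back. Only the cases where that type is still "integer" could have been done without coercion:

```r
values <- list(logical = TRUE, integer = 20L, double = 3.14, complex = 5+5i)

## Hypothetical helper: assign `value` into a fresh integer vector
## and report the storage type of the result
result_type <- function(value) {
  target <- integer(2)
  typeof("[<-"(target, 1, value))
}

vapply(values, result_type, character(1))
#   logical   integer    double   complex
# "integer" "integer"  "double" "complex"
```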
Josh O'Brien
  • Anybody have a nice reference to R's hierarchy of types, in which `logical < integer < numeric < complex < character`, and types to the left are automatically converted to types to the right? – Josh O'Brien Mar 21 '13 at 23:20
  • I've reproduced your last example and the result was the same, but when I run `typeof(X)` before the replacement `"[<-"(X, 2, 20L)`, the replacement doesn't work (in RGui). Strange. – BartekCh Mar 21 '13 at 23:30
  • Only http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Basic-types, which lists them in that order (without specifically mentioning a hierarchy) – mnel Mar 21 '13 at 23:35
  • @BartekCh -- OK, that is truly weird. Nice catch! That **does** seem like a bug in R. Now I'm glad I left this up rather than deleting after I saw that it had really become a question about Rstudio. – Josh O'Brien Mar 21 '13 at 23:35
  • @JoshO'Brien, maybe it's better to edit the post to note the **bug** as well? – Arun Mar 21 '13 at 23:40
  • @JoshO'Brien: for reference to the hierarchy use: `?Comparison` – IRTFM Mar 22 '13 at 00:28
  • @Arun -- It probably deserves its own question, either here or on the R-devel mailing list. Before posting on the latter, though, I'd do some more research, starting with the difference you see when doing the following: `X <- c(0i, 0i); .Internal(inspect(X)); typeof(X); .Internal(inspect(X))`. Not sure what that `[MARK,NAM(2)]` bit means, but obviously it's changed by doing `typeof(X)` – Josh O'Brien Mar 22 '13 at 01:04
  • @Arun -- Based on page 2 of the R-Internals manual, it appears to be part of the "sxpinfo header", with name "named", which is "used to control copying", which makes sense given its effect here. Not sure why it's changed by `typeof()`, though, and I wonder what other operations modify it. I suspect Matthew Dowle might have some insight... – Josh O'Brien Mar 22 '13 at 01:12
  • @BartekCh -- I don't know anything about Rstudio, but it does start to look like these two "bugs" might be related. If Rstudio has some sort of object browser that "touches" objects in the same way that `typeof()` does, then that would explain (at one level) what's going on. – Josh O'Brien Mar 22 '13 at 01:15
  • If there's a bug, it's that R does not always make a copy when "[<-" is called by name. See Hadley's comment to my answer in the other question. – Matthew Lundberg Mar 22 '13 at 01:36
  • @MatthewLundberg -- I saw that but didn't take it to be definitive (though I do agree that it's the more surprising behavior). I find the fact that doing `typeof()` permanently alters an object to be just as surprising, if not itself a "bug" (whatever that means). – Josh O'Brien Mar 22 '13 at 02:12
  • The named field is used for reference counting, and says how many variables are pointing to that object in memory. It can either be 1 or 2, which is short for more than 1. It's a performance optimisation that allows simple allocations and deletes to avoid needing the garbage collector. I suspect it's set to 2 in RStudio because there are two references to the object: one in the console and one in the object browser. – hadley Mar 22 '13 at 13:17
  • Thank you all for explanations. It's good to know how things work inside:) – BartekCh Mar 22 '13 at 17:47
  • @hadley -- It might be that simple, but the examples with `typeof()` indicate that it could be something else. Running `typeof(X)` doesn't make a new permanent reference to `X` (does it??), but it does set the **named** field to 2 (with all the attendant changes in subassignment behavior). Do you think that might qualify as a bug with `typeof()`, warranting a report to R-core? – Josh O'Brien Mar 22 '13 at 17:48
  • @JoshO'Brien There's a lot going on in that example: complex seems to behave differently to other automatic cases, and replacing `typeof` with `f <- function(x) {x; invisible()}` gives similar results. I'd suggest a separate question. – hadley Mar 22 '13 at 19:33
  • @JoshO'Brien also the count is 1 or many, and many - 1 is still many. So a count of 2 doesn't imply that there's another permanent reference still around. – hadley Mar 22 '13 at 19:35
  • @hadley Just posted a question to R-devel, to see what light they might shed re: `typeof()`'s behavior. – Josh O'Brien Mar 22 '13 at 19:44
  • @JoshO'Brien since the behaviour is not specific to `typeof`, I don't think you're going to get a particularly friendly response. – hadley Mar 22 '13 at 19:49
  • @hadley -- At the very least, there doesn't seem to be a good affirmative reason for `typeof` to do what it does. My hesitation in posting mainly related to the suspicion that this might be too minor an issue for anyone in R-core to care about, but we'll see. I'm curious so I asked! – Josh O'Brien Mar 22 '13 at 19:56
  • @JoshO'Brien there doesn't have to be a good affirmative reason. I think you're missing the point of the named field. – hadley Mar 22 '13 at 20:00
  • @hadley -- I'm under no illusion that there needs to be another actual copy lying around for named field's value to be 2. I suspect (as I think you do) that `typeof`, in the course of its work, creates a reference to the object, and that the named field then gets set to 2. I just don't think it's desirable for `typeof` to leave the named field in that state, since it doesn't actually ever leave a copy that could create a problem. – Josh O'Brien Mar 22 '13 at 20:56
  • @hadley (continued) Obviously, at the C-level, functions can be written so that they leave the named field in whatever state they want. I suspect the real answer about why `typeof` doesn't have extra code to 'clean up' after itself is that nobody cares. I wish I knew C well enough to see why, e.g. `class()` or `length()` don't have the same effect on the named field as `typeof()` does. – Josh O'Brien Mar 22 '13 at 20:56
  • I gave a slightly fuller answer on R-Devel, but you don't need to know C to get the gist of the difference between `class()` and `typeof()` -- just type them both at the R prompt and note the difference in what's printed. – mweylandt Mar 23 '13 at 09:51
  • @JoshO'Brien Do you know of any relevant docs/issues on RStudio's pages about this 'feature'? I had planned to work through the answers to [Confused about NAMED](http://r.789695.n4.nabble.com/Confused-about-NAMED-td4103326.html), but fell already at the second line of the _question_: `y = 1:10` "`NAM(1)` as expected". Nope, not in Rstudio... Thanks for your excellent answer which saved my day. – Henrik Dec 09 '16 at 09:35
  • The closest I come to an official doc is a note in Hadley's book, in the section on [Modification in place](http://adv-r.had.co.nz/memory.html#modification): "Note that if you’re using RStudio, `refs()` will always return 2: the environment browser makes a reference to every object you create on the command line" (`?pryr::refs` "returns the number of references pointing to the underlying object") – Henrik Dec 10 '16 at 12:37
  • @Henrik. Hey, nice find. Thanks for adding that link. – Josh O'Brien Dec 11 '16 at 22:16