
Inspired by Q6437164: can someone explain to me why the following works:

iriscopy <- iris  # or any other data.frame
iriscopy$someNonExistantColumn[1] <- 15

It is not obvious to me how R interprets this statement as: create a new column named someNonExistantColumn in the data.frame, and set its first value (in fact, as it turns out, all of its values) to 15.

Nick Sabbe

2 Answers


The R Language Definition manual gives us a pointer to how R evaluates expressions of the form:

x$foo[1] <- 15

Namely, it is as if we had called:

`*tmp*` <- x
x <- "$<-.data.frame"(`*tmp*`, name = "foo",
                      value = "[<-"("$"(`*tmp*`, "foo"), 1, value = 15))
rm(`*tmp*`)

The middle line might be easier to grapple with if, for purposes of exposition, we drop the specific methods and write the generic functions instead:

x <- "$<-"(`*tmp*`, name = "foo", 
           value = "[<-"("$"(`*tmp*`, "foo"), 1, value = 15))
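A minimal sketch that performs the generic expansion above by hand on a copy of iris and compares it with the usual syntax. The names `x` and `y` are illustrative copies, and `*tmp*` is an ordinary variable here, created and removed manually to mimic what R does internally:

```r
# Work on a copy so that iris itself is untouched
x <- iris

# Manual expansion of x$foo[1] <- 15, following the generic form above
`*tmp*` <- x
x <- `$<-`(`*tmp*`, "foo",
           value = `[<-`(`$`(`*tmp*`, "foo"), 1, value = 15))
rm(`*tmp*`)

# The same assignment written with the usual syntax
y <- iris
y$foo[1] <- 15

identical(x, y)   # TRUE
all(x$foo == 15)  # TRUE: every row, not just the first
```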

To go back to your example using iris, we have something like

iris$foo[1] <- 15

Here, the calls are evaluated from the inside out. First, the extractor function "$" is used to access component "foo" of iris; as that column does not exist, the result is NULL:

> "$"(iris, "foo")
NULL

Then, "[<-" is used to replace the first element of the object returned above (the NULL) with the value 15, i.e. a call of:

> "[<-"(NULL, 1, value = 15)
[1] 15

This result is then passed as the value argument of the outermost call, namely the assignment using "$<-":

> head("$<-"(iris, "foo", value = 15))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species foo
1          5.1         3.5          1.4         0.2  setosa  15
2          4.9         3.0          1.4         0.2  setosa  15
3          4.7         3.2          1.3         0.2  setosa  15
4          4.6         3.1          1.5         0.2  setosa  15
5          5.0         3.6          1.4         0.2  setosa  15
6          5.4         3.9          1.7         0.4  setosa  15

(Here the call is wrapped in head() to limit the number of rows shown.)

That hopefully explains how the function calls progress. The last issue to deal with is why the entire column foo is set to 15. The answer is given in the Details section of ?"$<-.data.frame":

Details:

....

         Note that there is no ‘data.frame’ method for ‘$’, so ‘x$name’
     uses the default method which treats ‘x’ as a list.  There is a
     replacement method which checks ‘value’ for the correct number of
     rows, and replicates it if necessary.

The key bit is the last sentence. In the above example, the outermost assignment used value = 15. At this point, however, we are replacing the entire component "foo", which must have length nrow(iris). Hence, what is actually used in the outermost assignment/function call is value = rep(15, nrow(iris)).
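One way to see this replication at work, as a sketch: because the inner "[<-" is applied to NULL, the length of its result depends on the index, and "$<-.data.frame" then either replicates that result to nrow(iris) (150) rows or refuses. This assumes, as I understand the current implementation, that replication only happens when the number of rows is a multiple of the value's length. The objects `a`, `b`, `d` and `res` are illustrative names:

```r
# Index 1: the inner `[<-` on NULL yields 15 (length 1), replicated to 150 rows
a <- iris
a$foo[1] <- 15

# Index 2: the inner result is c(NA, 15); 150 is a multiple of 2, so it is replicated
b <- iris
b$foo[2] <- 15
head(b$foo)  # NA 15 NA 15 NA 15

# Index 4: the inner result has length 4; 150 is not a multiple of 4,
# so `$<-.data.frame` signals an error instead of replicating
res <- try({ d <- iris; d$foo[4] <- 15 }, silent = TRUE)
inherits(res, "try-error")  # TRUE
```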

This example is all the more complex because you have to convert from the convenience notation of

x$foo[1] <- 15

into proper function calls using "$<-"(), "[<-"(), and "$"(). The example in Section 3.4.4 of The R Language Definition uses this simpler example:

names(x)[3] <- "Three"

which evaluates to

`*tmp*` <- x
x <- "names<-"(`*tmp*`, value="[<-"(names(`*tmp*`), 3, value="Three"))
rm(`*tmp*`)

which is slightly easier to get your head around because names() looks like a usual function call.
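The names() expansion can be checked by hand in the same way; a minimal sketch using an illustrative named vector `x`:

```r
x <- c(a = 1, b = 2, c = 3)

# Manual expansion of names(x)[3] <- "Three"
`*tmp*` <- x
x <- `names<-`(`*tmp*`, value = `[<-`(names(`*tmp*`), 3, value = "Three"))
rm(`*tmp*`)
names(x)  # "a" "b" "Three"

# The usual syntax gives the same object
y <- c(a = 1, b = 2, c = 3)
names(y)[3] <- "Three"
identical(x, y)  # TRUE
```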

Gavin Simpson
  • That is just wicked beautiful. I'm glad I asked and I'm glad you answered. You just saved me from mental illness. – Nick Sabbe Jun 22 '11 at 13:46

I think the answer is that it doesn't work, at least not in the sense of setting only the first element.

I consider assigning to $newcol to be the standard way to create a new column. For example:

iris$newcol <- 1

will create a new column in the iris data.frame. All values will be 1, because of vector recycling.

This creation of a new column is triggered when the expression evaluates to NULL. From ?"$<-":

  • "When $<- is applied to a NULL x, it first coerces x to list(). This is what also happens with [[<- if the replacement value value is of length greater than one: if value has length 1 or 0, x is first coerced to a zero-length vector of the type of value."

So I think what happens here is that the expression evaluates to NULL, and this triggers the code to create a new column, which in turn uses vector recycling to fill the values.
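A short sketch of the behaviour described above, reusing the question's `iriscopy` name (`z` is an illustrative name). Note that the quoted ?Extract rule is about applying `$<-` to a NULL object itself, which the last lines show separately:

```r
iriscopy <- iris

# The column does not exist yet, so extracting it gives NULL
is.null(iriscopy$newcol)   # TRUE

# Assigning a length-1 value creates the column, filled on every row
iriscopy$newcol <- 1
all(iriscopy$newcol == 1)  # TRUE

# The quoted rule: `$<-` on a NULL object first coerces it to list()
z <- NULL
z$a <- 1
is.list(z)                 # TRUE
```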

Edit

The parsing probably goes through $-assign (`$<-`) rather than bracket-assign (`[<-`). Compare:

head(`$<-`(iris, newcol, 1))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species newcol
1          5.1         3.5          1.4         0.2  setosa      1
2          4.9         3.0          1.4         0.2  setosa      1
3          4.7         3.2          1.3         0.2  setosa      1
4          4.6         3.1          1.5         0.2  setosa      1
5          5.0         3.6          1.4         0.2  setosa      1
6          5.4         3.9          1.7         0.4  setosa      1

But bracket assign produces an error:

head(`[<-`(iris, newcol, 1))
Error in head(`[<-`(iris, newcol, 1)) : 
  error in evaluating the argument 'x' in selecting a method for function 'head': Error in is.atomic(value) : 'value' is missing
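As Gavin Simpson notes in the comments, "[<-.data.frame" expects i, j and value, with i left empty to address every row, just as in iris[, j] <- value. A minimal sketch of correctly formed calls, assuming the new column is named with a character string rather than the bare symbol newcol (the object names are illustrative):

```r
# Empty i = all rows; j given as a string names the new column
by_name <- `[<-`(iris, , "newcol", value = 1)
names(by_name)[6]   # "newcol"

# j given as a column index one past the last existing column
by_index <- `[<-`(iris, , 6, value = 2)
ncol(by_index)      # 6
```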
Andrie
  • Actually, I don't really get how this (my original code) is parsed in R. Is it `bracketassign(dollar(iriscopy, someNonExistantColumn), 1, 15)`? Or is it `dollarassign(iriscopy, bracket(someNonExistantColumn, 1), 15)`? (excuse the silly naming here). Or am I falsely assuming that everything is eventually parsed as a function call? – Nick Sabbe Jun 22 '11 at 11:49
  • @NickSabbe I am going well beyond my comfort zone here, but I tried to answer your question in my edit. – Andrie Jun 22 '11 at 12:18
  • I don't think you are calling `[<-` right there. Look at `args('[<-.data.frame')` and you'll see you need to specify `i`, `j` and `value`, and `i` or `j` can be blank, just as you would leave them blank in extraction. So `head('[<-'(iris, , 6, 2))` works, but you can't name the column as it doesn't exist. – Gavin Simpson Jun 22 '11 at 12:50
  • @Gavin: can you provide some input on my original question? I'm driving myself crazy here in trying to interpret how `iriscopy$someNonExistantColumn[1]<-15` is parsed by R. @Andrie had some ideas on the what, but not really (yet) on the why. – Nick Sabbe Jun 22 '11 at 13:18
  • @Nick Sabbe see my answer just submitted. This makes my brain hurt, but I think I have got to the bottom of it. It is hard so took a while getting the explanation right. – Gavin Simpson Jun 22 '11 at 13:33