The setnames function in R (data.table package) is unusual for R in that it changes its input by reference, meaning that it does not copy the table on which it operates. (Not to be confused with setNames from the stats package.)
This has a surprising (and inconvenient) consequence: it will change any object to which the names attribute was previously saved. Viz:
    library(data.table)
    dt <- data.table(x = 1, y = 2)
    dtnms <- function(dt){
        nms <- names(dt)
        print(nms)
        setnames(dt, c("a", "b"))
        nms
    }
What would you expect dtnms(dt) to return? "x" "y", of course. Except it doesn't: setnames also modifies nms within the function, so that while "x" "y" is printed before the setnames call, "a" "b" is returned. If you put a stop() before the setnames call, you can see that nms is just a character vector, with no special class or other indication of its provenance. So we would expect setnames to have no idea where nms had come from, yet somehow the link is there for setnames to "see". How is this possible? (This works the same with simple data frames.)
A few further observations (remember to reset dt each time):
The link is propagated by further assignments:

    dtnms2 <- function(dt){
        nms <- names(dt)
        print(nms)
        nms2 <- nms
        setnames(dt, c("a", "b"))
        nms2
    }
    dtnms2(dt)

gives "a" "b".
The link is not just a question of similarity:

    dtnms3 <- function(dt){
        chv <- c("x", "y")
        setnames(dt, c("a", "b"))
        chv
    }
    dtnms3(dt)

gives "x" "y".
The link is not detectable by identical:

    dtnms4 <- function(dt){
        chv <- c("x", "y")
        nms <- names(dt)
        identical(chv, nms)
    }
    dtnms4(dt)

gives TRUE.
The link can be broken (which is probably the best way round this):

    dtnms5 <- function(dt){
        nms <- names(dt)
        nms <- paste(nms)
        setnames(dt, c("a", "b"))
        nms
    }
    dtnms5(dt)

gives "x" "y", back to the expected value. paste has broken the link, whatever it was.
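A more explicit alternative to the paste trick might be data.table's own copy() function. The sketch below assumes that copy() hands back an independent duplicate of the character vector; dtnms_copy is a hypothetical name used only for illustration and is not one of the examples above.

```r
library(data.table)

# Hypothetical variant of dtnms5: take an explicit copy of the names
# before renaming, so the saved vector is a separate object.
dtnms_copy <- function(dt) {
  nms <- copy(names(dt))       # assumption: copy() duplicates the vector
  setnames(dt, c("a", "b"))    # renames dt by reference
  nms                          # return the names saved before the rename
}

dt <- data.table(x = 1, y = 2)
old <- dtnms_copy(dt)
print(old)        # names saved before the rename
print(names(dt))  # names of dt after the rename
```

If that assumption holds, old should keep the pre-rename names while dt itself carries the new ones. This only works around the symptom; it does not explain what the link is.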
So my question is, what is the link? Why does setnames change the nms object, which is just a plain old character vector bearing no sign of where it came from?