
The setnames function in R (data.table package) is unusual for R in that it changes its input by reference, meaning that it does not copy the table on which it operates. (Not to be confused with setNames from the stats package.)

This has a surprising (and inconvenient) consequence: it will change any object to which the names attribute was previously saved. Viz:

library(data.table)

dt <- data.table(x = 1, y = 2)

dtnms <- function(dt){
    nms <- names(dt)
    print(nms)
    setnames(dt, c("a", "b"))
    nms
}

What would you expect dtnms(dt) to return? "x" "y" of course. Except it doesn't - setnames also modifies nms within the function, so that while "x" "y" is printed before setnames, "a" "b" is returned. If you put a stop() before the setnames, you can see that nms is just a character vector, with no special class or other indication of its provenance. So we would expect setnames to have no idea where nms had come from, yet somehow the link is there for setnames to "see". How is this possible? (This works the same with simple data frames.)

A few further observations (remember to reset dt each time):

  1. The link is propagated by further assignments

    dtnms2 <- function(dt){
        nms <- names(dt)
        print(nms)
        nms2 <- nms
        setnames(dt, c("a", "b"))
        nms2
    }
    

    dtnms2(dt) gives "a" "b"

  2. The link is not just a question of similarity

    dtnms3 <- function(dt){
        chv <- c("x", "y")
        setnames(dt, c("a", "b"))
        chv
    }
    

    dtnms3(dt) gives "x" "y"

  3. The link is not detectable by identical

    dtnms4 <- function(dt){
        chv <- c("x", "y")
        nms <- names(dt)
        identical(chv, nms)
    }
    

    dtnms4(dt) gives TRUE

  4. The link can be broken (which is probably the best way round this)

    dtnms5 <- function(dt){
        nms <- names(dt)
        nms <- paste(nms)
        setnames(dt, c("a", "b"))
        nms
    }
    

    dtnms5(dt) gives "x" "y", back to the expected value. paste has broken the link, whatever it was.
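data.table also exports a `copy()` function, which could stand in for `paste` in observation 4. A minimal sketch, assuming `copy()` applied to a character vector returns a freshly allocated vector; the name `dtnms6` is only for illustration:

```r
library(data.table)

# Hypothetical variant of dtnms5: take an explicit copy of the names
# before renaming by reference.
dtnms6 <- function(dt) {
    nms <- copy(names(dt))       # assumed to allocate a new character vector
    setnames(dt, c("a", "b"))    # renames dt in place
    nms
}

dt <- data.table(x = 1, y = 2)
res <- dtnms6(dt)
print(res)          # the names saved before renaming
print(names(dt))    # the names of dt after renaming
```

Unlike `paste`, this states the intent (an independent copy) directly and does not rely on a side effect of string conversion.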

So my question is, what is the link? Why does setnames change the nms object, which is just a plain old character vector bearing no sign of where it came from?
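One way to probe this, as a sketch: data.table exports an `address()` helper that returns an object's memory address as a string. Assuming `names(dt)` hands back the attribute itself rather than a copy, comparing addresses should show which of the vectors above share storage with it. The variable names are only for illustration:

```r
library(data.table)

dt <- data.table(x = 1, y = 2)

nms    <- names(dt)     # plain assignment, as in dtnms
pasted <- paste(nms)    # as in dtnms5
chv    <- c("x", "y")   # as in dtnms3

# address() reports where each object lives in memory
addr <- c(
    attribute = address(names(dt)),
    nms       = address(nms),
    pasted    = address(pasted),
    chv       = address(chv)
)
print(addr)

# Which vectors sit at the same address as the names attribute?
same_as_attribute <- addr == addr[["attribute"]]
print(same_as_attribute)
```

This compares locations rather than contents, which is the distinction that `identical` in observation 3 cannot make.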

CJB
  • Study `help("copy")`. Then research what "copy-on-modify" means. – Roland May 26 '16 at 11:49
  • Have a look at [this SO post](http://stackoverflow.com/q/15913417/559784), read the first paragraph in `?setnames` and also read the [Reference Semantics vignette](https://github.com/Rdatatable/data.table/wiki/Getting-started) – Arun May 26 '16 at 11:51
  • TL;DR: You're suspecting `setnames` of doing something it does not. The root cause is the copy-on-write mechanism (broadly used in many languages): `setnames` modifies the memory in place, so every pointer to that space sees the update, because reading any of them reads the same memory. See Arun's and Roland's links for details. – Tensibai May 26 '16 at 13:13
  • @Tensibai I think that Arun's link is a better duplicate. Sorry - it is hard to check whether the question has been asked when you don't know the correct terms sometimes! – CJB May 26 '16 at 13:30
  • @Bazz no problem ;) I missed the link to the SO post in Arun's comment. My bad. Closing as a dupe is not a problem at all, it just avoids replicating answers in multiple places. (Retracted my vote to allow a better dupe target) – Tensibai May 26 '16 at 13:37
