
Maybe someone can tell me why the names I assigned to "idVars" change after I add a column to my data.table (without reassigning them)? How can I make the assignment persist so that it stores only the first two column names?

Thanks!

library(data.table)

DT <- data.table(a=1:10, b=1:10)
idVars <- names(DT)
print(idVars)
# [1] "a" "b"

DT[, "c" := 1:10]
print(idVars)
# [1] "a" "b" "c"


# devtools::session_info()                
# data.table * 1.11.6  2018-09-19 CRAN (R 3.5.1)
ismirsehregal

1 Answer


names(DT) and 'idVars' point to the same memory location, so adding a column by reference also changes 'idVars':

tracemem(names(DT))
#[1] "<0x7f9d74c99600>"
tracemem(idVars)
#[1] "<0x7f9d74c99600>"

So, instead, create a copy of the names:

idVars <- copy(names(DT))
tracemem(idVars)
#[1] "<0x7f9d7d2b97c8>"

and it will not change after the assignment:

DT[, "c" := 1:10]
idVars
#[1] "a" "b"

According to ?copy:

A copy() may be required when doing dt_names = names(DT). Due to R's copy-on-modify, dt_names still points to the same location in memory as names(DT). Therefore modifying DT by reference now, say by adding a new column, dt_names will also get updated. To avoid this, one has to explicitly copy: dt_names <- copy(names(DT)).
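
To put the pieces together, here is a minimal sketch contrasting a plain assignment with an explicit copy. The variable names `shared` and `own` are made up for illustration. Whether `shared` picks up the new column may depend on your data.table and R versions, so the sketch only makes a claim about the copied vector.

```r
library(data.table)

DT <- data.table(a = 1:10, b = 1:10)

# Plain assignment: R does not copy here, so this may still refer to
# the same character vector that DT uses for its column names.
shared <- names(DT)

# Explicit copy: a separate character vector that DT does not know about.
own <- copy(names(DT))

# Add a column by reference; DT's names vector is updated in place.
DT[, c := 1:10]

print(names(DT))  # now includes the new column "c"
print(shared)     # may or may not include "c" (see the note above)
print(own)        # still only the original two names
```

If you only need the names as they were at one point in time, `copy(names(DT))` keeps them independent of later `:=` calls.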

Henrik
akrun
  • Great thanks! Was this behaviour changed lately? I'm wondering why I haven't stumbled over this earlier. – ismirsehregal Oct 18 '18 at 16:56
  • Big upvote for the explanation. Didn't know that R uses a pointer in that case. – Roman Oct 18 '18 at 16:57
  • @ismirsehregal Not sure whether anything changed in this case, but usually when I use `:=`, I create a copy of the initial object if I want to keep it separate – akrun Oct 18 '18 at 16:59