When I create a data.table and save its column names in an object, the elements of that object change if I then modify the data.table by reference with `:=`, for example by adding more variables. I thought that once an object is created in R it stays the same unless it is explicitly modified, but it seems that an object created from a data.table is also modified implicitly when the original data.table is modified by reference. Is that correct? See my code below and the workaround I am currently using.

I don't know if this is a bug, but if it is not, I would like to understand this behavior of data.table and find a better solution than the one shown below.

library(data.table)

# create data.table with two variables
DT <- data.table(x = 1, y = 2)

# store the variable names in an object
original_names <- names(DT)

# add one more variable
DT[, z := 3]

# new object with the names of the three variables
new_names <- names(DT)

# these two should NOT be identical, yet they are
identical(original_names, new_names)
#> [1] TRUE



# workaround
DT <- data.table(x = 1, y = 2)
# Create a separate data.frame holding the minimum information
# necessary to save memory and still get the variable names.
# This is the part I think is inefficient.
DF <- as.data.frame(DT[1,])

# store the variable names in an object
original_names <- names(DF)

# add one more variable
DT[, z := 3]

# new object with the names of the three variables
new_names <- names(DT)

# these two are not identical anymore
identical(original_names, new_names)
#> [1] FALSE

Created on 2023-07-02 with reprex v2.0.2

  • This is related to https://stackoverflow.com/q/10225098/3358272. There's another question (can't find it atm) that specifically discusses this; one starting point would be [`?setattr`](https://rdatatable.gitlab.io/data.table/reference/setattr.html). It has to do with the fact that `names(.)` is returning a reference to an existing vector of names stored as an attribute to the table. Since `names(.)` is not doing a deep copy of the contents of that vector, and `data.table` works by-reference, you can see a vector storing the result of `names(.)` _change_ as the `data.table` does. – r2evans Jul 03 '23 at 03:14
  • I would use `copy(names(DT))`. – s_baldur Jul 03 '23 at 08:20
  • Thank you both. r2evans provides the explanation and s_baldur provides the solution to the problem. Highly appreciated, guys. – R.Andres Castaneda Jul 03 '23 at 12:46
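
For readers landing here, the fix suggested in the comments can be sketched as below. This is only a minimal illustration that reuses the `DT` from the question; `saved_names` is a name made up for this example. It assumes, as the comments explain, that `names(DT)` may return the same vector that is stored on the table, so an explicit `copy()` is taken before the table is modified by reference.

```r
library(data.table)

DT <- data.table(x = 1, y = 2)

# Take an explicit copy of the names vector, so later by-reference
# changes to DT cannot reach it (saved_names is a hypothetical name)
saved_names <- copy(names(DT))

# add a column by reference
DT[, z := 3]

saved_names   # the copy still holds only "x" and "y"
names(DT)     # the table itself now has "x", "y" and "z"

identical(saved_names, names(DT))
#> [1] FALSE
```

Compared with the `as.data.frame(DT[1,])` workaround in the question, this copies only the short character vector of names and never touches the table's rows.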
