I just stumbled upon some weird behavior in data.table. In short, using ":=" to change (replace) the value of a column in a data.table seems to also change the values in another data.table (which is a copy of the original data.table before the := operation). Sample code is below.
Am I missing something fundamental about the otherwise excellent package, or should there be a bug report?
Sub-question: Is ifelse() the best way to change the values as done below (in a fairly large table, ~10m rows)? It does the job as expected and is quick enough (a few seconds), but with verbose=TRUE data.table complains ("RHS for item 1 has been duplicated. Either NAMED vector or recycled list RHS.") and I have not been able to decipher the message so far :)
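For comparison with the ifelse() call, here is a minimal sketch of a lookup-vector recode that builds the same character column directly. It assumes `f` only ever holds the integer codes 1 and 2. `DT` is a hypothetical stand-in table. The sketch has not been benchmarked against ifelse() on ~10m rows, and it is not known here whether it avoids the verbose message.

```r
library(data.table)

# Hypothetical stand-in for the real ~10m-row table
DT <- data.table(f = as.integer(c(1, 2, 1, 1, 1, 2, 1)))

# Lookup-vector recode: each code in f (1 or 2) indexes into c("a", "b"),
# which yields a character vector directly, so neither ifelse() nor
# as.character() is needed. Assumes f contains only the codes 1 and 2.
DT[, f := c("a", "b")[f]]
```

Since `c("a", "b")[f]` is plain vector indexing, it scales linearly with the number of rows and does not evaluate both branches the way ifelse() does.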
```r
library(data.table)
options(datatable.verbose=TRUE)
DT1 <- data.table(f=as.integer(c(1,2,1,1,1,2,1)))
DT2 <- DT1
tables()
DT1
DT2
identical(DT1, DT2) # OK, they should be identical.
# I am not sure ifelse() is the best way to do this, but it does what I want, even though data.table complains
DT1[, f := as.character(ifelse(f==1,"a","b"))]
tables()
DT1
DT2
identical(DT1, DT2) # Not OK -- why did DT2 change?
```
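One way to narrow this down is to check whether `DT1` and `DT2` are actually two objects or two names for one object. A minimal sketch, assuming data.table's exported `address()` and `copy()` functions; `DT3` and `same_before` are hypothetical names added for illustration:

```r
library(data.table)

DT1 <- data.table(f = as.integer(c(1, 2, 1, 1, 1, 2, 1)))
DT2 <- DT1           # plain assignment: binds a second name, no copy is made
DT3 <- copy(DT1)     # explicit deep copy, an independent table

# TRUE if DT1 and DT2 point at the same object in memory
same_before <- identical(address(DT1), address(DT2))

# := updates the column in place (by reference) in the object DT1 points at
DT1[, f := c("a", "b")[f]]

# Inspect the column types afterwards: DT2 shares DT1's object, DT3 does not
class(DT2$f)
class(DT3$f)
```

If the addresses match, any `:=` on one name is visible through the other, whereas the table made with `copy()` keeps its original integer column.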
If relevant, my system is:
R version 2.15.3 (2013-03-01) -- "Security Blanket"
Platform: x86_64-w64-mingw32/x64 (64-bit)
data.table 1.8.8
All 943 tests in test.data.table() completed ok in 27.869sec
Thanks.