In R, turn off automatic update by data.table on identical data.frame?

Question

(In continuation of Conditional Replacing with NA in R (two dataframes))

So basically, I have

idx <- c(1397, 2000, 3409, 3415, 4077, 4445, 5021, 5155) 

idy <- c( 1397, 2000, 2860, 3029, 3415, 3707, 4077, 4445, 5021, 5155, 
         5251, 5560)

agex <- c(NA, NA, NA, 35, NA, 62, 35, 46)

agey <- c( 3, 45,  0, 89,  7,  2, 13, 24, 58,  8,  3, 45)

I put each pair into a data.frame and make a copy of each:

  dat1 <- as.data.frame(cbind(idx, agex))
  dat1copy <- dat1
  dat2 <- as.data.frame(cbind(idy, agey))
  dat2copy <- dat2

I want to check, for every case where idy equals idx, whether agex is NA; if it is, agey should be set to NA too. This should happen ONLY in dat2, not in dat2copy, which should remain untouched by the NA transfer.

However, after running

    library(data.table)
    setDT(dat1)
    setDT(dat2)
    dat2[dat1[is.na(agex)], on=.(idy = idx), agey := NA]

dat2copy is updated too and also has NAs at the same places as the updated dat2. What can I do to prevent this kind of double updating, or how can I store a copy of the original dat2?

G. Grothendieck · Accepted Answer · 2018-10-26T12:40:50.250

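Plain assignment (dat2copy <- dat2) only binds a second name to the same object, and setDT and := both modify by reference, so the update shows up under both names. To ensure that dat2copy is distinct from dat2 after conversion to data.table, use the data.table copy function: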

library(data.table)

dat1 <- data.frame(idx, agex)
dat2 <- data.frame(idy, agey)

# wrong - same addresses
dat2copy <- dat2
address(dat2) == address(dat2copy)
## [1] TRUE

# correct - different addresses but equal contents
dat2copy <- copy(dat2)
address(dat2) == address(dat2copy)
## [1] FALSE
identical(dat2, dat2copy)
## [1] TRUE

setDT(dat1)
setDT(dat2)
dat2[dat1[is.na(agex)], on=.(idy = idx), agey := NA]

identical(dat2, dat2copy)
## [1] FALSE
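
Note that at this point dat2 is a data.table while dat2copy is still a data.frame, so identical() would report FALSE on class alone. A minimal sketch of a more direct check, assuming both tables are built as data.tables from the start (my variation, not the code above) and using a hypothetical helper name `changed`, compares the agey columns themselves:

```r
library(data.table)

idx  <- c(1397, 2000, 3409, 3415, 4077, 4445, 5021, 5155)
idy  <- c(1397, 2000, 2860, 3029, 3415, 3707, 4077, 4445, 5021, 5155,
          5251, 5560)
agex <- c(NA, NA, NA, 35, NA, 62, 35, 46)
agey <- c(3, 45, 0, 89, 7, 2, 13, 24, 58, 8, 3, 45)

# Assumption: build data.tables directly instead of calling setDT later.
dat1 <- data.table(idx, agex)
dat2 <- data.table(idy, agey)

# Deep copy taken before any := update, so it keeps the original ages.
dat2copy <- copy(dat2)

# Update join: set agey to NA wherever the matching idx has a missing agex.
dat2[dat1[is.na(agex)], on = .(idy = idx), agey := NA]

# 'changed' (hypothetical name): rows blanked in dat2, shown with the
# original values that survive in the copy.
changed <- dat2copy[is.na(dat2$agey)]
print(changed)
```

With the data from the question, the rows listed should be idy 1397, 2000 and 4077, with their original ages 3, 45 and 13 still intact in dat2copy.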

edited Oct 26 '18 at 12:40

answered Oct 26 '18 at 12:19

G. Grothendieck

Thank you soo much, it works perfectly – Parinn Oct 26 '18 at 12:25
