(In continuation of Conditional Replacing with NA in R (two dataframes))

So basically, I have

idx <- c(1397, 2000, 3409, 3415, 4077, 4445, 5021, 5155) 

idy <- c( 1397, 2000, 2860, 3029, 3415, 3707, 4077, 4445, 5021, 5155, 
         5251, 5560)

agex <- c(NA, NA, NA, 35, NA, 62, 35, 46)

agey <- c( 3, 45,  0, 89,  7,  2, 13, 24, 58,  8,  3, 45)

I put each id/age pair in a data.frame and make a copy of each data frame:

  dat1 <- as.data.frame(cbind(idx, agex))
  dat1copy <- dat1
  dat2 <- as.data.frame(cbind(idy, agey))
  dat2copy <- dat2

For every row where idy equals idx and agex is NA, I want agey to be set to NA as well. This should happen ONLY in dat2, not in dat2copy, which should remain untouched by the NA transfer.

However, after running

    library(data.table)
    setDT(dat1)
    setDT(dat2)
    dat2[dat1[is.na(agex)], on=.(idy = idx), agey := NA]

dat2copy is updated too and also has NAs at the same places as the updated dat2. What can I do to prevent this kind of double updating, or how can I store a copy of the original dat2?

Parinn

1 Answer

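Plain assignment (dat2copy <- dat2) does not copy the data: both names refer to the same object, and setDT and := then modify that object by reference. To ensure that dat2copy stays distinct from dat2 after conversion to data.table, create it with the data.table copy function before converting: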

library(data.table)

dat1 <- data.frame(idx, agex)
dat2 <- data.frame(idy, agey)

# wrong - same addresses
dat2copy <- dat2
address(dat2) == address(dat2copy)
## [1] TRUE

# correct - different addresses but equal contents
dat2copy <- copy(dat2)
address(dat2) == address(dat2copy)
## [1] FALSE
identical(dat2, dat2copy)
## [1] TRUE

setDT(dat1)
setDT(dat2)
dat2[dat1[is.na(agex)], on=.(idy = idx), agey := NA]

identical(dat2, dat2copy)
## [1] FALSE
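
As a quick sanity check, the following self-contained sketch repeats the steps above using the vectors from the question and then inspects both objects. The names changed_ids and copy_untouched are only illustrative helpers, not part of data.table.

```r
library(data.table)

# Vectors from the question
idx  <- c(1397, 2000, 3409, 3415, 4077, 4445, 5021, 5155)
idy  <- c(1397, 2000, 2860, 3029, 3415, 3707, 4077, 4445, 5021, 5155,
          5251, 5560)
agex <- c(NA, NA, NA, 35, NA, 62, 35, 46)
agey <- c(3, 45, 0, 89, 7, 2, 13, 24, 58, 8, 3, 45)

dat1 <- data.frame(idx, agex)
dat2 <- data.frame(idy, agey)

# Take the deep copy BEFORE any by-reference operation touches dat2
dat2copy <- copy(dat2)

setDT(dat1)
setDT(dat2)
dat2[dat1[is.na(agex)], on = .(idy = idx), agey := NA]

# ids in dat2 whose agey was set to NA by the update join
changed_ids <- dat2[is.na(agey), idy]

# The copy still holds the original ages and is still a plain data.frame
copy_untouched <- identical(dat2copy$agey, agey) && !is.data.table(dat2copy)
```

Here changed_ids should contain only the ids that appear in both tables with a missing agex (3409 has a missing agex but does not occur in idy, so it changes nothing), and copy_untouched should be TRUE.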
G. Grothendieck