0

I have two data.tables like this

dt1 <- data.table(ID = c(1001:1010,1003,1003,1004),
                  CLASS_CODE=c(10,11,rep(NA,2),14,NA,16,NA,18,19,rep(NA,3)))

dt2 <- data.table(ID = c(1003,1004,1006,1008),
                  CLASS_CODE=c(101:104))

Note that there are some repeated ID's in dt1.

In dt2 there are the missing ID's with some alternative CLASS_CODE's.

I would like to complete dt1 using the values in dt2. I've tried to use:

dt1[is.na(CLASS_CODE),CLASS_CODE := dt2[ID %in% dt1[is.na(CLASS_CODE),ID],CLASS_CODE] ]

but the %in% don't repeat the last three IDs

Anyone would now a way to use := in order to get it right?

I figured out an an alternative way using merge however I've still think that using := would be faster/shorter.

Using merge:

dt1 <- merge(dt1,
             dt2,
             by = "ID", all.x = TRUE,sort = FALSE)

dt1[is.na(CLASS_CODE.x), CLASS_CODE.x := CLASS_CODE.y]
dt1$CLASS_CODE.y <- NULL
colnames(dt1)[2] <- "CLASS_CODE"
Henrik
  • 65,555
  • 14
  • 143
  • 159
  • There seems to be a data.table answer here: https://stackoverflow.com/a/54568079/2372064 – MrFlick Jul 27 '20 at 22:46
  • 1
    [Replace missing values (NA) in one data set with values from another where columns match](https://stackoverflow.com/a/32639171/1851712) – Henrik Jul 27 '20 at 22:49
  • You are welcome @LucasRagni. The code suggested by Arun [in the comment](https://stackoverflow.com/questions/32638845/replace-missing-values-na-in-one-data-set-with-values-from-another-where-colum/32639171#comment53298586_32639171) is _extremely_ useful for update-joins. That syntax is further described here: [Which data.table syntax for left join (one column) to prefer](https://stackoverflow.com/questions/54312225/which-data-table-syntax-for-left-join-one-column-to-prefer/54313203#54313203). Cheers. – Henrik Jul 27 '20 at 23:08

0 Answers0