2

I have the following two data.tables

library(data.table)

dt1 = data.table(index_column = c(12, 17, 29, 34, 46), column1 = c("dog", "cat", "bird", "elephant", "bird"), column2 = c(482, 391, 567, 182, 121))

dt2 = data.table(index_column = c(17, 29, 46), column1 = c("cat", "penguin", "bird"))

> dt1
   index_column  column1 column2
1:           12      dog     482
2:           17      cat     391
3:           29     bird     567
4:           34 elephant     182
5:           46     bird     121


> dt2
   index_column column1
1:           17     cat
2:           29 penguin
3:           46    bird

Upon merging these two data.table solely on the shared index_column with

merged = merge(dt1, dt2, by="index_column", all=TRUE)

the resulting data.table is:

   index_column column1.x column2 column1.y
1:           12       dog     482        NA
2:           17       cat     391       cat
3:           29      bird     567   penguin
4:           34  elephant     182        NA
5:           46      bird     121      bird

I am interested in replacing values of column1.x with the values of column1.y if they are not NA. Most of the values will be the same, but those that are not (e.g. bird/penguin) should be replaced.

One could do this would an if statement, e.g.

if ((merged$column1.x != merged$column1.y) & !is.na(merged$column1.y)){
   merged$column1.x = merged$column1.y
}

merged$column1.y = NULL

My concern is that this isn't a very data.table solution. It will not scale very well if the data.table has millions of rows.

How would one conditionally replace values of one column based on another column in an R data.table? Is it more efficient to simply replace one with the other, ignoring the NA's?

ShanZhengYang
  • 16,511
  • 49
  • 132
  • 234

1 Answers1

6

We can use the on based approach

dt1[dt2, column1 := i.column1, on = .(index_column)]
dt1
#   index_column  column1 column2
#1:           12      dog     482
#2:           17      cat     391
#3:           29  penguin     567
#4:           34 elephant     182
#5:           46     bird     121
akrun
  • 874,273
  • 37
  • 540
  • 662