I can conduct a matched copy of a column in a data frame into another by something like
DF2$y <- DF1[match(DF2$id2, DF1$id1), "z"] # DF1 and DF2 are data frames
where DF2$id2
is matched to DF1$id1
. I would like to know what I can do with data tables for this kind of operation. My data tables have millions of rows and hundreds of columns. I've done setkey(DT1, id1)
and setkey(DT2, id2)
.
This works:
DT2[, y := DT1[match(DT2$id2, DT1$id1), "z"]] # DT1 and DT2 are data tables
but I am afraid that the match
part might take long than necessary. (Or is it inevitable?)
I understand that I can also use column selection, merge
, and renaming:
tmp <- DT1[, c("id1", "z")] # column selection
DT3 <- merge(DT2, tmp, by.x = "id2", by.y = "id1", all.x = TRUE, suffixes = c("", ".y")) # merge
setnames(DT3, "z.y", "y") # rename
(where the first two lines can be written on one line) but this seems a bit too complicated. Would there be a simpler and fast solution?
Thanks.
Example:
library(data.table)
DF1 <- data.frame(id1=2:4, x=LETTERS[1:3], z=11:13)
DF2 <- data.frame(id2=1:4, x=LETTERS[5:8], z=21:24)
DF1
# id1 x z
# 1 2 A 11
# 2 3 B 12
# 3 4 C 13
DF2
# id2 x z
# 1 1 E 21
# 2 2 F 22
# 3 3 G 23
# 4 4 H 24
DT1 <- data.table(DF1)
DT2 <- data.table(DF2)
setkey(DT1, id1)
setkey(DT2, id2)
DF2$y <- DF1[match(DF2$id2, DF1$id1), "z"]
DF2 # correct
# id2 x z y
# 1 1 E 21 NA
# 2 2 F 22 11
# 3 3 G 23 12
# 4 4 H 24 13
DT2[, y := DT1[match(DT2$id2, DT1$id1), "z"]]
DT2
# id2 x z y
# 1: 1 E 21 NA
# 2: 2 F 22 11
# 3: 3 G 23 12
# 4: 4 H 24 13
DT2[, y := NULL]
tmp <- DT1[, c("id1", "z")]
DT3 <- merge(DT2, tmp, by.x = "id2", by.y = "id1", all.x = TRUE, suffixes = c("", ".y"))
setnames(DT3, "z.y", "y")
DT3
# id2 x z y
# 1: 1 E 21 NA
# 2: 2 F 22 11
# 3: 3 G 23 12
# 4: 4 H 24 13
## Simpler alternatives?