0

Suppose I have a data.table in R:

> A=data.table(Col1=c(1,4,2,5,6,2,3,5,3,7))
> A
    Col1
 1:    1
 2:    4
 3:    2
 4:    5
 5:    6
 6:    2
 7:    3
 8:    5
 9:    3
10:    7

And a key-value data.table where

> B=data.table(Col1=c(1,2,3,4,5,6,7),Col2=c("A","B","C","D","E","F","G"))
> B
   Col1 Col2
1:    1    A
2:    2    B
3:    3    C
4:    4    D
5:    5    E
6:    6    F
7:    7    G

I would like to have Col1 of data.table A reference B and create a new column in A that corresponds to the key-value pairs:

    Col1 Col2
 1:    1    A
 2:    4    D
 3:    2    B
 4:    5    E
 5:    6    F
 6:    2    B
 7:    3    C
 8:    5    E
 9:    3    C
10:    7    G

How can I do this in data.table? Thanks

user321627
  • 2,350
  • 4
  • 20
  • 43
  • Look at `?merge` – Shree Feb 09 '23 at 03:38
  • 2
    You want to do an update join. Study the package vignettes. – Roland Feb 09 '23 at 06:37
  • I really feel like this is a dupe of Shree's link, but none of those answers really demonstrate the merge-assignment of Wimpel's answer (e.g., `A[B, Col2 := i.Col2, on = .(Col1)]`). Here's (perhaps) a more canonical link: https://stackoverflow.com/q/34598139/3358272 – r2evans Feb 09 '23 at 13:03

2 Answers2

3

What you are looking for is a join by reference/update join.

This looks for the value of A$Col1 in B$Col1, and returns the first match of B$Col2 (so if there are >1 matches, the value returned depends on how B is ordered). In the code, this is referred as i.Col2, since B is in the i-part of the data.table syntax. It is usually the fastest way to join, but remember that it only returns the first match. SO if there are multiple values of B$Col2 fot the same B$Col1 value, you will only get one (the topmost) value returned.

A[B, Col2 := i.Col2, on = .(Col1)]

    Col1 Col2
 1:    1    A
 2:    4    D
 3:    2    B
 4:    5    E
 5:    6    F
 6:    2    B
 7:    3    C
 8:    5    E
 9:    3    C
10:    7    G
Wimpel
  • 26,031
  • 1
  • 20
  • 37
-1

Using data.tables on argument

library(data.table)

A[B, , on = "Col1"]
    Col1 Col2
 1:    1    A
 2:    2    B
 3:    2    B
 4:    3    C
 5:    3    C
 6:    4    D
 7:    5    E
 8:    5    E
 9:    6    F
10:    7    G
Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29