0

I have two datatable that I would like to Join by the two key columns:

df1=data.table(key1=c(1.2,1.1,1.9,4.6,2.3,3.6,0.9,2.8),key2=c("A","B","A","B","A","B","A","B"))
df2=data.table(key1=c(1,1,2,3),key2=c("A","B","B","B"),value=c("A","B","D","C"))

and here is the desired output:

   key1 key2 value
1:  0.9    A     A
2:  1.1    A     A
3:  1.2    B     B
4:  1.9    A    NA
5:  2.3    A    NA
6:  2.8    B     C
7:  3.6    B    NA
8:  4.6    B    NA

Matching numeric keys with nearest value could be done with the following:

# Join with closest value of Numeric Key, works fine

df1=data.table(key1=c(1.2,1.1,1.9,4.6,2.3,3.6,0.9,2,8),key2=c("B","B","A","A"))
    df2=data.table(key1=c(1,1,2,3),key2=c("A","B","B","B"),value=c("A","B","D","C"))
    setkey(df1,key1)
    setkey(df2,key1)
    df2[df1,roll="nearest"]

Also if numeric keys would have excact matches the problem could be solved again:

df1=data.table(key1=c(1,1,2,5,2,4,1,3),key2=c("A","B","A","B","A","B","A","B"))
df2=data.table(key1=c(1,1,2,3),key2=c("A","B","B","B"),value=c("A","B","D","C"))
setkey(df1,key1,key2)
setkey(df2,key1,key2)
df2[df1]

Solution does not have to be with data.table.

Thanks

patL
  • 2,259
  • 1
  • 17
  • 38
Antonios
  • 1,919
  • 1
  • 11
  • 18
  • `roll=` works on the final column in the key or an `on=` expression, so ... `df2[df1, on=.(key2, key1), roll="nearest"]` ? You might want to see Arun's answer about when to set keys here https://stackoverflow.com/a/20057411/ – Frank Feb 28 '18 at 16:32
  • Many thanks Frank. But the output I get with your code does not have NA `value` elements – Antonios Feb 28 '18 at 16:39
  • 1
    Ah, didn't notice you had desired output. I don't understand why `("A", 1.9)` in df1 should not be nearest `.("A", 1)` in df2, since 1 is the nearest value to 1.9 subject to key2 == "A", right? You have a threshold for nearness? As far as I know, such thresholds can only be applied one-sided, eg with `roll=0.5`. If you simply want to round, better to do that first, I guess: `df1[, v := round(key1)]; df1[, value := df2[df1, on=.(key2, key1 = v), x.value]]` – Frank Feb 28 '18 at 16:53
  • 1
    Thank you again I actually think your answer solves my real - life problem, maybe I should have prepared a bit clearer example – Antonios Feb 28 '18 at 17:07

0 Answers0