
Referring to this post on short-hand joins of data.tables in R:

```r
library(data.table)

DT1 = as.data.table(data.frame(col1 = c(1,2,3,2,5,1,3,3,1,2), col2 = c(3,4,5,4,3,4,5,3,4,5), col3 = c(1,2,3,4,5,6,7,8,9,10)))
DT2 = as.data.table(data.frame(col1 = c(1,2,1,2,3,4,3,4,3), col2 = c(3,4,5,3,6,4,5,3,4), col3=c(11,12,13,14,15,16,17,19,20)))
setkey(DT1, col1, col2)
setkey(DT2, col1, col2)

# Join DT1 to DT2 on the key (col1, col2) and add col4 to DT1 by reference
DT1[DT2, col4 := DT2$col3]
DT1
DT2
```

```
Warning message:
In `[.data.table`(DT1, DT2, `:=`(col4, DT2$col3)) :
  Supplied 9 items to be assigned to 11 items of column 'col4' (recycled leaving remainder of 2 items).
```
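The two numbers in the warning can be checked directly. The sketch below is a hedged illustration that rebuilds the same data with `data.table()` instead of `as.data.table(data.frame(...))`: the plain join `DT1[DT2]` returns 11 rows, because with the default `nomatch = NA` every key of DT2 that has no partner in DT1 still produces one row of NAs, while `DT2$col3` is just the whole 9-element column. The assumption here is that `:=` sized the assignment against those 11 join rows, which is what the warning text suggests; `joined` is an illustrative name.

```r
library(data.table)

DT1 <- data.table(col1 = c(1,2,3,2,5,1,3,3,1,2),
                  col2 = c(3,4,5,4,3,4,5,3,4,5),
                  col3 = c(1,2,3,4,5,6,7,8,9,10))
DT2 <- data.table(col1 = c(1,2,1,2,3,4,3,4,3),
                  col2 = c(3,4,5,3,6,4,5,3,4),
                  col3 = c(11,12,13,14,15,16,17,19,20))
setkey(DT1, col1, col2)
setkey(DT2, col1, col2)

# One row per (DT2 row, matching DT1 row); an unmatched DT2 key gives one NA row
joined <- DT1[DT2]
nrow(joined)        # 11 join rows
length(DT2$col3)    # 9 values supplied on the right-hand side

# DT2 keys with no partner in DT1 (DT1's col3 is NA in those join rows)
joined[is.na(col3), .(col1, col2)]
```

Five join rows are real matches (one for key (1, 3), two each for (2, 4) and (3, 5)) and six come from unmatched DT2 keys, giving 11.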

The result is:

```
DT1
    col1 col2 col3 col4
 1:    1    3    1   11
 2:    1    4    6   NA
 3:    1    4    9   NA
 4:    2    4    2   12
 5:    2    4    4   20
 6:    2    5   10   NA
 7:    3    3    8   NA
 8:    3    5    3   15
 9:    3    5    7   19
10:    5    3    5   NA

DT2
   col1 col2 col3
1:    1    3   11
2:    1    5   13
3:    2    3   14
4:    2    4   12
5:    3    4   20
6:    3    5   17
7:    3    6   15
8:    4    3   19
9:    4    4   16
```

What is happening? Why do we get this warning, and why is the result wrong, for example in rows 4 and 5 of DT1?
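One hedged way to see why rows 4 and 5 come out as 12 and 20 is to mimic the assignment by hand. This is a simulation under an assumption, not data.table's internal code: it assumes `:=` paired the values of `DT2$col3` with the join rows purely by position, after recycling 9 values to 11 rows as the warning says. `which = TRUE` is an argument of `[.data.table` that returns the matched row numbers of DT1 (NA for join rows without a match); `idx`, `rhs`, `hit` and the standalone `col4` vector are illustrative names.

```r
library(data.table)

DT1 <- data.table(col1 = c(1,2,3,2,5,1,3,3,1,2),
                  col2 = c(3,4,5,4,3,4,5,3,4,5),
                  col3 = c(1,2,3,4,5,6,7,8,9,10))
DT2 <- data.table(col1 = c(1,2,1,2,3,4,3,4,3),
                  col2 = c(3,4,5,3,6,4,5,3,4),
                  col3 = c(11,12,13,14,15,16,17,19,20))
setkey(DT1, col1, col2)
setkey(DT2, col1, col2)

# Row numbers of DT1 hit by each of the 11 join rows; NA where a DT2 key has no match
idx <- DT1[DT2, which = TRUE]

# DT2$col3 knows nothing about the join: it is the whole column in key order,
# recycled to the number of join rows (assumption based on the warning text)
rhs <- rep_len(DT2$col3, length(idx))

# Positional pairing: the k-th value goes to the DT1 row of the k-th join row
col4 <- rep(NA_real_, nrow(DT1))
hit <- !is.na(idx)
col4[idx[hit]] <- rhs[hit]
col4
```

The resulting vector matches the `col4` column in the question. For instance, the value 20 is the fifth element of `DT2$col3` and belongs to DT2's key (3, 4), yet it lands on DT1 row 5 only because that row is the fifth join row.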

  • In the post you link to, please note the use of the prefix `i.` (_not_ `df2$`). See also `?data.table`: "When `i` is a `data.table`, the columns of `i` can be referred to in `j` by using the prefix `i.`" – Henrik Jul 10 '18 at 12:32
  • Thanks @Henrik. What is the logical difference between the 4 forms `DT1[DT2]`, `DT1[DT2, i.col3]`, `DT1[DT2, col4 := i.col3]` and `DT1[DT2, col4 := DT2$col3]`? The result is different in each case. – Kenny Jul 10 '18 at 12:43
  • Sorry, this is a little too much to answer in a comment. Please refer to the nice vignettes. I can really recommend working through the examples in `?data.table` and `?:=`, using truly minimal toy data. It's definitely worth it. Good luck! – Henrik Jul 10 '18 at 13:08
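For the four forms asked about in the comments, a minimal sketch on the same data may help; it is a hedged illustration rather than an authoritative answer. Form 3 uses the `i.` prefix recommended above, so each DT2 value is aligned with its own matching DT1 rows. Form 4, the call from the question, is only described in a comment because its right-hand side ignores the join. The name `v` is illustrative.

```r
library(data.table)

DT1 <- data.table(col1 = c(1,2,3,2,5,1,3,3,1,2),
                  col2 = c(3,4,5,4,3,4,5,3,4,5),
                  col3 = c(1,2,3,4,5,6,7,8,9,10))
DT2 <- data.table(col1 = c(1,2,1,2,3,4,3,4,3),
                  col2 = c(3,4,5,3,6,4,5,3,4),
                  col3 = c(11,12,13,14,15,16,17,19,20))
setkey(DT1, col1, col2)
setkey(DT2, col1, col2)

# 1. Plain join: DT1's columns plus DT2's clashing col3, renamed i.col3 (11 rows)
DT1[DT2]

# 2. Same join, but j returns only DT2's col3, one value per join row
v <- DT1[DT2, i.col3]
v

# 3. Join-update by reference: each matched DT1 row gets the col3 of its own
#    DT2 match; DT1 rows without a match keep NA, unmatched DT2 keys are ignored
DT1[DT2, col4 := i.col3]
DT1

# 4. DT1[DT2, col4 := DT2$col3] supplies the whole DT2 column in key order,
#    unrelated to which join row each value belongs to (hence the warning)
```

After form 3, rows 4 and 5 of DT1 (key (2, 4)) both receive 12, and rows 8 and 9 (key (3, 5)) both receive 17, instead of the misaligned 12/20 and 15/19 from the question.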
