3

This question is a follow-up from a previous one:

left outer join with data.table with different names for key variables

I have now tried to use the "on" argument in the manner suggested by Matt Dowle. I am puzzled that it does not work. I wonder if this currently works only with version 1.9.7 of data.table.

packageVersion('data.table')
# [1] ‘1.9.6’

DT1 = data.table(x1 = c("b", "c", "a", "b", "a", "b"), 
                 x2a = as.character(1:6), m1 = seq(10, 60, by = 10))
DT1
#    x1 x2a m1
# 1:  b   1 10
# 2:  c   2 20
# 3:  a   3 30
# 4:  b   4 40
# 5:  a   5 50
# 6:  b   6 60

DT2 = data.table(x1 = c("b", "d", "c", "b" ,"a", "a"),
                 x2b = c(1, 4, 7, 6, " ", " ") ,m2 = 5:10)
DT2
#    x1 x2b m2
# 1:  b   1  5
# 2:  d   4  6
# 3:  c   7  7
# 4:  b   6  8
# 5:  a      9
# 6:  a     10

#### merge command works fine
rtL <- merge(DT1, DT2, by.x = c('x1', 'x2a'),
             by.y = c('x1', 'x2b'), all.x = TRUE)
rtL
#    x1 x2a m1 m2
# 1:  a   3 30 NA
# 2:  a   5 50 NA
# 3:  b   1 10  5
# 4:  b   4 40 NA
# 5:  b   6 60  8
# 6:  c   2 20 NA   

#### Join with the X[Y] syntax with the 'on' argument
rtL2 <- DT2[DT1, on = c('x1', x2a = 'x2b')]

Error in forderv(x, by = rightcols) : by value -2147483648 out of range [1,3]

What is wrong here? Does this require version 1.9.7?

##### Another attempt with the x1 variable in quotes
rtL3 <- DT2[DT1, on = c("x1", "x2a" = "x2b")] 

Error in forderv(x, by = rightcols) : by value -2147483648 out of range [1,3]

It seems to me that the rtL2 version is more correct.

How do I interpret the error message? And what is it that I am doing wrong here?

Community
  • 1
  • 1
Vivek
  • 107
  • 1
  • 7

1 Answers1

4

In fact, this is an issue with 1.9.6 that has since been fixed in the development version.

The problem (in addition to getting the order switched) is that you don't name the the counterpart to x1 in DT2, see GitHub:

  1. When joining with on=, X[Y, on=c(A="A", b="c")] can be now specified as X[Y, on=c("A", b="c")], fully closes #1375.

As for 1.9.6, the following works:

packageVersion('data.table')
# [1] ‘1.9.6’
DT1[DT2, on = c(x1 = "x1", x2a = "x2b")]
#    x1 x2a m1 m2
# 1:  b   1 10  5
# 2:  d   4 NA  6
# 3:  c   7 NA  7
# 4:  b   6 60  8
# 5:  a     NA  9
# 6:  a     NA 10
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198