This is my first question on Stack Overflow, so please let me know if I should change the style or format of my questions.
I would like to do a left outer join on two data tables, with the extra condition that the key variables may have different names in the two tables. Example:
DT1 = data.table(x1=c("b","c", "a", "b", "a", "b"), x2a=1:6,m1=seq(10,60,by=10))
setkey(DT1,x1,x2a)
> DT1
   x1 x2a m1
1:  a   3 30
2:  a   5 50
3:  b   1 10
4:  b   4 40
5:  b   6 60
6:  c   2 20
DT2 = data.table(x1=c("b","d", "c", "b","a","a"),x2b=c(1,4,7,6," "," "),m2=5:10)
setkey(DT2,x1,x2b)
> DT2
   x1 x2b m2
1:  a      9
2:  a     10
3:  b   1  5
4:  b   6  8
5:  c   7  7
6:  d   4  6
############# First, I use merge.data.frame to do the left outer join
dfL<-merge.data.frame(DT1,DT2,by.x=c('x1','x2a'),by.y=c('x1','x2b'),all.x=TRUE)
> dfL
  x1 x2a m1 m2
1  a   3 30 NA
2  a   5 50 NA
3  b   1 10  5
4  b   4 40 NA
5  b   6 60  8
6  c   2 20 NA
################# Attempt at the same left outer join with data.table
> dtL<-DT2[DT1,on=c("x1","x2a")]
Error in forderv(x, by = rightcols) :
'by' value -2147483648 out of range [1,3]
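If I read the error correctly, the unnamed "x2a" in on= is looked up in DT2, which has no such column. The ?data.table help page also describes a named form of on=: in X[i, on = ...], a name refers to a column of X and its value to a column of i. A minimal sketch of what I expected to work, assuming that named form, and assuming the two key columns must share a type (so x2a is stored as character here, as in the working version below; data.table will not join a character column to an integer one):

```r
library(data.table)

# Same example data, except that x2a is stored as character so that its
# type matches x2b (assumption: the join key columns must share a type).
DT1 <- data.table(x1 = c("b", "c", "a", "b", "a", "b"),
                  x2a = as.character(1:6),
                  m1 = seq(10, 60, by = 10))
DT2 <- data.table(x1 = c("b", "d", "c", "b", "a", "a"),
                  x2b = c(1, 4, 7, 6, " ", " "),
                  m2 = 5:10)

# In X[i, on = ...] a named element maps a column of X (the name) to a
# column of i (the value); an unnamed element has the same name in both.
# All rows of i (DT1) are kept, i.e. a left outer join from DT1's side.
dtL <- DT2[DT1, on = c("x1", x2b = "x2a")]
dtL
```

If this is right, dtL keeps all six rows of DT1 in their original order, with m2 equal to NA wherever DT2 has no matching (x1, x2b) pair.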
#################### Code that works with data.table
DT1 = data.table(x1=c("b","c", "a", "b", "a", "b"), x2=as.character(1:6),m1=seq(10,60,by=10))
setkey(DT1,x1,x2)
DT1
DT2 = data.table(x1=c("b","d", "c", "b","a","a"),x2=c(1,4,7,6," "," ") ,m2=5:10)
setkey(DT2,x1,x2)
DT2
dtL<-DT2[DT1]
######################## This requires the key variables to have identical names in both data tables
################### It also does not allow an ad hoc choice of the key variables with the "on" argument
Is it possible to keep, with data.table, the flexibility that merge gives me with data frames?
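Since DT1 and DT2 are data.tables, a plain merge() call should dispatch to data.table's own merge method, whose help page (?merge.data.table) lists by.x, by.y and all.x arguments. A sketch, assuming that method treats those arguments as merge.data.frame does, and storing x2a as character so that both key columns share a type:

```r
library(data.table)

# Example data; x2a is character so its type matches x2b
# (assumption: data.table joins require matching key types).
DT1 <- data.table(x1 = c("b", "c", "a", "b", "a", "b"),
                  x2a = as.character(1:6),
                  m1 = seq(10, 60, by = 10))
DT2 <- data.table(x1 = c("b", "d", "c", "b", "a", "a"),
                  x2b = c(1, 4, 7, 6, " ", " "),
                  m2 = 5:10)

# merge() dispatches to the data.table method here; by.x / by.y pair up
# differently named key columns, and all.x = TRUE keeps every row of DT1.
dtM <- merge(DT1, DT2,
             by.x = c("x1", "x2a"), by.y = c("x1", "x2b"),
             all.x = TRUE)
dtM
```

With the default sort = TRUE, I would expect the result to come back ordered by x1 and x2a, like the merge.data.frame output above.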