How do you deal with identically named, non-key columns when joining data.tables? I am looking for a solution to table.field
notation in SQL.
For instance, lets' say I have a table DT that is repopulated with new data for column v
every time period. I also have a table DT_HIST that stores entries from previous time periods (t
). I want to find the difference between the current and previous time period for each x
In this case: DT is time period 3, and DT_HIST has time periods 1 and 2:
DT <- data.table(x=c(1,2,3,4),v=c(20,20,35,30))
setkey(DT,x)
DT_HIST <- data.table(x=rep(seq(1,4,1),2),v=c(40,40,40,40,30,25,45,40),t=c(rep(1,4),rep(2,4)))
setkey(DT_HIST,x)
> DT
x v
1: 1 20
2: 2 20
3: 3 35
4: 4 30
> DT_HIST
x v t
1: 1 40 1
2: 1 30 2
3: 2 40 1
4: 2 25 2
5: 3 40 1
6: 3 45 2
7: 4 40 1
8: 4 40 2
I would like to join DT
with DT_HIST[t==1,]
on x
and calculate the difference in v
.
Just joining the tables results in columns v
and v.1
.
> DT[DT_HIST[t==2],]
x v v.1 t
1: 1 20 30 2
2: 2 20 25 2
3: 3 35 45 2
4: 4 30 40 2
However, I can't find a way to refer to the different v
columns when doing the join.
> DT[DT_HIST[t==2],list(delta=v-v.1)]
Error in `[.data.table`(DT, DT_HIST[t == 2], list(delta = v - v.1)) :
object 'v.1' not found
> DT[DT_HIST[t==2],list(delta=v-v)]
x delta
1: 1 0
2: 2 0
3: 3 0
4: 4 0
If this is a duplicate, I apologize. I searched and couldn't find a similar question.
Also, I realize that I can simply rename the columns after joining and then run my desired expression, but I want to know if I'm doing this in the completely wrong way.