I'm trying to define a new variable in a data.table
through a merge. The wrinkle is that I'd like to access .N
in order to define the new variable by group, so I'd like to use by
as well, but this is causing an error.
MRE:
dt1<-data.table(pd=rep(1:2,each=3),rnk=rep(1:3,2),
var=c(3:1,1:3),key="pd")
dt2<-data.table(pd=c(1,2),chk=c(2,2),key="pd")
dt1[dt2,new:=var[.N]>i.chk,by=pd]
As you can see, I'd like to define new
to be TRUE
whenever the (within-pd
) highest-rnk
value of var
exceeds 2. The code above seems natural enough to me, but this results in an error: object 'i.chk' not found
(suggesting the merge has not been completed, as the name space of dt2
appears unavailable).
I can get around this with a second step:
> dt1[dt2,new:=var>i.chk][,new:=new[.N],by=pd][]
pd rnk var new
1: 1 1 3 FALSE
2: 1 2 2 FALSE
3: 1 3 1 FALSE
4: 2 1 1 TRUE
5: 2 2 2 TRUE
6: 2 3 3 TRUE
However, this slows down my code substantially since I'm using `:=`
to update around 6 such columns.
Is there no way to update by reference by group when merging?