In data.table versions <= 1.9.2, a join of the form x[i, j=...] (that is, a join where j is also used) was designed to be an implicit by (or by-without-by) operation. In other words, it calculates j for each value in i. So it won't work as you intend.
This design choice has been changed in the current development version 1.9.3 (which'll at some point be pushed to CRAN versioned 1.9.4), for consistency, after feedback from a lot of users. You can check the discussions here, here and the feature request (FR) here.
So in 1.9.3, this will work as intended (as @BenBarnes points out). That is, by default, x[i, j=...] will first perform the join and then evaluate j once afterwards, instead of obtaining j for each i. If instead you'd like the old behaviour, you'll have to explicitly state by as follows:
## v 1.9.3
## performs the join and then calculates/evaluates j
x[i, j]
## explicitly state by to obtain j for each i
x[i, j, by=.EACHI]
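To make the difference concrete, here is a minimal sketch on toy data. The tables x and i and the column names id and val are made up for illustration (they are not from the question), and it assumes a data.table version where by=.EACHI is available (1.9.3 or later).

```r
library(data.table)

# Hypothetical toy tables, not taken from the original question
x <- data.table(id = c("a", "a", "b", "c"), val = 1:4, key = "id")
i <- data.table(id = c("a", "b"))

# Join first, then evaluate j once over all matched rows (rows 1 to 3 of x)
once <- x[i, sum(val)]

# Evaluate j separately for each row of i (the old by-without-by behaviour)
each <- x[i, sum(val), by = .EACHI]
```

The first call returns a single sum over every matched row, while the second returns one row per row of i, each with its own sum.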
When this version hits CRAN, there should also be a provision to use the old version (so that existing code doesn't break), with a warning that this feature will be deprecated in the next release (or something like that - how this'll be done is not finalised yet).
To summarise, your code will work as intended from versions >= 1.9.3.
Note that the .EACHI feature is not yet documented in ?data.table, as this is still a development version. When it's released to CRAN, you can find the documentation for .EACHI in ?data.table, where all the other special variables like .I, .N, .GRP, .BY etc. are also documented.
HTH
Edit: If you have to do this efficiently using <= 1.9.2, then you can do it by first finding the matching indices as follows:
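## row numbers of DT that match the rows of DT.alt (join on DT's key)
idx = DT[DT.alt, which=TRUE]
## draw all the random values in one call and assign them by reference
DT[idx, V := rnorm(length(idx))]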
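As a self-contained sketch of the same idea, the snippet below uses small stand-in tables. The contents of DT and DT.alt and the key column k are assumptions for illustration only. It also adds nomatch = 0L, which is not in the snippet above, so that rows of DT.alt with no match in DT do not leave NA values in idx.

```r
library(data.table)

# Hypothetical stand-ins for the question's DT and DT.alt
DT <- data.table(k = c("a", "b", "c", "d"), key = "k")
DT.alt <- data.table(k = c("b", "d"))

# nomatch = 0L drops unmatched rows of DT.alt, so idx contains no NA
idx <- DT[DT.alt, which = TRUE, nomatch = 0L]

# A single vectorised rnorm() call; rows of DT not in idx keep NA in V
DT[idx, V := rnorm(length(idx))]
```

Because rnorm() is called once with the number of matched rows, every matched row gets its own draw, which is what the per-group evaluation in <= 1.9.2 prevented.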