I am trying to do a conditional cross join in data.table, and I am running into this error:
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
Join results in more than 2^31 rows (internal vecseq reached physical limit). Very likely misspecified join. Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
After some sleuthing, I still cannot find a solution. I have more than 1 TB of RAM, so memory is not the issue. Below is a reproducible example; as you scale up N, the code eventually gives this error.
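library(data.table)

N = 10000
J = 50
# CJ() already returns a data.table, so no data.table() wrapper is needed
dat = CJ(t = 1:N, a = 1:N, j = 1:5)
dat2 = CJ(j_prime = 1:J, t_prime = 1:N)
# match rows where dat's t + 1 equals dat2's t_prime, via a temporary key column k
datfinal = dat[, k := (t + 1)][dat2[, k := t_prime], on = .(k), nomatch = 0L, allow.cartesian = TRUE][, k := NULL]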
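
For what it's worth, here is a rough back-of-the-envelope count of how many rows this join would produce, without actually running it. It is only a sketch: it assumes every row of dat with a given k pairs with every row of dat2 with the same k, and the names rows_x, rows_i, matching_keys, and expected_rows are just illustrative, not part of data.table.

```r
N <- 10000
J <- 50

# rows of dat sharing one value of k: one per (a, j) combination
rows_x <- N * 5
# rows of dat2 sharing one value of k: one per j_prime
rows_i <- J
# k = t + 1 covers 2..(N + 1) in dat, k = t_prime covers 1..N in dat2,
# so the keys present on both sides are 2..N
matching_keys <- N - 1

# use doubles to avoid integer overflow in the product
expected_rows <- as.numeric(matching_keys) * rows_x * rows_i

# compare against the 2^31 row limit mentioned in the error message
expected_rows > 2^31
```

If this estimate is right, the result at N = 10000 is well beyond the row limit named in the error, regardless of how much RAM is available.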