I am trying to do a cross join (from the original question here), and I have 500 GB of RAM. The problem is that the final data.table has more than 2^31 rows, so I get this error:
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
Join results in more than 2^31 rows (internal vecseq reached physical limit). Very likely misspecified join. Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
Is there a way to override this? When I add by=.EACHI, I get the error:
'by' or 'keyby' is supplied but not j
I know this question is not in an ideal reproducible format (my apologies!), but I am not sure that is strictly necessary for an answer. Maybe I am just missing something, or is data.table simply limited in this way?
The only related post I am aware of is this question from 2013, which seems to suggest that data.table could not do this back then.
This is the code that causes the error:
pfill=q[, k:=t+1][q2[, k:=tprm], on=.(k), nomatch=0L,allow.cartesian=TRUE][,k:=NULL]
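For illustration only, here is a toy-scale sketch of the same join shape. The tables, the columns id and id2, and all values are made up and are not my real data. It also counts, per key, how many rows the join will produce before running it, which is the number that crosses 2^31 in my real case.

```r
library(data.table)

# Hypothetical stand-ins for q and q2 (made-up columns and values)
q  <- data.table(id  = 1:3, t    = c(1L, 1L, 2L))
q2 <- data.table(id2 = 1:4, tprm = c(2L, 2L, 3L, 3L))

# Temporary join key, as in the line above
q[, k := t + 1L]
q2[, k := tprm]

# Rows the join will produce: for each k, (rows in q) * (rows in q2)
cx <- q[, .(nx = .N), by = k]
ci <- q2[, .(ni = .N), by = k]
expected <- cx[ci, on = .(k), nomatch = 0L][, sum(as.numeric(nx) * ni)]
expected > .Machine$integer.max  # TRUE would mean the join hits the row limit

# The join itself, then drop the temporary key
pfill <- q[q2, on = .(k), nomatch = 0L, allow.cartesian = TRUE][, k := NULL]
nrow(pfill) == expected
```

On the toy data the count is tiny, so the join runs. With my real tables the same count exceeds 2^31 and the join fails as shown above.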