How to calculate co-occurrence with data.table

Question

I have a very large transaction dataset that looks like this:

How can I calculate the co-occurrence matrix with data.table or Python? (The ordinary methods don't work because the data is so large.) The expected output is:

It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). — Ronak Shah, Mar 05 '21 at 05:31

score 0 · Answer 1 · answered Mar 05 '21 at 06:06


This isn't data.table, but it should work for you:

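```r
# df: two columns, transaction id and item (one row per purchase)
df <- crossprod(table(df))  # item-by-item counts of shared transactions
diag(df) <- 0               # drop each item's count with itself
df <- as.data.frame(df)
```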

Gives us:

```
   i1 i2 i3
i1  0  1  1
i2  1  0  0
i3  1  0  0
```
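Written out, this is what the answer computes. Let X be the transaction-by-item array that `table(df)` builds, where X_{ta} is the number of rows pairing transaction t with item a (this assumes `df` has exactly the two columns transaction and item). `crossprod(X)` is XᵀX:

```latex
C = X^\top X, \qquad C_{ab} = \sum_{t} X_{ta}\, X_{tb}
```

When every X_{ta} is 0 or 1 (no item repeated within a transaction), C_{ab} for a ≠ b is the number of transactions containing both a and b, and C_{aa} is the number of transactions containing a; `diag(df) <- 0` discards the latter. If items can repeat inside a transaction, the products inflate the counts, so deduplicating the (transaction, item) pairs first may be needed.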


Matt
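Since the question asks about data.table specifically, the same pair counts can be sketched as a self-join on the transaction id. This is a minimal sketch, not from the thread: the column names `tid` and `item` and the toy data are hypothetical, chosen so the result matches the matrix shown above. It returns a long table of pairs instead of a square matrix.

```r
library(data.table)

# Hypothetical toy input: one row per (transaction, item) pair.
# Column names `tid` and `item` are assumptions; rename to match your data.
dt <- data.table(tid  = c(1, 1, 2, 2),
                 item = c("i1", "i2", "i1", "i3"))

# Keep each item once per transaction so a pair is counted once per basket.
dt <- unique(dt)

# Self-join on tid: every pair of items sharing a transaction.
# The joined copy's column is named i.item; keeping item < i.item
# lists each unordered pair once and skips the diagonal.
pairs <- dt[dt, on = "tid", allow.cartesian = TRUE][item < i.item]

# Number of transactions in which each pair co-occurs.
co <- pairs[, .N, by = .(item, i.item)]
print(co)
```

The join produces n² rows for a transaction with n items, so this only stays cheap when baskets are small; `dcast()` can reshape `co` into the square layout if that is really needed.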

Thank you! It did work on a small dataset, but the transaction data has more than one hundred million rows. – Philo Mar 05 '21 at 06:12
@Philo For larger data you may check the dedicated tool `arules::apriori`. See timings [here](https://stackoverflow.com/questions/63323851/count-common-sets-of-items-between-different-customers) – Henrik Mar 05 '21 at 08:32
@Henrik It also doesn't work with larger data... – Philo Mar 05 '21 at 09:08
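One option worth trying at this scale (not suggested in the thread) is to keep the transaction-by-item matrix sparse. `table(df)` materialises a dense array with one cell per transaction-item combination, which is what breaks on large data; a sparse matrix from the `Matrix` package stores only the nonzero cells, and `crossprod()` on it gives the same C = XᵀX counts. This is a sketch under assumptions: the column names `tid` and `item` and the toy data are hypothetical, and whether the result fits in memory still depends on how many distinct items there are and how densely they co-occur.

```r
library(Matrix)

# Hypothetical toy input; column names `tid` and `item` are assumptions.
df <- data.frame(tid  = c(1, 1, 2, 2),
                 item = c("i1", "i2", "i1", "i3"))

# Keep each (transaction, item) pair once so matrix entries are 0/1.
df <- unique(df)

tid  <- factor(df$tid)
item <- factor(df$item)

# Sparse transaction-by-item incidence matrix: only nonzero cells are stored.
X <- sparseMatrix(i = as.integer(tid), j = as.integer(item), x = 1,
                  dims = c(nlevels(tid), nlevels(item)),
                  dimnames = list(levels(tid), levels(item)))

# Item-by-item co-occurrence counts, also sparse.
# The diagonal holds each item's transaction count (zeroed in the answer above).
co <- crossprod(X)
print(co)
```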
