
I have a very large transaction dataset that looks like this:

transactionid items
1 i1
1 i2
2 i3
2 i1

How can I calculate a co-occurrence matrix with data.table or Python? P.S. The ordinary methods don't work because the data is too large. The expected output is:

   i1 i2 i3
i1  0  1  1
i2  1  0  0
i3  1  0  0
PKumar
Philo
  • It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Mar 05 '21 at 05:31

1 Answer


This isn't data.table but it should work for you:

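# table(df) builds the transaction-by-item incidence table X;
# crossprod(X) computes t(X) %*% X, i.e. item-by-item co-occurrence counts
df <- crossprod(table(df))
diag(df) <- 0  # an item paired with itself is not a co-occurrence
df <- as.data.frame(df)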

Gives us:

   i1 i2 i3
i1  0  1  1
i2  1  0  0
i3  1  0  0
Matt
  • Thank you! It did work on a small dataset, but the transaction data has more than one hundred million rows – Philo Mar 05 '21 at 06:12
  • @Philo For larger data you may check the dedicated tool `arules::apriori`. See timings [here](https://stackoverflow.com/questions/63323851/count-common-sets-of-items-between-different-customers) – Henrik Mar 05 '21 at 08:32
  • @Henrik it also doesn't work with larger data ... – Philo Mar 05 '21 at 09:08
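
Since the question also asks about Python, here is a minimal sketch of one way to avoid building a dense table: stream the rows one transaction at a time and count only the item pairs that actually occur. It assumes the rows arrive already grouped by transaction id (for example, from a file sorted on that column). The function name `cooccurrence_counts` and the `(transaction_id, item)` tuple layout are made up for illustration and have not been tried on data of the size mentioned in the comments.

```python
from collections import Counter
from itertools import combinations, groupby
from operator import itemgetter


def cooccurrence_counts(rows):
    """Count unordered item pairs that share a transaction.

    `rows` is an iterable of (transaction_id, item) tuples that is already
    grouped by transaction_id, so only one transaction is held in memory
    at a time.
    """
    counts = Counter()
    for _, group in groupby(rows, key=itemgetter(0)):
        # De-duplicate items within the transaction and sort them so each
        # pair is always stored in the same (smaller, larger) order.
        items = sorted({item for _, item in group})
        counts.update(combinations(items, 2))
    return counts


# The sample data from the question.
rows = [(1, "i1"), (1, "i2"), (2, "i3"), (2, "i1")]
pairs = cooccurrence_counts(rows)

# Expand to a dense matrix only for display; for real data, keep the pair counts.
items = sorted({item for pair in pairs for item in pair})
matrix = [
    [0 if a == b else pairs.get((min(a, b), max(a, b)), 0) for b in items]
    for a in items
]

print("   " + " ".join(items))
for name, row in zip(items, matrix):
    print(name + " " + " ".join(f"{value:2d}" for value in row))
```

The work per transaction grows with the square of the number of items in it, so this suits many small baskets better than a few huge ones. Memory depends on the number of distinct pairs that occur, not on the number of rows.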