This is how you would translate the R code to Python:
>>> import numpy as np
>>> a = np.array([[0, 1, 0, 0, 1, 1],
...               [0, 0, 1, 1, 0, 1],
...               [1, 1, 1, 1, 0, 0],
...               [1, 1, 1, 0, 1, 1]])
>>> acov = np.dot(a.T, a)
>>> acov[np.diag_indices_from(acov)] = 0
>>> acov
array([[0, 2, 2, 1, 1, 1],
       [2, 0, 2, 1, 2, 2],
       [2, 2, 0, 2, 1, 2],
       [1, 1, 2, 0, 0, 1],
       [1, 2, 1, 0, 0, 2],
       [1, 2, 2, 1, 2, 0]])
However, you have a very big dataset. If you don't want to assemble the co-occurrence matrix piece by piece and you store your values in int64, 3e+9 numbers will take 24 GB of RAM just to hold the data (http://www.wolframalpha.com/input/?i=3e9+*+8+bytes). So you probably want to think over which dtype to store your data in: http://docs.scipy.org/doc/numpy/user/basics.types.html. Using int16 will probably make the dot product operation possible on a decent desktop PC nowadays.
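
To make the memory argument concrete, here is a minimal sketch of both options: estimating the footprint per dtype, and assembling the matrix piece by piece. The helper names `gigabytes` and `cooccurrence` and the `chunk_rows` parameter are my own, not part of NumPy, and the chunk size is only for illustration. Keep in mind that a co-occurrence count can be as large as the number of rows, so a small dtype such as int16 is only safe if your counts stay below 32767.

```python
import numpy as np

def gigabytes(n_values, dtype):
    """Size in GB (1e9 bytes) of n_values items stored as dtype."""
    return n_values * np.dtype(dtype).itemsize / 1e9

def cooccurrence(a, dtype=np.int32, chunk_rows=2):
    """Accumulate a.T @ a over blocks of rows, then zero the diagonal.

    Hypothetical helper: only one block of rows is converted to the
    accumulator dtype at a time, so the input can stay in a small dtype.
    """
    n_cols = a.shape[1]
    acov = np.zeros((n_cols, n_cols), dtype=dtype)
    for start in range(0, a.shape[0], chunk_rows):
        block = a[start:start + chunk_rows].astype(dtype, copy=False)
        acov += block.T @ block
    np.fill_diagonal(acov, 0)
    return acov

# Same toy data as above, stored as int8 to save memory.
a = np.array([[0, 1, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1],
              [1, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 1, 1]], dtype=np.int8)

acov = cooccurrence(a)
print(acov)

# Footprint of 3e9 values for a few integer dtypes.
for dt in (np.int64, np.int32, np.int16, np.int8):
    print(np.dtype(dt).name, gigabytes(3_000_000_000, dt), "GB")
```

With real data you would read each block from disk (for example via `np.memmap`) instead of slicing an in-memory array, but the accumulation step stays the same.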