I am working with binary matrices (containing only 0s and 1s) whose row and column counts are in the thousands: the number of rows is between 2000 and 7000, and the number of columns is between 4000 and 15000. My computer has more than 100 GB of RAM.
I'm surprised that even with these sizes I am getting a MemoryError with the following code. For reproducibility, I'm including an example with a smaller (10×20) matrix. Note that both of the following snippets raise the error:
import numpy as np
my_matrix = np.random.randint(2, size=(10, 20))
tr, tc = np.triu_indices(my_matrix.shape[0], 1)  # all upper-triangle row pairs
ut_sums = np.sum(my_matrix[tr] * my_matrix[tc], 1)
denominator = 100
value = 1 - ut_sums.astype(float) / denominator
np.einsum('i->', value)
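For what it's worth, here is a rough size estimate of the temporaries that the fancy indexing creates, using the worst-case dimensions from above (7000 rows, 15000 columns; NumPy's default int64 dtype assumed):

```python
import numpy as np

# Assumed worst-case sizes from the question: 7000 rows, 15000 columns.
n_rows, n_cols = 7000, 15000

# my_matrix[tr] materializes one full row copy per upper-triangle pair.
n_pairs = n_rows * (n_rows - 1) // 2   # number of (i, j) pairs with i < j
bytes_per_copy = n_pairs * n_cols * 8  # int64 is 8 bytes per element

print(n_pairs)                # 24496500 pairs
print(bytes_per_copy / 1e12)  # roughly 2.9 (TB) for just ONE of the two copies
```

So each of `my_matrix[tr]` and `my_matrix[tc]` would need terabytes on its own, which is far beyond 100 GB even before the elementwise product allocates a third array of the same size.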
I tried replacing the elementwise multiplication in the above code with einsum, as below, but it raises the same MemoryError:
import numpy as np
my_matrix = np.random.randint(2, size=(10, 20))
tr, tc = np.triu_indices(my_matrix.shape[0], 1)
ut_sums = np.einsum('ij,ij->i', my_matrix[tr], my_matrix[tc])
denominator = 100
value = 1 - ut_sums.astype(float) / denominator
np.einsum('i->', value)
In both cases, the printed traceback points to the line where ut_sums is calculated.
Please note that my code performs other operations too, and computes other statistics on matrices of similar sizes, but with more than 100 GB of RAM I thought this should not be a problem.