I'm implementing a special case of EM-GMM.
X is the data matrix of shape [1000000, 900] and is a numpy memmap object.
Q is a precision matrix of shape [900, 900] and is an in-memory ndarray.
I'm also using the multiprocessing library to process 200 Q matrices concurrently on 40 cores, all against the same data matrix X.
It works for smaller dimensions like [1mil, 196] and [1mil, 400],
but when I try to run the [1mil, 900] case, at some point one of the processes throws an exception:
OSError: [Errno 12] Cannot allocate memory
I suspect the issue comes from two big calculations I do, each of which probably allocates full-size temporary matrices (a single [1mil, 900] float64 temporary is already ~7.2 GB, and each of the 40 workers allocates its own).
As part of the E-step I need to calculate:
np.sum(X.dot(Q) * X, axis=1)
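One way I could avoid the full intermediate is to process X in row chunks, so only a [chunk, 900] slice of X.dot(Q) ever exists at once. A minimal sketch (the chunk size and function name are just placeholders):

```python
import numpy as np

def rowwise_quadratic(X, Q, chunk=10000):
    # np.sum(X.dot(Q) * X, axis=1), computed block by block: only a
    # [chunk, 900] piece of X.dot(Q) is materialized at a time,
    # instead of the full [1mil, 900] (~7.2 GB float64) intermediate.
    n = X.shape[0]
    out = np.empty(n, dtype=Q.dtype)
    for i in range(0, n, chunk):
        block = np.asarray(X[i:i + chunk])  # read one slice of the memmap
        # 'ij,ij->i' is a row-wise dot product, so the elementwise
        # product never needs its own full-size temporary either
        out[i:i + chunk] = np.einsum('ij,ij->i', block.dot(Q), block)
    return out
```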
As part of the M-step I need to calculate (W is a [1mil] weights vector):
(X.T * W).dot(X)
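This one can likewise be accumulated chunk by chunk, since X.T * diag(W) * X is just a sum of per-chunk [900, 900] contributions. Again a sketch with placeholder names:

```python
def weighted_gram(X, W, chunk=10000):
    # (X.T * W).dot(X) accumulated as a sum of [900, 900] partial
    # products, so the scaled [1mil, 900] copy X.T * W is never built.
    n, d = X.shape
    G = np.zeros((d, d), dtype=np.float64)
    for i in range(0, n, chunk):
        block = np.asarray(X[i:i + chunk])
        # scale each row of the block by its weight, then accumulate
        G += (block * W[i:i + chunk, None]).T.dot(block)
    return G
```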
In the future I will have to run this EM-GMM over even bigger data (of shape [2mil, 2500] and eventually [2mil, 10k]).
What can I do to make those calculations more memory efficient?
EDIT:
I've noticed that the worker initialization uses pickle, so the X matrix is turned into a plain ndarray and the workers don't share it (which means the X matrix is duplicated across all workers and fills my RAM).
I have an idea of how to solve it, and will update if it's fixed.
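Roughly, the idea is to pass each worker the memmap's filename instead of the array itself, and re-open it in a pool initializer, so all processes map the same file and the OS keeps a single physical copy. A sketch (the path, shape, dtype, and Q_matrices names are placeholders for my actual setup):

```python
import numpy as np
from multiprocessing import Pool

X = None  # per-worker global, filled in by the initializer

def init_worker(path, shape, dtype):
    # Re-open the memmap inside each worker instead of letting the
    # Pool pickle the array: every process maps the same file, and
    # the OS page cache backs them all with one copy of the data.
    global X
    X = np.memmap(path, mode='r', shape=shape, dtype=dtype)

def em_step(Q):
    # ... E-step / M-step using the shared, read-only X ...
    return np.sum(X.dot(Q) * X, axis=1)

if __name__ == '__main__':
    with Pool(processes=40, initializer=init_worker,
              initargs=('X.dat', (1000000, 900), np.float64)) as pool:
        results = pool.map(em_step, Q_matrices)  # Q_matrices: the 200 precision matrices
```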
But if anyone has a better idea of how to deal with it, I'll be grateful.