I'm attempting to write a collapsed Gibbs sampler in Python and am running into memory issues when creating initial values for one of my matrices. I am rather new to Python, so below is an outline of what I am doing, with explanations. At step 4 I get a `MemoryError`.
My goal is to:
1. Create a (T, M) matrix of zeros (plus an alpha value), where T is a small number (between 2 and 6) and M can be very large:
```python
import numpy as np
import pandas as pd

M = 500
N = 10000
T = 6
alpha = .3
NZM = np.zeros((T, M), dtype=np.float64) + alpha
```
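Step 1 can also be written with `np.full`, which allocates the array already filled with `alpha` instead of creating a zero array and then a second array from the addition. A minimal sketch, using the same sizes as above:

```python
import numpy as np

M = 500
T = 6
alpha = 0.3

# np.full allocates the (T, M) array and fills it with alpha in one step
NZM = np.full((T, M), alpha, dtype=np.float64)

# Same contents as np.zeros((T, M)) + alpha
assert np.array_equal(NZM, np.zeros((T, M), dtype=np.float64) + alpha)
```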
2. Create an (M, N) matrix of topic assignments, each drawn from a multinomial distribution over the T topics. It looks like this:
```python
Z = np.where(np.random.multinomial(1, [1./T]*T, size=M*N) == 1)[1].reshape(M, N)
Z
```

```
array([[1, 3, 0, ..., 5, 3, 1],
       [3, 5, 0, ..., 5, 1, 2],
       [4, 5, 4, ..., 1, 3, 5],
       ...,
       [1, 2, 1, ..., 0, 3, 4],
       [0, 5, 2, ..., 2, 5, 0],
       [2, 3, 2, ..., 4, 1, 5]])
```
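Because each multinomial draw in step 2 has `n=1` and equal probabilities `1./T`, it simply picks one topic uniformly at random. Assuming that is the intent, the same kind of matrix can be drawn directly as integers, without building the intermediate `(M*N, T)` one-hot array (on the order of 240 MB with 64-bit integers for these sizes). A sketch using NumPy's `Generator` API (NumPy 1.17+); the seed is arbitrary:

```python
import numpy as np

M = 500
N = 10000
T = 6

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility

# Each entry is a topic label drawn uniformly from 0, 1, ..., T-1,
# the same distribution as the one-hot multinomial trick with n=1
# and equal probabilities. The upper bound T is exclusive.
Z = rng.integers(0, T, size=(M, N))
```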
3. Create an index out of these using `.reshape(M*N)`:
```python
Z_index = Z.reshape(M*N)
Z_index
```

```
array([1, 3, 0, ..., 4, 1, 5])
```
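`reshape(M*N)` flattens in row-major (C) order, so `Z_index[m*N + n]` is `Z[m, n]`. For a contiguous array it returns a view rather than a copy; `Z.ravel()` and `Z.reshape(-1)` are equivalent spellings. A small check on a toy array:

```python
import numpy as np

M, N = 2, 3
Z = np.array([[1, 3, 0],
              [4, 5, 2]])

Z_index = Z.reshape(M * N)   # row-major flattening
print(Z_index)               # [1 3 0 4 5 2]

# Element (m, n) of Z lands at position m*N + n of Z_index
assert Z_index[1 * N + 2] == Z[1, 2]

# For a C-contiguous array the result shares memory with Z (a view, not a copy)
assert np.shares_memory(Z_index, Z)
assert np.array_equal(Z_index, Z.ravel())
```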
4. This is the step where I get the error. I use `Z_index` to add one to each row of `NZM` whose index appears as a value in `Z`. However, Option 1 below is very slow, while Option 2 raises a `MemoryError`:
```python
# Option 1: very slow
for m in range(M):
    NZM[Z_index, m] += 1

# Option 2: raises a memory error
NZM[Z_index, :] += 1
```

```
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-88-087ab1ede05d> in <module>()
      2 # a memory error
      3 
----> 4 NZM[Z_index,:] += 1

MemoryError: 
```
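Two things seem to be going on in step 4. First, `NZM[Z_index, :]` uses advanced (integer-array) indexing, which materializes a new array of shape `(M*N, M)` before `+= 1` is applied; for the sizes above that is 5,000,000 × 500 float64 values, about 20 GB. Second, even when the temporary fits in memory, `a[idx] += 1` with repeated indices adds 1 only once per distinct index, because the incremented values are written back rather than accumulated. Option 1 has the same behavior, one column at a time. A small sketch illustrating both points:

```python
import numpy as np

M, N, T = 500, 10000, 6

# Size of the temporary created by NZM[Z_index, :] (shape (M*N, M), float64)
temp_bytes = (M * N) * M * np.dtype(np.float64).itemsize
print(temp_bytes / 1e9, "GB")   # 20.0 GB

# Repeated indices are NOT accumulated by fancy-indexed +=
a = np.zeros(T)
idx = np.array([1, 1, 1, 3])
a[idx] += 1
print(a)   # [0. 1. 0. 1. 0. 0.]  (row 1 got +1, not +3)
```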
I want to add one to a row of this array each time its index appears in `Z_index`. Is there a quick, memory-efficient way to do this that I am unaware of? Thank you for taking the time to read this.
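One way to add one to a row for every occurrence, without a Python loop and without a large temporary, is to count how often each topic appears in `Z_index` with `np.bincount` and broadcast those counts across the columns. `np.add.at` is the unbuffered alternative that does accumulate repeated indices, but it has historically been much slower than `bincount` for this kind of counting. A sketch, using a small toy `Z_index` so the result can be checked by hand:

```python
import numpy as np

T, M = 6, 4
alpha = 0.3
NZM = np.full((T, M), alpha)

Z_index = np.array([1, 3, 0, 1, 1, 5, 3])

# counts[t] = number of times topic t appears in Z_index;
# minlength=T keeps the result length T even if some topics never occur
counts = np.bincount(Z_index, minlength=T)
print(counts)   # [1 3 0 2 0 1]

# Add each row's count to every column of that row:
# counts[:, np.newaxis] has shape (T, 1) and broadcasts against (T, M)
NZM += counts[:, np.newaxis]

# Same result with the unbuffered ufunc.at: repeated rows are incremented each time
NZM2 = np.full((T, M), alpha)
np.add.at(NZM2, Z_index, 1)
assert np.allclose(NZM, NZM2)
```

For the full-size problem, `np.bincount(Z_index, minlength=T)` produces only a length-T array, so the extra memory is negligible compared with `Z_index` itself.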