Huge symmetric matrix - how to store and use it cleverly - Python

Question

I have a symmetric matrix. Now,the problem is that I need to fill such a matrix of dimensions (32**3) x (32**3). The reason why I need to fill the matrix is because in my program I am then using it for various calculations: I am inverting it, I am multiplying it with other matrices...and it seems to me that in order to perform these various calculations you do need to actually store the full matrix, you can't use for example only half of that (but I may be wrong, in which case please tell me how I should do).

The problem is that such a matrix is simply too big for my computer and I get the following error:

Traceback (most recent call last):
    File "program.py", line 191, in <module>
        A = zeros((n_x*n_y*n_z. n_x*n_y*n_z), float)
MemoryError

Here, n_x = 32. So, how can I solve this problem? Is there some way to store such a big matrix, or a clever way to avoid storing it? Both ways would be fine for me, provided I can use them without making mistakes in calculations.

For completeness, I report in the following how the A matrix is built:

    n_x = n_y = n_z = 32
    L_x = L_y = L_z = n_x
    A = zeros((n_x*n_y*n_z , n_x*n_y*n_z), float)
    P_0 = 50.0
    sigma_x = sigma_y = sigma_z = 0.9
    sigma_root = np.sqrt(sigma_x**2 + sigma_y**2 + sigma_z**2)
    twosigmasquared = 2.*sigma_root**2
    for l in range(n_x*n_y*n_z):
        for m in range(n_x*n_y*n_z):
            A[l][m] = P_0*(L_x/(np.sqrt(2.*np.pi)*sigma_root*n_x**2)) * (L_y/(np.sqrt(2.*np.pi)*sigma_root*n_y**2)) * (L_z/(np.sqrt(2.*np.pi)*sigma_root*n_z**2))*np.exp((-((x[l]-x[m])**2)-((y[l]-y[m])**2)-((z[l]-z[m])**2))/twosigmasquared)
            A[m][l] = A[l][m]

One obvious solution is to get a bigger machine. This matrix has `~ 10^9` elements and should fit easily in memory on a machine which has `~ 16Gb` of RAM. Such machines are available on AWS EC2. The other solution is to calculate elements on the fly: `def A(i,j): return ...`. — musically_ut, Aug 11 '15 at 11:51
Depending on how `numpy` stores matrices this could consume quite a lot of memory. In it's most compact form (using the symmetry) it would consume about 4Gb memory, otherwise it would be 8Gb of memory or even 16Gb of memory. You'll need to handle that if you cannot cope with not having the entire matrix in memory at the same time. — skyking, Aug 11 '15 at 11:55
another good question, don't forget accept the answer that helps you more! — developer_hatch, Oct 26 '17 at 13:43

score 1 · Answer 1 · answered Aug 12 '15 at 14:53

To save 50% space, you can use a lil_sparse matrix from scipy.

from scipy import sparse as S
A = S.lil_matrix((n_x*n_y*n_z , n_x*n_y*n_z), float)

n_x = n_y = n_z = 32
L_x = L_y = L_z = n_x
P_0 = 50.0
sigma_x = sigma_y = sigma_z = 0.9
sigma_root = np.sqrt(sigma_x**2 + sigma_y**2 + sigma_z**2)
twosigmasquared = 2.*sigma_root**2
for l in range(n_x*n_y*n_z):
    for m in range(l, n_x*n_y*n_z): # Filling only the top half
        A[l][m] = P_0*(L_x/(np.sqrt(2.*np.pi)*sigma_root*n_x**2)) * (L_y/(np.sqrt(2.*np.pi)*sigma_root*n_y**2)) * (L_z/(np.sqrt(2.*np.pi)*sigma_root*n_z**2))*np.exp((-((x[l]-x[m])**2)-((y[l]-y[m])**2)-((z[l]-z[m])**2))/twosigmasquared)

Then instead of accessing the matrix itself, you can write a helper function:

def getA(i, j):
    if i < j:
        return A[j, i]
    else:
        return A[i, j]

However, this is not going to help you much if you want calculate the inverse of the matrix using standard approaches, want to multiply the matrix efficiently or do any operations at all.

Saving the whole matrix in memory is probably a better choice.

Huge symmetric matrix - how to store and use it cleverly - Python

1 Answers1