I have a twofold problem concerning memory measurement and sparse storage of 3D numpy arrays in Python:
I have various sparse (i.e., mostly-zero) 3D numpy arrays of 1s and 0s, meaning an array looks like this:
A =
[[[0. 1. 1.]
[1. 0. 1.]
[1. 0. 0.]
...
[1. 0. 0.]
[1. 0. 0.]
[1. 0. 0.]]
...
[[1. 0. 1.]
[0. 1. 0.]
[1. 0. 1.]
...
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]]
and array.shape is equal to (x, y, 3).
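For context, here is a minimal way to build a stand-in for such an array (the dimensions and the sparsity are made up; my real arrays come from elsewhere):

import numpy as np

x, y = 500, 300                                     # made-up dimensions
A = (np.random.rand(x, y, 3) < 0.3).astype(float)   # ~70% zeros, dtype float64
print(A.shape)  # (500, 300, 3)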
I would like to find a way to (1) measure the array's memory, then (2) store it as a sparse matrix/array (using something similar to scipy's csr_matrix), then (3) measure the memory of the sparse matrix/array to (hopefully) see an improvement in memory.
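Schematically, the workflow I am after looks like this (compare_memory is my own sketch, and to_sparse is just a placeholder for whatever sparse representation ends up working):

import pickle

def compare_memory(A, to_sparse):
    # to_sparse: placeholder for the sparse conversion I am looking for
    dense_bytes = A.nbytes                 # (1) raw buffer size of the dense array
    S = to_sparse(A)                       # (2) convert to a sparse representation
    sparse_bytes = len(pickle.dumps(S))    # (3) serialized size as a memory proxy
    print(dense_bytes, sparse_bytes)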
My first problem is that I am generally having trouble with the Python memory-measurement solutions I have found so far. For example, I expected to see a difference in memory size between an array of floats with many decimal places, e.g.
B =
[[[0.38431373 0.4745098  0.6784314 ]
[0.41963135 0.49019608 0.69411767]
[0.40392157 0.49019608 0.6862745 ]
...]]]
and an array of 1.s and 0.s of the same size (like array A), which should have shown a big improvement (I need to measure this difference as well). Yet Python reports the same memory size for arrays of the same shape. Here are the methods I used and their outputs:
print(sizeof(A)) #prints 3783008
print(asizeof.asizeof(A)) #prints 3783024
print(actualsize(A)) #prints 3783008
print(A.nbytes) #prints 3782880
print(total_size(A)) #prints 3783008
print(getsize(A)) #prints 3783008
print(len(pickle.dumps(A))) #prints 3783042
********************
print(asizeof.asizeof(B)) #prints 5044112
print(sys.getsizeof(B)) #prints 128 !!!
print(sizeof(B)) #prints 128 !!!
print(actualsize(B)) #prints 128 !!!
print(total_size(B)) #prints 128 !!!
print(B.nbytes) #prints 3782880
print(getsize(B)) #prints 128 !!!
print(len(pickle.dumps(B))) #prints 3783042
(methods collected from here, here, and here).
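One thing I noticed while debugging, in case it is relevant: I have read that sys.getsizeof only includes a numpy array's data buffer when the array owns its data, so the 128 bytes for B might mean that B is a view on another array (128 would then be just the size of the array header). This is how I checked:

print(A.flags['OWNDATA'], A.base is None)  # True True -> A owns its buffer
print(B.flags['OWNDATA'], B.base is None)  # if False False, B is a view and
                                           # getsizeof counts only the header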
My second problem is that I cannot find an economical way to store a 3D array (of a given sparsity) as a sparse matrix/array: scipy's csr_matrix and pandas' SparseArray work for 2D arrays only, and sparse.COO() is very costly for 3D arrays, only starting to help with memory at sparsities of ~80% and higher. For example, a 70% sparse array stored with sparse.COO() is about 8 MB (e.g. when measured with pickle), which is much bigger than the actual array. Or maybe the problem is still the way I compute memory (see the methods listed above).
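For reference, this is roughly the experiment that gave me those numbers, using the stand-in array A from above (~70% zeros; the exact sizes will differ for my real data):

import pickle
import sparse  # the pydata/sparse package

S = sparse.COO.from_numpy(A)
print(A.nbytes)              # dense float64 buffer: 8 bytes per element
print(S.nbytes)              # COO keeps coords (ndim x nnz integers) plus data (nnz floats);
                             # with 64-bit indices that is ~32 bytes per nonzero in 3D,
                             # which matches the break-even near 75% sparsity
print(len(pickle.dumps(S)))  # serialized size of the sparse object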
Any ideas about what I should do? I am really sorry this post is so long! Thank you in advance!