
I have a .npz file that I want to load into RAM. The compressed file size is 30 MB. I am doing the following operations to load the data into RAM:

import numpy as np
from scipy import sparse
from sys import getsizeof

a = sparse.load_npz('compressed/CRS.npz').todense()
getsizeof(a)
# 136
type(a)
# numpy.matrixlib.defmatrix.matrix
b = np.array(a)
getsizeof(b)
# 64000112
type(b)
# numpy.ndarray

Why does the `numpy.matrix` object occupy so little memory compared to the `numpy.ndarray`? Both `a` and `b` have the same dimensions and data.

sophros
  • Possible duplicate of [Python memory usage of numpy arrays](https://stackoverflow.com/questions/11784329/python-memory-usage-of-numpy-arrays) – user2699 Oct 19 '18 at 18:22
  • If you'd used `.toarray()` you'd have gotten the full size. `.todense` adds an `asmatrix` layer on top of that, creating a view (see the sketch after these comments). That is an implementation detail. In general `getsizeof` is not a reliable measure: it sort of works with arrays, but is worthless with lists. – hpaulj Oct 19 '18 at 18:30
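
A minimal sketch of the difference the comment describes, using an arbitrary random sparse matrix as a stand-in for the original CRS.npz:

from scipy import sparse
from sys import getsizeof

s = sparse.random(1000, 1000, density=0.01, format='csr')  # stand-in data

arr = s.toarray()    # plain ndarray that owns its ~8 MB data buffer
mat = s.todense()    # np.matrix view wrapped around such an ndarray

getsizeof(arr)
# ~8000128 -- header plus the full data buffer
getsizeof(mat)
# ~112 -- only the thin wrapper; the data belongs to mat.base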

1 Answer


Your `a` matrix is a view of another array, so the underlying data is not counted towards its `getsizeof`. You can see this by checking that `a.base` is not None, or by seeing that the `OWNDATA` flag is False in `a.flags`.

Your `b` array is not a view, so the underlying data is counted towards its `getsizeof`.

`numpy.matrix` doesn't provide any memory savings.
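
As a minimal sketch (independent of the original file), the same behaviour can be reproduced with any array and a matrix view over it; `base`, `view`, and the sizes shown are illustrative:

import numpy as np
from sys import getsizeof

base = np.zeros((1000, 1000))    # owns its 8,000,000-byte buffer
view = np.asmatrix(base)         # matrix view over the same buffer

view.base is not None
# True -- view refers to another array's data
view.flags['OWNDATA']
# False -- the data is owned by base, not by view
getsizeof(view)
# small -- only the wrapper object is counted
view.nbytes
# 8000000 -- size of the data the view refers to, regardless of ownership

If you want the size of the data itself rather than the Python-object overhead, `nbytes` reports it whether or not the object owns the buffer.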

user2357112