
I'm using numpy.array to process a big matrix (about 20000*20000), and I have a question about the memory usage of arrays.

>>> a = np.random.random((5,5))
>>> np.savetxt(fname = path, X = a)
>>> b = np.loadtxt(fname = path)
>>> b
array([[0.17940875, 0.33674265, 0.14397669, 0.49947964, 0.70878022],
       [0.88072205, 0.69542991, 0.6094819 , 0.47855311, 0.73319366],
       [0.75855104, 0.79885525, 0.77966685, 0.3756036 , 0.81272082],
       [0.754227  , 0.07242963, 0.16935453, 0.76840836, 0.10537832],
       [0.74316004, 0.76265098, 0.7661815 , 0.22217968, 0.32509482]])
>>> a
array([[0.17940875, 0.33674265, 0.14397669, 0.49947964, 0.70878022],
       [0.88072205, 0.69542991, 0.6094819 , 0.47855311, 0.73319366],
       [0.75855104, 0.79885525, 0.77966685, 0.3756036 , 0.81272082],
       [0.754227  , 0.07242963, 0.16935453, 0.76840836, 0.10537832],
       [0.74316004, 0.76265098, 0.7661815 , 0.22217968, 0.32509482]])
>>> a.__sizeof__()
312
>>> b.__sizeof__()
112
>>> a.dtype
dtype('float64')
>>> b.dtype
dtype('float64')
>>>

So why is the reported memory size of a 312 bytes, while that of b is only 112?

LGP
  • I would suggest making `I'm using numpy.array to process a big matrix (about 20000*20000), I have to search ways to save memory.` into another question. You would also have to provide some detail about what *process* means... – Roland Smith Aug 23 '19 at 13:28
  • (Re)read the docs on how numpy arrays are stored. Pay attention to topics such as views versus copies, and shared data buffers (memory). – hpaulj Aug 23 '19 at 15:24

1 Answer


Why do the two arrays have different memory sizes?

The difference is understandable if we look at the base attribute:

>>> a.base
>>> b.base
array([[[ 0.17940875,  0.33674265,  0.14397669,  0.49947964,  0.70878022]],

       [[ 0.88072205,  0.69542991,  0.6094819 ,  0.47855311,  0.73319366]],

       [[ 0.75855104,  0.79885525,  0.77966685,  0.3756036 ,  0.81272082]],

       [[ 0.754227  ,  0.07242963,  0.16935453,  0.76840836,  0.10537832]],

       [[ 0.74316004,  0.76265098,  0.7661815 ,  0.22217968,  0.32509482]]])

See also Internal memory layout of an ndarray:

An instance of class ndarray consists of a contiguous one-dimensional segment of computer memory (owned by the array, or by some other object), combined with an indexing scheme …

So b.__sizeof__() covers only the "bookkeeping" information of the ndarray, while the actual array data lives in the buffer owned by b.base. In contrast, a.base is None: a owns its data, so a.__sizeof__() is the sum of the "bookkeeping" and the actual data. The difference, 312 - 112 = 200 bytes, is just what we'd expect for 25 'float64' numbers at 8 bytes each.
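The same effect can be reproduced without a file round trip. The following is only a minimal sketch: the view `v` is a stand-in for `b` (an assumption, since the exact way loadtxt builds its result may differ between NumPy versions), and `size_gap` is a name introduced here for illustration.

```python
import numpy as np

a = np.random.random((5, 5))   # freshly allocated array that owns its buffer
v = a.view()                   # stand-in for b: same data, no copy

print(a.base is None)                    # True: a owns its data
print(v.base is a)                       # True: v borrows a's buffer
print(a.flags.owndata, v.flags.owndata)  # True False

# __sizeof__ counts the data buffer only for the array that owns it,
# so the gap between the two should correspond to the data size.
size_gap = a.__sizeof__() - v.__sizeof__()
print(size_gap, a.nbytes)
```

Checking `flags.owndata` is a quick way to tell whether `__sizeof__` includes the data buffer for a given array.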

In each case, the actual data size is returned by the nbytes attribute.
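As a short illustration (a sketch, not part of the original session), nbytes gives the same answer for an owning array and a view of it. The last lines estimate the data size of the 20000*20000 matrix from the question without allocating it, assuming it holds float64 values; `rows`, `cols` and `estimated` are names made up for this example.

```python
import numpy as np

a = np.random.random((5, 5))
v = a.view()                 # a view that does not own its data

print(a.nbytes, v.nbytes)    # 200 200
print(a.size * a.itemsize)   # 200, the same figure computed by hand

# Estimated data size of a 20000 x 20000 float64 matrix (assumed dtype),
# computed from the element count rather than by creating the array.
rows = cols = 20000
estimated = rows * cols * np.dtype(np.float64).itemsize
print(estimated)             # 3200000000 bytes, roughly 3 GiB
```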

Armali