
I have now looked at NumPy arrays in more detail. You always read that a NumPy ndarray uses less memory than a list, but when I look at the total memory consumption, the ndarray comes out larger than the list.

In a list we have int objects that are 28 bytes each, but in a NumPy array we have numpy.int64 objects that are 32 bytes each.

So I don't understand why people say that NumPy arrays use less memory, when the numpy.int64 objects are four bytes larger than the int objects.

import numpy as np

from sys import getsizeof

def is_iterable(p_object):
    try:
        iter(p_object)
    except TypeError: 
        return False
    return True

def get_total_size(element, size):
  if not is_iterable(element):
    return size + getsizeof(element)
  size = size + getsizeof(element)
  for new_element in element:
    size = get_total_size(new_element, size)
  return size


if __name__ == "__main__":
  x_list = list(range(100))
  x_array = np.array(x_list)


  print("x_list:")
  print("A list with object references consumes in memory " + str(getsizeof(x_list)) + " Byte(s)")
  print("A list of object references and all objects consumed in memory " + str(get_total_size(x_list, 0)) + " Byte(s)")

  print("")

  print("Numpy-Array:")
  print("A ndarray object references consumes in memory " + str(getsizeof(x_array)) + " Byte(s)")
  print("A ndarray of object references and all objects consumed in memory  " + str(get_total_size(x_array, 0)) + " Byte(s)")

  print("")
  print("objecttype", type(x_array[1]), "size in bytes", getsizeof(x_array[1]))
  print("objecttype", type(x_list[1]), "size in bytes", getsizeof(x_list[1]))

output:

x_list:
A list with object references consumes in memory 1016 Byte(s)
A list of object references and all objects consumed in memory 3812 Byte(s)

Numpy-Array:
A ndarray object references consumes in memory 896 Byte(s)
A ndarray of object references and all objects consumed in memory  4096 Byte(s)

objecttype <class 'numpy.int64'> size in bytes 32
objecttype <class 'int'> size in bytes 28
Rubs
  • Does this answer your question? [Python3 numpy array size compare to list](https://stackoverflow.com/questions/63095126/python3-numpy-array-size-compare-to-list) – Julien Dec 09 '21 at 15:29
  • Your array is not object dtype, so adding the 'references' isn't needed. Your list handling is ok (better than most), but you might also want to check floats or larger ints. But memory use isn't numpy's main advantage; computational speed is (if done right). – hpaulj Dec 09 '21 at 15:46
  • Julien's reference is probably the answer to your issue, but anyway you can do `x_array = np.array(x_list, dtype=np.int32)`. – Askold Ilvento Dec 09 '21 at 15:52
  • You aren't correctly getting the memory consumption of the objects. Importantly, `objecttype size in bytes 32` is irrelevant. `numpy.ndarray` objects are essentially object-oriented wrappers over primitive arrays. To get the size of the underlying buffer, you just want `x_array.nbytes` – juanpa.arrivillaga Dec 09 '21 at 18:09

1 Answer

In [144]: alist = list(range(100))
In [145]: getsizeof(alist)
Out[145]: 856

Most getsizeof questions just use this base number, ignoring the references.

In [146]: get_total_size(alist,0)
Out[146]: 3652

The size of an individual integer can vary with its magnitude:

In [148]: getsizeof(50)
Out[148]: 28
In [149]: getsizeof(220000000000000000)
Out[149]: 32

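100*28 + 856 = 3656, close enough to the measured 3652. (The 4-byte gap is consistent with 0 itself reporting 24 bytes instead of 28 here.) CPython also pre-allocates small integers (-5 through 256) and shares them, so your list doesn't really add those to the total memory use. But that's a minor detail.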
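As a rough illustration of both points (int size growing with magnitude, and small ints being shared), here is a minimal sketch. The identity checks rely on a CPython implementation detail, so treat the printed results as something to observe on your own interpreter rather than a guarantee; the variable names are just for illustration.

```python
import sys

# Python ints are variable-length: larger magnitudes need more internal digits.
samples = (1, 2**30, 2**60, 2**90)
sizes = [sys.getsizeof(value) for value in samples]
for value, size in zip(samples, sizes):
    print(f"{value:>30d} -> {size} bytes")

# CPython implementation detail: small ints are cached and shared,
# so building "the same" small int twice can yield one object.
small_a, small_b = int("100"), int("100")
big_a, big_b = int("100000"), int("100000")
print("small ints share one object:", small_a is small_b)
print("large ints share one object:", big_a is big_b)
```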

For an array with a numeric dtype, we don't need to check the non-existent "references":

In [152]: arr = np.array(alist)
In [153]: getsizeof(arr)
Out[153]: 904
In [154]: arr.nbytes
Out[154]: 800

There are 800 bytes in its data buffer (100*8, i.e. 8 bytes per int64 element) and about 100 bytes of 'overhead' for the array object itself. Other dtypes may have different element sizes.
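To see how the element size depends on the dtype, a small sketch like the following compares a few dtypes for the same 100 values. It assumes each array owns its data buffer (true for arrays built with np.array like this), in which case getsizeof covers both the header and the buffer; the exact overhead figure depends on the NumPy and Python versions.

```python
import sys
import numpy as np

values = list(range(100))
rows = []
for dtype in (np.int8, np.int32, np.int64, np.float64):
    arr = np.array(values, dtype=dtype)
    # For an array that owns its buffer, getsizeof = object overhead + data bytes.
    overhead = sys.getsizeof(arr) - arr.nbytes
    rows.append((arr.dtype.name, arr.itemsize, arr.nbytes, overhead))

for name, itemsize, nbytes, overhead in rows:
    print(f"{name:8s} itemsize={itemsize} nbytes={nbytes} overhead={overhead}")
```

The data buffer scales with itemsize, while the per-array overhead stays roughly constant.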

For object dtype arrays, adding the references matters:

In [155]: arr = np.array(alist,object)
In [156]: getsizeof(arr)
Out[156]: 904
In [158]: get_total_size(arr,0)
Out[158]: 3700     # 2800+900

This array references the same ints as alist.
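Putting the numeric and object cases together, one possible way to adapt the question's get_total_size is sketched below. The helper name total_size is hypothetical, it assumes arrays own their data (a view would report only its header), and like the original it counts shared objects once per reference, so it is an approximation.

```python
import sys
import numpy as np

def total_size(obj):
    """Approximate memory footprint in bytes (illustrative helper)."""
    if isinstance(obj, np.ndarray):
        # getsizeof already includes the data buffer for an owning array.
        size = sys.getsizeof(obj)
        if obj.dtype == object:
            # Only object arrays hold references to separate Python objects.
            size += sum(sys.getsizeof(item) for item in obj.ravel())
        return size
    if isinstance(obj, (list, tuple)):
        # Containers store references; add the size of each referenced object.
        return sys.getsizeof(obj) + sum(total_size(item) for item in obj)
    return sys.getsizeof(obj)

alist = list(range(100))
print("list:        ", total_size(alist))
print("int64 array: ", total_size(np.array(alist, dtype=np.int64)))
print("object array:", total_size(np.array(alist, dtype=object)))
```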

Your get_total_size on the numeric dtype array iterates over the elements and counts each one as a numpy.int64 object:

In [164]: getsizeof(np.int64(50))
Out[164]: 32

but the array does not "store" 100 of those. That 32 is the 8 bytes for its value, plus 24 of object overhead. It is a temporary scalar object that NumPy creates when you index or iterate the array, not the stored value.
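A quick way to convince yourself of this is to check object identity. The sketch below (with the dtype fixed to int64 so the element size is the same on every platform) shows that indexing the array twice gives two distinct scalar objects, whereas the list returns the object it actually holds.

```python
import numpy as np

alist = list(range(100))
arr = np.array(alist, dtype=np.int64)

first = arr[1]
second = arr[1]
# Each access wraps the raw 8-byte value in a new numpy.int64 object.
print(type(first), "same object twice:", first is second)

# The list stores a reference, so it returns the very same int object.
print(type(alist[1]), "same object twice:", alist[1] is alist[1])

# What the array actually stores per element.
print("bytes stored per element:", arr.itemsize)
```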

hpaulj