I am trying to profile the memory usage of a Python list vs. a numpy array.

%%file memory.py

import numpy as np

@profile
def allocate():
    vector_list = [float(i) for i in range(10000)]
    np.arange(0,10000,dtype='d')

allocate()

Running memory profiler in the shell:

!python -m memory_profiler memory.py

gives the following output:

Line #    Mem usage    Increment   Line Contents
================================================
     4   39.945 MiB    0.000 MiB   @profile
     5                             def allocate():
     6   39.949 MiB    0.004 MiB       vector_list = [float(i) for i in range(10000)]
     7   40.039 MiB    0.090 MiB       np.arange(0,10000,dtype='d')

The memory increments for line 6 (the list) vs line 7 (the numpy array) suggest that the numpy array was far more expensive than the list, which is not what I expected. What am I doing wrong?

ramailo sathi
  • I cannot reproduce your results... – juanpa.arrivillaga Jul 19 '17 at 22:35
  • Since you're largely interested in those two, you could just check the sizes of both objects using `sys.getsizeof` (which should work reasonably well for a list and a `np.arange` object), instead of relying extensively on a memory profiling tool. – Moses Koledoye Jul 19 '17 at 22:40
  • @MosesKoledoye yeah, but you have to have a grip on CPython internals to use `sys.getsizeof` correctly. For example, you would need `sum(map(sys.getsizeof, vector_list)) + sys.getsizeof(vector_list)` to get an accurate picture of the memory usage of `vector_list`. And `sys.getsizeof(np.arange(0,10000))` – juanpa.arrivillaga Jul 19 '17 at 22:42
  • @MosesKoledoye in other words, `sys.getsizeof` does *not* work reasonably well, naively, with a `list`. If you did it with `vector_list`, it would be off by about `240000` bytes – juanpa.arrivillaga Jul 19 '17 at 22:44
  • @juanpa.arrivillaga Well, you do have a point, but their memory profiler isn't making matters any better :)) – Moses Koledoye Jul 19 '17 at 22:44
  • The result is 80096 bytes vs 87624 bytes. I was expecting list to be 3 times more expensive. – ramailo sathi Jul 19 '17 at 22:45
  • @ramailosathi That's because *you aren't accounting for the elements in the list, just the list itself*. See my example; you need to use `sum(map(sys.getsizeof, vector_list)) + sys.getsizeof(vector_list)` – juanpa.arrivillaga Jul 19 '17 at 22:46
  • @MosesKoledoye **nope** that's the *whole point* of `np.ndarray`, it *doesn't contain float objects*. It is an object-oriented wrapper around primitive arrays. Its actual memory usage can be found using `arr.nbytes` (plus a tiny overhead for the Python object). – juanpa.arrivillaga Jul 19 '17 at 22:50
  • A list over-allocates memory, and it is a dynamic array of pointers. Each pointer points to an integer object. http://www.laurentluce.com/posts/python-list-implementation/ A pointer is 8 bytes on a 64-bit machine. I guess that is 8 bytes for the pointer plus 8 bytes for the integer, if the integer fits in 64 bits; integers outside the 64-bit range use more memory. A numpy array is bookkeeping plus just 8 bytes per 64-bit integer, so a numpy array should be more memory efficient. By the way, Python lists are not really lists; they are dynamic arrays. – hamster on wheels Jul 19 '17 at 22:50
  • @juanpa.arrivillaga Good one. Thanks for clearing that up. – Moses Koledoye Jul 19 '17 at 22:51
  • @MosesKoledoye yep. Check out [my answer here](https://stackoverflow.com/a/43578969/5014455); although the original question was about the memory usage of a bunch of `dict`s, I go into how `numpy` can be *extremely* memory efficient, and it also demonstrates the subtleties of getting the actual memory usage of a Python container, e.g. string interning, small-int caching, etc. – juanpa.arrivillaga Jul 19 '17 at 22:54
  • And I guess I'll just [leave this here](https://stackoverflow.com/questions/43404210/how-much-memory-will-a-list-with-one-million-elements-take-up-in-python/43404344#43404344) as well, an answer I wrote up that discusses how to reason about the memory use of Python `list`s. It also contains a link to a fairly decent recipe for figuring out the memory usage of a container; as long as you are working with built-in objects it should work like a charm. It takes object `id`s into account so you don't count things twice! – juanpa.arrivillaga Jul 19 '17 at 23:01
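
A minimal sketch of the point made in the comments above (assuming a 64-bit CPython, where each `float` object takes 24 bytes):

import sys

vector_list = [float(i) for i in range(10000)]

# Naive measurement: counts only the list header and its pointer array,
# not the 10000 float objects that the pointers refer to.
naive = sys.getsizeof(vector_list)

# Full accounting: also add the size of every element.
full = sum(map(sys.getsizeof, vector_list)) + sys.getsizeof(vector_list)

print(naive, full, full - naive)  # the difference is roughly 240000 bytes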

1 Answer


I do not know what memory_profiler is reporting in your case - I get very different numbers from yours:

Line #    Mem usage    Increment   Line Contents
================================================
     3   41.477 MiB    0.000 MiB   @profile
     4                             def allocate():
     5   41.988 MiB    0.512 MiB       vector_list = [float(i) for i in range(10000)]
     6   41.996 MiB    0.008 MiB       np.arange(0,10000,dtype='d')

I would recommend the following two links for further reading: Python memory usage of numpy arrays and Size of list in memory.

I have also modified your code as follows:

import numpy as np
import sys

@profile
def allocate():
    vector_list = [float(i) for i in range(10000)]
    npvect = np.arange(0,10000,dtype='d')
    listsz = sum(map(sys.getsizeof, vector_list)) + sys.getsizeof(vector_list)
    print("numpy array size: {}\nlist size: {}".format(npvect.nbytes, listsz)) 
    print("getsizeof(numpy array): {}\n".format(sys.getsizeof(npvect))) 

allocate()

and it outputs:

numpy array size: 80000
list size: 327632
getsizeof(numpy array): 80096
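
For reference, here is a rough accounting of those numbers (a sketch assuming a 64-bit CPython, where a `float` object takes 24 bytes and a pointer 8 bytes; exact figures vary slightly between builds):

import sys
import numpy as np

vector_list = [float(i) for i in range(10000)]
npvect = np.arange(0, 10000, dtype='d')

# The list holds 10000 separate float objects (~24 bytes each) plus its own
# header and over-allocated array of 8-byte pointers.
print(10000 * sys.getsizeof(1.0))   # ~240000 bytes for the float objects
print(sys.getsizeof(vector_list))   # ~87000 bytes for the list itself

# The numpy array stores all 10000 doubles in one contiguous 8-byte-per-element
# buffer, plus a small fixed object header, so getsizeof barely exceeds nbytes.
print(npvect.nbytes)                # 80000 bytes of data
print(sys.getsizeof(npvect))        # ~80096: buffer + array-object header
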
AGN Gazer