
I was looking for an easy way to find the size in bytes of array and dictionary objects, such as

[ [1,2,3], [4,5,6] ] or { 1:{2:2} }

Many posts say to use pylab, for example:

from pylab import *

A = array( [ [1,2,3], [4,5,6] ] )
A.nbytes
24

But what about dictionaries? I saw lots of answers proposing to use pysize or heapy. An easy answer is given by Torsten Marek in this link: Which Python memory profiler is recommended?, but I don't have a clear interpretation of the output because the numbers of bytes didn't match.

Pysize seems to be more complicated, and I don't have a clear idea of how to use it yet.

Given the simplicity of the size calculation that I want to perform (no classes nor complex structures), any idea about an easy way to get an approximate estimate of the memory usage of these kinds of objects?

Kind regards.

crandrades

7 Answers


There's:

>>> import sys
>>> sys.getsizeof([1,2, 3])
96
>>> a = []
>>> sys.getsizeof(a)
72
>>> a = [1]
>>> sys.getsizeof(a)
80

But I wouldn't say it's that reliable, as Python has overhead for each object, and some objects contain nothing but references to other objects, so it's not quite the same as in C and other languages.

Have a read of the docs on sys.getsizeof and go from there I guess.
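
For example, here is a quick way to see that limitation on the question's nested list (a sketch; the exact numbers depend on the Python version and platform):

import sys

nested = [[1, 2, 3], [4, 5, 6]]

# Shallow size: only the outer list object and its two pointer slots are counted.
print(sys.getsizeof(nested))

# The inner lists (and the ints inside them) are separate objects
# that sys.getsizeof does not include.
print(sum(sys.getsizeof(inner) for inner in nested))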

Jon Clements
  • I tried that, but when you try to get the size of a list of lists, you get only the parent list's size and not the total including the nested lists. I don't know whether, if I write code to do the recursion, I'll get the real memory usage. – crandrades Nov 23 '12 at 14:57
  • @user1847706 at the end of the entry I linked you to in the docs, there's [See recursive sizeof recipe for an example of using getsizeof() recursively to find the size of containers and all their contents.](http://code.activestate.com/recipes/577504/) – Jon Clements Nov 23 '12 at 15:01
  • Thanks for your answer. Now, I'm trying to add a handler to calculate memory usage for a user defined class. – crandrades Nov 23 '12 at 23:51
  • Unfortunately, this answer is wrong. It only accounts for the root object's size. If the list has internal objects (as in the OP's example), it will report the wrong in-memory size for the object. – Liran Funaro Jul 23 '20 at 07:35

None of the answers here are truly generic.

The following solution works with any type of object and accounts for nested contents, without the need for an expensive recursive implementation:

import gc
import sys

def get_obj_size(obj):
    marked = {id(obj)}
    obj_q = [obj]
    sz = 0

    while obj_q:
        sz += sum(map(sys.getsizeof, obj_q))

        # Look up all the objects referred to by the objects in obj_q.
        # See: https://docs.python.org/3.7/library/gc.html#gc.get_referents
        all_refr = ((id(o), o) for o in gc.get_referents(*obj_q))

        # Filter out objects that are already marked.
        # Using a dict keyed by id also prevents counting repeated objects.
        new_refr = {o_id: o for o_id, o in all_refr if o_id not in marked and not isinstance(o, type)}

        # The new obj_q will be the ones that were not marked,
        # and we will update marked with their ids so we will
        # not traverse them again.
        obj_q = new_refr.values()
        marked.update(new_refr.keys())

    return sz

For example:

>>> import numpy as np
>>> x = np.random.rand(1024).astype(np.float64)
>>> y = np.random.rand(1024).astype(np.float64)
>>> a = {'x': x, 'y': y}
>>> get_obj_size(a)
16816
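
For comparison, sys.getsizeof(a) would report only the dict object itself (a few hundred bytes on a typical CPython build), while the raw element data of the two arrays alone is 2 * 1024 * 8 = 16384 bytes, which is why get_obj_size(a) comes out near 16 KiB:

>>> x.nbytes + y.nbytes   # raw float64 element data held by the two arrays
16384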

See my repository for more information, or simply install my package (objsize):

$ pip install objsize

Then:

>>> from objsize import get_deep_size
>>> get_deep_size(a)
16816
Liran Funaro
  • This answer definitely needs more attention! Clean way to calculate the memory usage. Thanks. – Qinsheng Zhang Jul 22 '20 at 14:09
  • Just a warning to those who try this with PyTorch: You will need to access the storage attribute on each tensor to get the correct size. See: https://stackoverflow.com/questions/54361763/pytorch-why-is-the-memory-occupied-by-the-tensor-variable-so-small – Jamie Jul 06 '21 at 15:26
  • @Jamie Did you verify that this solution does not work in the linked example? I don't see why this solution wouldn't recurse to access the storage of the PyTorch object. – Liran Funaro Jul 06 '21 at 15:57
  • Yes, confirmed. `get_obj_size(torch.rand(200, 200)) == get_obj_size(torch.rand(200))` is `True`. (Both return 64 bytes.) – Jamie Jul 07 '21 at 16:12
  • @Jamie The package now supports `torch` by allowing the user to add a specific handler per object type. See "Special Objects" in the package [docs](https://pypi.org/project/objsize/). – Liran Funaro Jul 10 '22 at 09:04
  • `get_deep_size` is pretty accurate – Rick Jul 14 '22 at 02:40

A bit late to the party, but an easy way to get the size of a dict is to pickle it first.

Using sys.getsizeof on a Python object (including a dictionary) may not be exact, since it does not count referenced objects.

One way to handle it is to serialize the object into a string and use sys.getsizeof on the string. The result will be much closer to what you want.

import cPickle
import sys

mydict = {'key1': 'some long string', 'key2': ['some', 'list'], 'key3': 'whatever other data'}

Doing sys.getsizeof(mydict) is not exact, so pickle it first:

mydict_as_string = cPickle.dumps(mydict)

Now we can see how much space it takes:

print sys.getsizeof(mydict_as_string)
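
A rough Python 3 sketch of the same idea, where pickle replaces cPickle and dumps returns a bytes object. Keep in mind this measures the serialized size, not the in-memory size:

import pickle
import sys

mydict = {'key1': 'some long string', 'key2': ['some', 'list'], 'key3': 'whatever other data'}

# Size of the pickled byte string; only an approximation, since the
# serialized form can differ considerably from the in-memory layout.
mydict_as_bytes = pickle.dumps(mydict)
print(sys.getsizeof(mydict_as_bytes))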
Denis Kanygin
  • This won’t tell you the size of the dict; it will tell you the size of the pickle representation of the dict, which will be larger (potentially by a considerable amount) than the in-memory size of the dict. – jbg Feb 11 '14 at 23:58
  • @JasperBryant-Greene that's the point. Using sys.getsizeof on a Python object (including a dictionary) may not be exact since it does not count referenced objects. Serializing it and then getting the size is not exact either, but it will be closer to what you want. Think of it as an approximation. – Denis Kanygin Feb 19 '14 at 17:34
  • Sure, but the question asks for "an approximate estimate of the memory usage of these kinds of objects". I think this doesn't even qualify as an approximate estimate of memory usage -- the pickled size will typically be much larger. – jbg Mar 04 '14 at 09:36
  • This can be a very rough approximation since it almost completely ignores the overhead of the structure. For example, the size of an empty dict is 280 on my machine, while the size of the dict pickled to a string is 43. The less bulky the data stored, the rougher the approximation is. – CoatedMoose Apr 25 '14 at 00:10
  • This doesn't seem any better than `print len(json.dumps(my_dict))` – MarkHu Apr 14 '16 at 21:46
  • What about pickling and then de-pickling? – std''OrgnlDave Feb 01 '17 at 02:33
  • I found that tuples/namedtuples are much smaller in memory than pickled, with the reverse true for dicts. – technomage Nov 07 '18 at 17:04
  • This is the best answer if you just need an estimate. json is slow as are the other options. pickle is about 7x faster than any other ways of getting the size of an object in my tests. – vangheem Nov 25 '19 at 18:30
  • Unfortunately, this answer is wrong. It calculates the serialized size of the object, which has nothing to do with its in-memory representation size. In most cases, the serialized size is significantly larger due to the robust encoding of the pickle mechanism. – Liran Funaro Jul 23 '20 at 07:38

Use this recipe, taken from here:

http://code.activestate.com/recipes/577504-compute-memory-footprint-of-an-object-and-its-cont/

from __future__ import print_function
from sys import getsizeof, stderr
from itertools import chain
from collections import deque
try:
    from reprlib import repr
except ImportError:
    pass

def total_size(o, handlers={}, verbose=False):
    """ Returns the approximate memory footprint an object and all of its contents.

    Automatically finds the contents of the following builtin containers and
    their subclasses:  tuple, list, deque, dict, set and frozenset.
    To search other containers, add handlers to iterate over their contents:

        handlers = {SomeContainerClass: iter,
                    OtherContainerClass: OtherContainerClass.get_elements}

    """
    dict_handler = lambda d: chain.from_iterable(d.items())
    all_handlers = {tuple: iter,
                    list: iter,
                    deque: iter,
                    dict: dict_handler,
                    set: iter,
                    frozenset: iter,
                   }
    all_handlers.update(handlers)     # user handlers take precedence
    seen = set()                      # track which object id's have already been seen
    default_size = getsizeof(0)       # estimate sizeof object without __sizeof__

    def sizeof(o):
        if id(o) in seen:       # do not double count the same object
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)

        if verbose:
            print(s, type(o), repr(o), file=stderr)

        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(o)


##### Example call #####

if __name__ == '__main__':
    d = dict(a=1, b=2, c=3, d=[4,5,6,7], e='a string of chars')
    print(total_size(d, verbose=True))
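
To cover a user-defined class (as asked in the comments on the accepted answer), pass a handler that yields the objects an instance refers to. A minimal sketch using a hypothetical Bucket class:

class Bucket:
    def __init__(self, items):
        self.items = list(items)

# The handler yields the attribute values of a Bucket, so total_size can
# recurse into the list (and its contents) that the instance refers to.
bucket = Bucket([1, 2, 3, 'a string of chars'])
print(total_size(bucket, handlers={Bucket: lambda b: iter(vars(b).values())}))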
Oren

If you want to measure the size of the body that you will send, e.g. via HTTP as JSON, you could convert it to a str first and then count its length. After all, you will send it as text. So this:

>>> import json
>>> import sys

>>> my_dict = {"var1": 12345, "var2": "abcde", "var3": 23.43232, "var4": True, "var5": None}
>>> a = json.dumps(my_dict)

>>> len(a)
78
>>> sys.getsizeof(my_dict)
232
>>> sys.getsizeof(a)
127

The total number of characters in the converted object is 78, so on machines where 1 character = 1 byte, 78 bytes would be a reasonable answer, and it seems more accurate than using sys.getsizeof.

babis21

I just learned from another question that the module pympler is much better suited for self-created objects than sys.getsizeof. Just use it as follows:

from pympler import asizeof
asizeof.asizeof(my_object)
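
For the objects from the question, usage would look like this (the exact numbers reported depend on the Python version and platform):

from pympler import asizeof

# Deep sizes, including nested containers and the objects they hold.
print(asizeof.asizeof([[1, 2, 3], [4, 5, 6]]))
print(asizeof.asizeof({1: {2: 2}}))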
Hagbard

I was looking for a method that would return the same size as the saved file of that exact multi-dimensional array (56x56x128). I finally used the following, and it gave me the same memory size used by the file:

import numpy as np
my_list = np.random.rand(56,56,128)
print(my_list.nbytes) #/1000 => KB, /1000000 => MB and /1000000000 => GB
np.save("my_list.npy",my_list) # my_list.npy size is: 3.2 MB
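
As the comment below notes, nbytes counts only the element data; a quick way to compare it with sys.getsizeof (a sketch; the exact overhead depends on the NumPy version):

import sys
import numpy as np

my_list = np.random.rand(56, 56, 128)

# 56 * 56 * 128 elements * 8 bytes per float64 = 3,211,264 bytes of element data.
print(my_list.nbytes)

# sys.getsizeof adds the array object's own overhead on top of the data buffer.
print(sys.getsizeof(my_list))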
Amine Sehaba
  • The [attribute `numpy.ndarray.nbytes`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.nbytes.html) returns the number of bytes consumed by the elements of an array, without the memory consumed by the array object's other attributes. For this reason, the value of `nbytes` is slightly smaller than the value returned by `sys.getsizeof`. – 0 _ Jun 24 '21 at 02:41