
I am trying to accurately and definitively find the size difference between two classes in Python. Both are new-style classes, identical except that one defines __slots__ and the other does not. I have tried numerous tests to determine their size difference, but they always report identical memory usage.

So far I have tried sys.getsizeof(obj) and heapy's heap() function, with no positive results. Test code is below:

import sys
from guppy import hpy

class test3(object):
    def __init__(self):
        self.one = 1
        self.two = "two variable"

class test4(object):
    __slots__ = ('one', 'two')
    def __init__(self):
        self.one = 1
        self.two = "two variable"

test3_obj = test3()
print "Sizeof test3_obj", sys.getsizeof(test3_obj)

test4_obj = test4()
print "Sizeof test4_obj", sys.getsizeof(test4_obj)

arr_test3 = []
arr_test4 = []

for i in range(3000):
    arr_test3.append(test3())
    arr_test4.append(test4())

h = hpy()
print h.heap()

Output:

Sizeof test3_obj 32
Sizeof test4_obj 32

Partition of a set of 34717 objects. Total size = 2589028 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  11896  34   765040  30    765040  30 str
     1   3001   9   420140  16   1185180  46 dict of __main__.test3
     2   5573  16   225240   9   1410420  54 tuple
     3    348   1   167376   6   1577796  61 dict (no owner)
     4   1567   5   106556   4   1684352  65 types.CodeType
     5     68   0   105136   4   1789488  69 dict of module
     6    183   1    97428   4   1886916  73 dict of type
     7   3001   9    96032   4   1982948  77 __main__.test3
     8   3001   9    96032   4   2078980  80 __main__.test4
     9    203   1    90360   3   2169340  84 type
<99 more rows. Type e.g. '_.more' to view.>

This is all with Python 2.6.0. I also attempted to override the class's __sizeof__ method to try to determine the size by summing the individual attributes' sizes, but that didn't yield any different results:

class test4(object):
    __slots__ = ('one', 'two')
    def __init__(self):
        self.one = 1
        self.two = "two variable"
    def __sizeof__(self):
        return super(test4, self).__sizeof__() + self.one.__sizeof__() + self.two.__sizeof__()

Results with the sizeof method overridden:

Sizeof test3_obj 80
Sizeof test4_obj 80
Zoran Pavlovic

6 Answers


sys.getsizeof returns a number that is more specialized and less useful than people think. In fact, if you increase the number of attributes to six, your test3_obj remains at 32 bytes, but test4_obj jumps to 48. This is because getsizeof returns the size of the C structure implementing the object: for test3_obj that doesn't include the dict holding the attributes, but for test4_obj the attributes aren't stored in a dict at all, they are stored in slots, so they are accounted for in the size.

But a class defined with __slots__ takes less memory than a class without, precisely because there is no dict to hold the attributes.
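A quick way to see that difference (a sketch in Python 3 syntax; the class names are hypothetical stand-ins for test3/test4, and exact byte counts vary by Python version and platform) is to add the instance's `__dict__` to the measurement for the non-slotted class:

```python
import sys

class WithDict(object):            # hypothetical stand-in for test3
    def __init__(self):
        self.one = 1
        self.two = "two variable"

class WithSlots(object):           # hypothetical stand-in for test4
    __slots__ = ('one', 'two')
    def __init__(self):
        self.one = 1
        self.two = "two variable"

d, s = WithDict(), WithSlots()

# The bare instance sizes look similar...
print(sys.getsizeof(d), sys.getsizeof(s))

# ...but the non-slotted instance also owns a __dict__, which the
# slotted instance does not have at all.
dict_total = sys.getsizeof(d) + sys.getsizeof(d.__dict__)
print(dict_total, "vs", sys.getsizeof(s))
```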

Why override __sizeof__? What are you really trying to accomplish?

Ned Batchelder
  • The sizeof override was to see if maybe the builtin sizeof method was not correctly measuring the variables' size. – Zoran Pavlovic Jul 02 '12 at 21:28
  • So what would you suggest is the best way to determine the size differences between such simple-ish objects? – Zoran Pavlovic Jul 02 '12 at 21:31
  • That depends why you want to know the size. What problem are you trying to solve? – Ned Batchelder Jul 02 '12 at 21:35
  • I want to know the size so that I can definitely make a decision on which data structure to pick. And if so, what is the quantitative size difference so that a call can be made as to whether or not the switch justifies the memory benefit. – Zoran Pavlovic Jul 04 '12 at 10:42
  • You should write your program, with an abstraction to hide the choice, then measure the actual memory footprint of your program each way. That's the only way to know the answer to your real question. `__slots__` is designed to reduce the memory footprint of objects, especially where you have many small objects. – Ned Batchelder Jul 04 '12 at 12:43

As others have stated, sys.getsizeof only returns the size of the structure representing your object. So if, for instance, you keep appending elements to a list, sys.getsizeof(my_list) will only ever report the size of the list object itself (its header and pointer array), not the memory its elements occupy.
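For example (a sketch; the exact byte counts vary by interpreter, so only the relationship matters), a list of large strings reports a container size that is tiny compared with what its elements actually take up:

```python
import sys

# Three ~10 KB strings held in one list.
big_strings = ["x" * 10_000 for _ in range(3)]

container_size = sys.getsizeof(big_strings)                 # outer list object only
elements_size = sum(sys.getsizeof(s) for s in big_strings)  # payload of the elements

print(container_size)   # on the order of tens of bytes
print(elements_size)    # on the order of 30 KB
```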

pympler.asizeof.asizeof() gives an approximate complete size of objects and may be more accurate for you.

from pympler import asizeof
asizeof.asizeof(my_object)  # should give you the full object size
Engineero

First, check the size of the Python process in your OS's memory manager without many objects.

Second, make many objects of one kind and check the size again.

Third, make many objects of the other kind and check the size.

Repeat this a few times; if the sizes at each step stay about the same, you have something comparable.
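The same before/after idea can also be run in-process with the stdlib tracemalloc module (Python 3.4+) instead of watching the OS memory manager. This is only a sketch with hypothetical test classes, and tracemalloc measures Python-level allocations rather than the true process footprint, so treat the absolute numbers as approximate:

```python
import tracemalloc

class WithDict(object):            # hypothetical dict-based test class
    def __init__(self):
        self.one = 1
        self.two = "two variable"

class WithSlots(object):           # hypothetical slotted test class
    __slots__ = ('one', 'two')
    def __init__(self):
        self.one = 1
        self.two = "two variable"

def measure(cls, n=10_000):
    """Return bytes traced while building n instances of cls."""
    tracemalloc.start()
    objs = [cls() for _ in range(n)]
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objs
    return current

dict_bytes = measure(WithDict)
slot_bytes = measure(WithSlots)
print(dict_bytes, "vs", slot_bytes)
```

Running both measurements several times and averaging, as suggested above, smooths out allocator noise.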

Marco de Wit
  • I'm curious as to what sort of accuracy this would get me? Also... I'd need an efficient way to run this multiple times, and then to average it all out. – Zoran Pavlovic Jul 04 '12 at 10:41

The following function has been tested in Python 3.6 on a 64-bit system. It has been very useful to me. (I picked it up off the internet and tweaked it to my style, adding support for the __slots__ feature; I am unable to find the original source again.)

import sys
from typing import Optional, Set

def getSize(obj, seen: Optional[Set[int]] = None) -> int:
  """Recursively finds size of objects."""
  seen = set() if seen is None else seen

  if id(obj) in seen: return 0  # to handle self-referential objects
  seen.add(id(obj))

  size = sys.getsizeof(obj, 0) # pypy3 always returns default (necessary)
  if isinstance(obj, dict):
    size += sum(getSize(v, seen) + getSize(k, seen) for k, v in obj.items())
  elif hasattr(obj, '__dict__'):
    size += getSize(obj.__dict__, seen)
  elif hasattr(obj, '__slots__'): # in case slots are in use
    slotList = [getattr(C, "__slots__", []) for C in obj.__class__.__mro__]
    slotList = [[slot] if isinstance(slot, str) else slot for slot in slotList]
    size += sum(getSize(getattr(obj, a, None), seen) for slot in slotList for a in slot)
  elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
    size += sum(getSize(i, seen) for i in obj)
  return size

Now for the objects of the following classes,

class test3(object):
    def __init__(self):
        self.one = 1
        self.two = "two variable"

class test4(object):
    __slots__ = ('one', 'two')
    def __init__(self):
        self.one = 1
        self.two = "two variable"

the following results are obtained,

In [21]: t3 = test3()

In [22]: getSize(t3)
Out[22]: 361

In [23]: t4 = test4()

In [25]: getSize(t4)
Out[25]: 145

Feedback to improve the function is most welcome.

codeman48

You might want to use a different implementation for getting the size of your objects in memory:

>>> import sys, array
>>> sizeof = lambda obj: sum(map(sys.getsizeof, explore(obj, set())))
>>> def explore(obj, memo):
    loc = id(obj)
    if loc not in memo:
        memo.add(loc)
        yield obj
        if isinstance(obj, memoryview):
            yield from explore(obj.obj, memo)
        elif not isinstance(obj, (range, str, bytes, bytearray, array.array)):
            # Handle instances with slots.
            try:
                slots = obj.__slots__
            except AttributeError:
                pass
            else:
                for name in slots:
                    try:
                        attr = getattr(obj, name)
                    except AttributeError:
                        pass
                    else:
                        yield from explore(attr, memo)
            # Handle instances with dict.
            try:
                attrs = obj.__dict__
            except AttributeError:
                pass
            else:
                yield from explore(attrs, memo)
            # Handle dicts or iterables.
            for name in 'keys', 'values', '__iter__':
                try:
                    attr = getattr(obj, name)
                except AttributeError:
                    pass
                else:
                    for item in attr():
                        yield from explore(item, memo)


>>> class Test1:
    def __init__(self):
        self.one = 1
        self.two = 'two variable'


>>> class Test2:
    __slots__ = 'one', 'two'
    def __init__(self):
        self.one = 1
        self.two = 'two variable'


>>> print('sizeof(Test1()) ==', sizeof(Test1()))
sizeof(Test1()) == 361
>>> print('sizeof(Test2()) ==', sizeof(Test2()))
sizeof(Test2()) == 145
>>> array_test1, array_test2 = [], []
>>> for _ in range(3000):
    array_test1.append(Test1())
    array_test2.append(Test2())


>>> print('sizeof(array_test1) ==', sizeof(array_test1))
sizeof(array_test1) == 530929
>>> print('sizeof(array_test2) ==', sizeof(array_test2))
sizeof(array_test2) == 194825
>>> 

Just make sure that you do not give any infinite iterators to this code if you want an answer back.

Noctis Skytower
  • "Yield from" Isn't that python3-specific syntax? – Zoran Pavlovic Sep 03 '13 at 15:19
  • Yes, for when the rest of the code might be run through `2to3.py`. Porting to where `yield from` is not available should be fairly easy. – Noctis Skytower Sep 03 '13 at 20:09
  • str should not be iterated over to check their one-char substrings sizes, I proposed an edit that takes this into account. – Adirio Jan 19 '18 at 10:02
  • @Adirio There are several built-ins that really need to be considered if you want to fix this code properly. Two more examples come to mind: `bytes` and `bytearray`. Speaking of arrays, `array.array` should not be explored either. How many more exceptions can you think of? – Noctis Skytower Jan 19 '18 at 13:00
  • You are right, there are multiple exceptions, but the `str` one is relevant as you have them as keys in `Test1` and as one of the two vars. Basically you are adding 26 extra bytes for each char in these strings. Add a `print` statement before every `yield` and execute `sizeof('Hello')` and you will get `Hello` `H` `e` `l` `o`. That makes your results way off. – Adirio Jan 19 '18 at 14:00
  • @Adirio Thanks for your help! There are several types that are not explored as before. – Noctis Skytower Jan 19 '18 at 15:02

I ran into a similar problem and ended up writing my own helper to do the dirty work. Check it out here

wissam