
After watching Nina Zakharenko's Python Memory Management talk at PyCon 2016 (link), it seemed like the __slots__ class attribute was a tool to reduce object size and speed up attribute lookup.

My expectation was that a normal class would be the largest, while a __slots__/namedtuple approach would save space. However, a quick experiment with sys.getsizeof() seems to suggest otherwise:

from collections import namedtuple
from sys import getsizeof

class Rectangle:
    '''A class based Rectangle, with a full __dict__'''
    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height

class SlotsRectangle:
    '''A class based Rectangle with __slots__ defined for attributes'''
    __slots__ = ('x', 'y', 'width', 'height')

    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height

NamedTupleRectangle = namedtuple('Rectangle', ('x', 'y', 'width', 'height'))
NamedTupleRectangle.__doc__ = 'A rectangle as an immutable namedtuple'

print(f'Class: {getsizeof(Rectangle(1,2,3,4))}')
print(f'Slots: {getsizeof(SlotsRectangle(1,2,3,4))}')
print(f'Named Tuple: {getsizeof(NamedTupleRectangle(1,2,3,4))}')

Terminal Output:

$ python3.7 example.py
Class: 56
Slots: 72
Named Tuple: 80

What is going on here? From the docs on Python's data model, it appears that __slots__ is implemented with descriptors, which would add some function-call overhead to classes using it. However, why are the results so heavily skewed towards the normal class?
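
For what it's worth, the descriptors themselves are easy to see (a REPL sketch; the exact reprs may differ slightly across versions):

>>> SlotsRectangle.x
<member 'x' of 'SlotsRectangle' objects>
>>> type(SlotsRectangle.x)
<class 'member_descriptor'>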

Channeling my inner Raymond H.: there has to be a harder way!

– Matt
    `getsizeof` isn't reporting the size of data structures referenced by the class object; only the class object itself. – chepner Mar 18 '19 at 12:33

3 Answers


The function sys.getsizeof() is probably not doing what you think it does: it reports only the memory directly attributed to an object, not the objects it references, so it is misleading for complex objects such as instances of custom classes.
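
To see that shallowness directly with the classes from your question (a sketch; exact byte counts vary by Python version): the plain instance keeps its attributes in a separate __dict__ that sys.getsizeof() does not follow, while the slots instance stores its attribute references inline.

import sys

rect = Rectangle(1, 2, 3, 4)
print(sys.getsizeof(rect))              # instance only; its __dict__ is not included
print(sys.getsizeof(rect.__dict__))     # the attribute storage lives here instead

slots_rect = SlotsRectangle(1, 2, 3, 4)
print(sys.getsizeof(slots_rect))        # slot storage is part of the instance itself
print(hasattr(slots_rect, '__dict__'))  # False: no per-instance dict at all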

Look at this answer for a method to calculate the memory size of objects; maybe it helps you. I copied the code from that answer here, but the full explanation is in the answer I linked.

import sys
from numbers import Number
from collections import deque

try: # Python 3.3+: the abstract base classes live in collections.abc
    from collections.abc import Set, Mapping
except ImportError: # Python 2 / very old Python 3
    from collections import Set, Mapping

try: # Python 2
    zero_depth_bases = (basestring, Number, xrange, bytearray)
    iteritems = 'iteritems'
except NameError: # Python 3
    zero_depth_bases = (str, bytes, Number, range, bytearray)
    iteritems = 'items'

def getsize(obj_0):
    """Recursively iterate to sum size of object & members."""
    _seen_ids = set()
    def inner(obj):
        obj_id = id(obj)
        if obj_id in _seen_ids:
            return 0
        _seen_ids.add(obj_id)
        size = sys.getsizeof(obj)
        if isinstance(obj, zero_depth_bases):
            pass # bypass remaining control flow and return
        elif isinstance(obj, (tuple, list, Set, deque)):
            size += sum(inner(i) for i in obj)
        elif isinstance(obj, Mapping) or hasattr(obj, iteritems):
            size += sum(inner(k) + inner(v) for k, v in getattr(obj, iteritems)())
        # Check for custom object instances - may subclass above too
        if hasattr(obj, '__dict__'):
            size += inner(vars(obj))
        if hasattr(obj, '__slots__'): # can have __slots__ with __dict__
            size += sum(inner(getattr(obj, s)) for s in obj.__slots__ if hasattr(obj, s))
        return size
    return inner(obj_0)
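
Applied to the classes from the question, usage looks like this (a sketch; the totals vary by Python version and platform):

rect = Rectangle(1, 2, 3, 4)
slots_rect = SlotsRectangle(1, 2, 3, 4)

# getsize() also descends into the instance __dict__ and the slot values,
# so the plain class now shows its full footprint instead of just 56 bytes.
print(getsize(rect))        # instance + __dict__ + attribute values
print(getsize(slots_rect))  # instance + attribute values only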
– Ralf
  • Thanks for the answer @Ralf, I believe `sys.getsizeof()` is fulfilling my use case though. My rectangle object contains four attributes which can refer to objects elsewhere on Python's managed heap. The memory footprint of `x` is irrelevant to me; it could be a `list` with thousands of elements. It was my understanding that the attribute `x` is only a reference. What I expected to see was reduced memory overhead when using __slots__ compared to a class without. – Matt Mar 18 '19 at 15:49

There is a more compact variant using the recordclass library:

import sys
from recordclass import dataobject

class Rectangle(dataobject):
    x: int
    y: int
    width: int
    height: int

>>> r = Rectangle(1, 2, 3, 4)
>>> print(sys.getsizeof(r))
48

It has a smaller memory footprint than the __slots__-based one because it doesn't participate in cyclic garbage collection: the Py_TPFLAGS_HAVE_GC flag is not set, so no PyGC_Head is needed at all (24 bytes before Python 3.8, 16 bytes from 3.8 onward).
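
You can check the garbage-collection claim directly with gc.is_tracked() (a sketch, assuming the recordclass-based Rectangle above and the SlotsRectangle from the question):

import gc

r = Rectangle(1, 2, 3, 4)       # recordclass dataobject
print(gc.is_tracked(r))         # False: no PyGC_Head, the cyclic GC ignores it

s = SlotsRectangle(1, 2, 3, 4)  # ordinary __slots__ class
print(gc.is_tracked(s))         # True: regular class instances are GC-tracked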

– intellimath

"Channeling my inner Raymond H," +1

So the thing about slots is, you have to read about slots.

The other thing is, they do affect class size:

print(f'(Class) Class: {getsizeof(Rectangle)}') # 1056
print(f'(Class) Slots: {getsizeof(SlotsRectangle)}') # 888

Cool. Now let's say we add a field to a Rectangle instance:

rect = Rectangle(1,2,3,4)
rect.extra_field = dict() # wild right?
print(f'(Object) Class: {getsizeof(rect)}') # still 56

So you can "count" the resources "your using" (in the form of instance variables) and the slots rectangle would be 112 and the non-slots rectangle would be 112 as well...

However, we know this to not be the case as we would expect the regular rectangle to be at least 352 because we added a dict to it.

Slots prevent you from doing this at all and thus provide a way of constraining resource usage.
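
For example, the same extra_field trick fails on the slots rectangle (a sketch using the SlotsRectangle from the question):

slots_rect = SlotsRectangle(1, 2, 3, 4)
try:
    slots_rect.extra_field = dict()  # no __dict__ to put it in
except AttributeError as e:
    print(e)  # 'SlotsRectangle' object has no attribute 'extra_field'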

Check out this answer here; it seems like it might work fairly well for your use case. Running it on the slots rectangle and the regular rectangle yields 152 and 352 respectively.

Also, if you're really into trying to optimize your code and minimize resource use, come on over to the Rust/C/C++ side of the house.

– eric