Data efficiency of class objects

Question

It appears to me that each instance of a particular class has its own dictionary. This could waste a lot of space when there is a large number of identically structured objects. Is this actually the case, or is the underlying mechanism more efficient, creating an object's dictionary only when it is explicitly asked for? I am considering an application that may have a very large number of objects, possibly millions. Should I avoid using a class and instead use a sequence with a named constant as the index?

Millions is not a very large number, but you should check out `__slots__`. For example here: http://stackoverflow.com/questions/472000/usage-of-slots — Paul Hankin, Feb 18 '17 at 14:05
@Paul Hankin That should have been an answer, then I could have upvoted it. It is precisely the answer I was looking for. — Chris Barry, Feb 18 '17 at 14:16
It has the additional benefit that only elements named in the __slots__ variable can be accessed, so typing errors are detected sooner. This really should be much more prominent in the Python documentation. — Chris Barry, Feb 18 '17 at 14:33

MSeifert · Accepted Answer · 2017-02-18T14:23:27.767

If you want to reduce the overhead you have two options depending on what you actually need.

If you need a class-like structure then you should consider using __slots__. This avoids the per-instance __dict__ but still allows you to have methods, properties and so on. You lose the ability to add attributes dynamically (you're restricted to the names listed in __slots__).

If you just want "storage" for objects and don't need methods and similar, you can use collections.namedtuple. It provides a "class-like" interface to its items, and because instances are immutable tuples without a per-instance __dict__, they are space-efficient.

For example, a class that has just two attributes, "firstname" and "lastname", could be implemented as:

class Person(object):
    __slots__ = ['firstname', 'lastname']

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    def __repr__(self):
        return '{self.__class__.__name__}({self.firstname!r}, {self.lastname!r})'.format(self=self)

>>> p = Person('Tom', 'Riddle')
>>> p
Person('Tom', 'Riddle')
>>> p.firstname
'Tom'

or as a namedtuple:

>>> from collections import namedtuple

>>> Person = namedtuple('Person', 'firstname, lastname')

>>> p = Person('Tom', 'Riddle')
>>> p
Person(firstname='Tom', lastname='Riddle')
>>> p.firstname
'Tom'
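
To see the difference for yourself, here is a minimal sketch that compares a regular class, a `__slots__` class and a namedtuple. The names `PlainPerson`, `SlottedPerson` and `TuplePerson` are made up for this illustration, and the byte counts depend on the Python version and platform, so treat the printed numbers as rough indications only.

```python
import sys
from collections import namedtuple


class PlainPerson:
    """Regular class: each instance can carry its own __dict__."""

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


class SlottedPerson:
    """Same attributes, stored in fixed slots instead of a __dict__."""

    __slots__ = ('firstname', 'lastname')

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


# Hypothetical name, chosen so it does not clash with the classes above.
TuplePerson = namedtuple('TuplePerson', 'firstname, lastname')

plain = PlainPerson('Tom', 'Riddle')
slotted = SlottedPerson('Tom', 'Riddle')
as_tuple = TuplePerson('Tom', 'Riddle')

# sys.getsizeof is shallow, so the plain instance's __dict__ is added by hand.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)
tuple_size = sys.getsizeof(as_tuple)

print('plain class (object + __dict__):', plain_size, 'bytes')
print('__slots__ class:', slotted_size, 'bytes')
print('namedtuple:', tuple_size, 'bytes')

# Neither the slotted instance nor the namedtuple has a per-instance __dict__.
print(hasattr(slotted, '__dict__'), hasattr(as_tuple, '__dict__'))

# A misspelled attribute name is rejected instead of silently creating
# a new attribute.
try:
    slotted.fistname = 'Harry'  # typo on purpose
    typo_rejected = False
except AttributeError:
    typo_rejected = True
print('typo rejected:', typo_rejected)
```

The sizes cover only the container itself, not the strings it refers to, which are the same in all three cases.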

Being prevented from dynamically adding members is, in my opinion, more often a benefit. I didn't know about namedtuple, and would otherwise have considered it to be a good solution, but __slots__ seems superior. — Chris Barry, Feb 18 '17 at 14:40
@ChrisBarry Both have their use-cases. I agree that `__slots__` is superior (they allow methods and real properties), but in some cases you just want a "class-like" immutable storage container and then `namedtuple` is a viable alternative. — MSeifert, Feb 18 '17 at 14:52

score -1 · Answer 2 · answered Feb 18 '17 at 13:59

That depends on the data you want to store in each object, but in most cases lists should do.
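
A minimal sketch of what the list-based approach from the question could look like, assuming one list per record and module-level constants as indexes. The names `FIRSTNAME`, `LASTNAME` and `make_person` are hypothetical and only serve the illustration.

```python
# Hypothetical index constants: one per "field" of the record.
FIRSTNAME = 0
LASTNAME = 1


def make_person(firstname, lastname):
    """Build a record as a plain list, ordered to match the constants above."""
    return [firstname, lastname]


p = make_person('Tom', 'Riddle')
print(p[FIRSTNAME])

# Lists are mutable, so a field can be replaced in place.
p[LASTNAME] = 'Gaunt'
print(p)
```

With this layout nothing ties the constants to the record, so using the wrong index is not detected, and methods or properties are not available.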

Daniel
