It appears to me that each instance of a particular class has its own dictionary. This could waste a lot of space when there is a large number of identically structured instances. Is this actually the case, or is the underlying mechanism more efficient, creating an object's dictionary only when it is explicitly asked for? I am considering an application where I may have a very large number of objects, possibly millions. Should I avoid using a class and instead use a sequence with a named constant as the index?
- Millions is not a very large number, but you should check out `__slots__`. For example here: http://stackoverflow.com/questions/472000/usage-of-slots – Paul Hankin Feb 18 '17 at 14:05
- @Paul Hankin That should have been an answer, then I could have upvoted it. It is precisely the answer I was looking for. – Chris Barry Feb 18 '17 at 14:16
- It has the additional benefit that only elements named in the `__slots__` variable can be accessed, so typing errors are detected sooner. This really should be much more prominent in the Python documentation. – Chris Barry Feb 18 '17 at 14:33
2 Answers
If you want to reduce the overhead you have two options depending on what you actually need.
If you need a class-like structure then you should consider using __slots__. This will avoid the __dict__ but still allows you to have methods, properties and so on. You'll lose the ability to dynamically add attributes (you're restricted to those listed as __slots__).
If you just want a "storage" for objects and don't need methods and similar you can use collections.namedtuple. These provide a "class-like" interface to their items and should be pretty space-efficient.
For example a class that just has two attributes "lastname" and "firstname" could be implemented as:
class Person(object):
    __slots__ = ['firstname', 'lastname']

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    def __repr__(self):
        return '{self.__class__.__name__}({self.firstname!r}, {self.lastname!r})'.format(self=self)

>>> p = Person('Tom', 'Riddle')
>>> p
Person('Tom', 'Riddle')
>>> p.firstname
'Tom'
or as namedtuple:
>>> from collections import namedtuple
>>> Person = namedtuple('Person', 'firstname, lastname')
>>> p = Person('Tom', 'Riddle')
>>> p
Person(firstname='Tom', lastname='Riddle')
>>> p.firstname
'Tom'
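
To get a rough feel for the difference, the following sketch compares the three variants side by side. It assumes CPython (where `sys.getsizeof` is available) and only measures shallow sizes, so treat the printed numbers as indicative rather than exact; they vary between Python versions. The class names `PlainPerson`, `SlottedPerson` and `TuplePerson` are made up for this illustration.

```python
import sys
from collections import namedtuple


class PlainPerson(object):
    """Ordinary class: every instance carries its own __dict__."""

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


class SlottedPerson(object):
    """Same attributes, declared in __slots__, so no per-instance __dict__."""
    __slots__ = ['firstname', 'lastname']

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


TuplePerson = namedtuple('TuplePerson', 'firstname, lastname')

plain = PlainPerson('Tom', 'Riddle')
slotted = SlottedPerson('Tom', 'Riddle')
tupled = TuplePerson('Tom', 'Riddle')

# Shallow sizes only; the plain instance's __dict__ is counted separately.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)
tuple_size = sys.getsizeof(tupled)
print(plain_size, slotted_size, tuple_size)

# A misspelled attribute name is rejected on the slotted class...
try:
    slotted.firstnmae = 'Thomas'
    typo_rejected = False
except AttributeError:
    typo_rejected = True

# ...while a namedtuple rejects any assignment, because it is immutable.
try:
    tupled.firstname = 'Thomas'
    tuple_immutable = False
except AttributeError:
    tuple_immutable = True

print(typo_rejected, tuple_immutable)
```

For a more realistic figure with millions of objects, measuring the whole process (for example with `tracemalloc`) is probably more reliable than adding up `sys.getsizeof` results.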

- Being prevented from dynamically adding members is, in my opinion, more often a benefit. I didn't know about namedtuple, and would otherwise have considered it to be a good solution, but __slots__ seems superior. – Chris Barry Feb 18 '17 at 14:40
- @ChrisBarry Both have their use-cases. I agree that `__slots__` is superior (it allows methods and real properties), but in some cases you just want a "class-like" immutable storage container and then `namedtuple` is a viable alternative. – MSeifert Feb 18 '17 at 14:52
That depends on the data you want to store in each object, but in most cases lists should do.
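
As a minimal sketch of what the question describes (a sequence indexed by named constants), something like the following could work. The constants `FIRSTNAME` and `LASTNAME` and the helper `make_person` are hypothetical names chosen for this example.

```python
# Index constants that stand in for attribute names.
FIRSTNAME = 0
LASTNAME = 1


def make_person(firstname, lastname):
    """Build one record as a plain list, ordered to match the constants."""
    return [firstname, lastname]


p = make_person('Tom', 'Riddle')
first = p[FIRSTNAME]      # read a field through its named index
p[LASTNAME] = 'Marvolo'   # lists are mutable, so fields can be updated
print(first, p)
```

Note that nothing ties the constants to the list layout, so a wrong index silently returns the wrong field; that is the trade-off compared with `__slots__` or `namedtuple`.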
