I'm asking this because I found it surprising -- I thought a namedtuple would have more overhead.
(The background: I was caching a large Django query in memory and found the Django model objects to be roughly 100x the size of the .values() dicts. I then wondered what overhead namedtuple versions of the objects would add, while still letting me access the items as attributes with . syntax. Smaller was not what I expected.)
#!/usr/bin/env python
from collections import namedtuple
import random
import string

from pympler.asizeof import asizeof

QTY = 100000

class Foz(object):
    pass

dicts = [{'foo': random.randint(0, 10000),
          'bar': ''.join(random.choice(string.ascii_letters + string.digits) for _ in xrange(32)),
          'baz': random.randrange(10000),
          'faz': random.choice([True, False]),
          'foz': Foz()} for _ in xrange(QTY)]
print "%d dicts: %d" % (len(dicts), asizeof(dicts))

# https://stackoverflow.com/questions/43921240/pythonic-way-to-convert-dictionary-to-namedtuple-or-another-hashable-dict-like
MyTuple = namedtuple('MyTuple', sorted(dicts[0]))
tuples = [MyTuple(**d) for d in dicts]
print "%d namedtuples: %d" % (len(tuples), asizeof(tuples))
print "Ratio: %.01f" % (float(asizeof(tuples)) / float(asizeof(dicts)))
Running it:
$ ./foo.py
100000 dicts: 75107672
100000 namedtuples: 56707472
Ratio: 0.8
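Out of curiosity I also compared the shallow, per-container sizes with plain sys.getsizeof, which (unlike pympler's asizeof) does not follow references, so the shared value objects drop out of the comparison. The dict literal here is just an illustrative stand-in with five fields like the ones above, not the actual cached records:

```python
import sys

# Shallow sizes only: sys.getsizeof does not follow references, so the
# five values themselves are not counted -- only the container overhead.
# Exact numbers vary by CPython version and platform.
d = {'foo': 1234, 'bar': 'x' * 32, 'baz': 5678, 'faz': True, 'foz': object()}
t = tuple(d[k] for k in sorted(d))  # same five values, tuple layout

print("dict:  %d bytes" % sys.getsizeof(d))
print("tuple: %d bytes" % sys.getsizeof(t))
```

On every CPython build I tried, the dict's shallow size is larger than the tuple's, which matches the direction of the asizeof numbers above.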
A single namedtuple is proportionally even smaller, perhaps because of the fixed overhead of the containing list:
$ ./foo.py
1 dicts: 1072
1 namedtuples: 688
Ratio: 0.6
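One more data point I collected (not an answer, just what I observed on CPython; Point is a throwaway type, not part of the script above): a namedtuple instance appears to be a plain tuple, with the field names stored once on the class rather than on each instance:

```python
import sys
from collections import namedtuple

# Illustrative throwaway type.
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

print(isinstance(p, tuple))    # True: instances are plain tuples
print(hasattr(p, '__dict__'))  # False: no per-instance attribute dict
print(Point._fields)           # field names live on the class

# Shallow, version-dependent sizes; the dict carries its own hash table,
# the tuple does not.
print(sys.getsizeof(p))
print(sys.getsizeof({'x': 1, 'y': 2}))
```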
Is the difference the overhead of the dict's hash-table array? But wouldn't a namedtuple also need a hashtable of its attributes somewhere? Or is pympler simply not being accurate?