Why do namedtuples use less memory than dictionaries?

Question

I'm asking this because I found it surprising -- I thought a namedtuple would have more overhead.

(The background is I was caching a large Django query in memory and found Django objects to be 100x the size of .values(). I then wondered what overhead namedtuple versions of the objects would be, allowing me to still use . access to the items as attributes. Smaller was not what I expected.)

#!/usr/bin/env python                                                           

from pympler.asizeof import asizeof                                             
from collections import namedtuple                                              

import random                                                                   
import string                                                                   

QTY = 100000                                                                    


class Foz(object):                                                              
    pass                                                                        

dicts = [{'foo': random.randint(0, 10000),                                      
          'bar': ''.join([random.choice(string.ascii_letters + string.digits) for n in xrange(32)]),
          'baz': random.randrange(10000),                                       
          'faz': random.choice([True, False]),                                  
          'foz': Foz()} for _ in range(QTY)]                                    

print "%d dicts: %d" % (len(dicts), asizeof(dicts))                             

# https://stackoverflow.com/questions/43921240/pythonic-way-to-convert-dictionary-to-namedtuple-or-another-hashable-dict-like

MyTuple = namedtuple('MyTuple', sorted(dicts[0]))                               

tuples = [MyTuple(**d) for d in dicts]                                          

print "%d namedtuples: %d" % (len(tuples), asizeof(tuples))                     

print "Ratio: %.01f" % (float(asizeof(tuples)) / float(asizeof(dicts)))

Running,

$ ./foo.py    
100000 dicts: 75107672
100000 namedtuples: 56707472
Ratio: 0.8

A single tuple is even less, perhaps due to the overhead of list:

$ ./foo.py    
1 dicts: 1072
1 namedtuples: 688
Ratio: 0.6

Is it the overhead of the hashtable array? But wouldn't a namedtuple also need a hashtable of the attributes? Is pympler not being accurate?

@JaredSmith I'd expect private code (even if counted by `asizeof`) to be averaged out over large numbers of items. — rrauenza, May 17 '19 at 20:53
[`__slots__`](https://docs.python.org/3/reference/datamodel.html#slots) — wim, May 17 '19 at 21:33
No, a `namedtuple` does not need a hashtable. Its is a tuple. It is *specifically for saving memory*. That is it's use-case, and to make working with tuples more readable / self-documenting. — juanpa.arrivillaga, May 17 '19 at 21:38

user2722968 · Answer 1 · 2019-05-17T21:41:08.993

9

The basic answer is simply "yes": A normal object has an internal dictionary to store the instance's attributes:

class Foo:
    pass

f = Foo()
print(f.__dict__)
# {}

It needs to be a dict because in Python you are allowed to assign new attributes on an instance that were not defined by the class:

f.a = 1
print(f.__dict__)
# {'a': 1}

Using a dict allows fast attribute lookups, but there is memory overhead due to the data structure itself. Also, because different instances of Foo may have different attributes defined on them, every single instance might need its own dict:

g = Foo()
print(g.__dict__)
# {}
print(f.__dict_ == g.__dict__)
# False

A namedtuple does not allow adding attributes at runtime. A specific instance of namedtuple can, therefore, store all of its attributes in a single instance being shared by all instances.

Given a namedtuple and an instance:

Foo = collections.namedtuple("Foo", 'a,b')
f = Foo(1,2)

The namedtuple-constructor generates a descriptor for each field and stores it in the class; here is where the translation between named attribute and tuple index is stored. When you access attribute a on instance f, the attribute access is routed through this descriptor:

type(Foo.a)
#<class 'property'>

edited May 17 '19 at 21:41

answered May 17 '19 at 20:56

user2722968

13,636
2
46
67

Where are the keys stored in a `namedtuple` for accessing attributes by name? i.e., `.foo` – rrauenza May 17 '19 at 21:05
3

In the class, not the instance – Mad Physicist May 17 '19 at 21:34
Updated the answer to make clear that the memory savings come from sharing the attribute dictionary in the class – user2722968 May 17 '19 at 21:41
@user2722968 I think showing some of this would improve your already great answer: https://stackoverflow.com/questions/17916853/how-named-tuples-are-implemented-internally-in-python – rrauenza May 17 '19 at 21:45
@user2722968 But yes! The name mapping is done in the class as properties! – rrauenza May 17 '19 at 21:45

score 4 · Answer 2 · answered May 17 '19 at 21:40

But wouldn't a namedtuple also need a hashtable of the attributes?

Nope. A namedtuple's instance layout is exactly the same as a regular tuple. The mapping from tuple entries to attributes is provided by generated descriptors, which are a mechanism Python provides for controlling attribute resolution. The descriptors are stored in the generated namedtuple type, so they're a per-type cost, not per-instance. Currently, the descriptors are property objects, as you can see in the current implementation, but that's subject to change (especially if any of this gets rewritten in C).

A namedtuple gets to be way more memory-efficient than a dict because as far as memory layout goes, it's just a tuple.

score 0 · Answer 3 · answered May 17 '19 at 21:03

It would stand to reason that a named tuple only needs its mapping once for all instances (name --> index). The hashing table is probably located in some centrally located meta-data (namespace), not in the object itself so it is not counted in the memory allocation of each instance.

Why do namedtuples use less memory than dictionaries?

3 Answers3

Linked