
As we (or at least I) learned in this answer, simple tuples that contain only immutable values are not tracked by Python's garbage collector once it figures out that they can never be involved in reference cycles:

>>> import gc
>>> x = (1, 2)
>>> gc.is_tracked(x)
True
>>> gc.collect()
0
>>> gc.is_tracked(x)
False

Why isn't this the case for namedtuples, the tuple subclasses with named fields from the collections module?

>>> import gc
>>> from collections import namedtuple
>>> foo = namedtuple('foo', ['x', 'y'])
>>> x = foo(1, 2)
>>> gc.is_tracked(x)
True
>>> gc.collect()
0
>>> gc.is_tracked(x)
True

Is there something inherent in their implementation that prevents this, or was it just overlooked?

moben
  • I believe the reason is simply that `namedtuple` is not a built-in class. I.e. the GC explicitly checks for `tuple` during a collection; it does **not** check for immutable types with only immutable elements (which is hard to check in the general case). Note that subclassing `tuple` does *not* imply immutability. – Bakuriu Nov 04 '13 at 15:42
  • The referenced post says the GC "special cases" certain objects. I imagine it's a direct check against the tuple type. As the post says, doing this special casing is a tradeoff. In each case, there's a cost associated with even checking for the potential exemption, but the GC designers try to make decisions that trade a small up-front payment for a long-term win. An introspective "is it a subclass of tuple" check would be a bit more expensive than just "is it a tuple". Since tuples are very frequent, the check pays off, but NamedTuples aren't likely to be as frequent. – Travis Griggs Nov 04 '13 at 15:44
  • @TravisGriggs The performance issue is not in checking "is a `tuple` subclass". The problem is that a `tuple` subclass *can* be mutable, so an "is a tuple subclass" check is completely meaningless for the GC. You would have to do a generic "is_immutable_object" check, which is *really* expensive (if possible at all), and then check that all contained objects are immutable. It's *much* easier to simply special-case some built-in types that give clear and simple guarantees about their immutability. – Bakuriu Nov 04 '13 at 15:52

2 Answers


The only comment about this that I could find is in the gcmodule.c file of the Python sources:

NOTE: about untracking of mutable objects. Certain types of container cannot participate in a reference cycle, and so do not need to be tracked by the garbage collector. Untracking these objects reduces the cost of garbage collections. However, determining which objects may be untracked is not free, and the costs must be weighed against the benefits for garbage collection.

There are two possible strategies for when to untrack a container:

  1. When the container is created.
  2. When the container is examined by the garbage collector.

Tuples containing only immutable objects (integers, strings etc, and recursively, tuples of immutable objects) do not need to be tracked. The interpreter creates a large number of tuples, many of which will not survive until garbage collection. It is therefore not worthwhile to untrack eligible tuples at creation time.

Instead, all tuples except the empty tuple are tracked when created. During garbage collection it is determined whether any surviving tuples can be untracked. A tuple can be untracked if all of its contents are already not tracked. Tuples are examined for untracking in all garbage collection cycles. It may take more than one cycle to untrack a tuple.

Dictionaries containing only immutable objects also do not need to be tracked. Dictionaries are untracked when created. If a tracked item is inserted into a dictionary (either as a key or value), the dictionary becomes tracked. During a full garbage collection (all generations), the collector will untrack any dictionaries whose contents are not tracked.

The module provides the python function is_tracked(obj), which returns the current tracking status of the object. Subsequent garbage collections may change the tracking status of the object. Untracking of certain containers was introduced in issue #4688, and the algorithm was refined in response to issue #14775.

(See the linked issues for the actual code that introduced untracking.)
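The tuple rule described in that comment can be observed directly from Python. The sketch below is only an illustration on CPython; the tuples are built at runtime so that they are fresh objects rather than compile-time constants. A full collection untracks the all-int tuple, while the tuple holding a list, which is itself a tracked container, has to stay tracked.

```python
import gc

# Build tuples at runtime (not as literals) so they are fresh objects,
# not constants stored in a code object.
only_ints = tuple(range(3))   # (0, 1, 2): none of the items is tracked
holds_list = tuple([1, []])   # contains a list, which is tracked

gc.collect()  # a full collection examines every surviving tuple

# The all-immutable tuple has been untracked; the other one cannot be.
print(gc.is_tracked(only_ints))   # False
print(gc.is_tracked(holds_list))  # True
```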

This comment is a bit ambiguous; however, it does not say that the untracking algorithm applies to generic containers, and in fact the code checks only exact tuples (and dicts), not their subclasses.

You can see this in the code of the file:

/* Try to untrack all currently tracked dictionaries */
static void
untrack_dicts(PyGC_Head *head)
{
    PyGC_Head *next, *gc = head->gc.gc_next;
    while (gc != head) {
        PyObject *op = FROM_GC(gc);
        next = gc->gc.gc_next;
        if (PyDict_CheckExact(op))
            _PyDict_MaybeUntrack(op);
        gc = next;
    }
}

Note the call to PyDict_CheckExact, and:

static void
move_unreachable(PyGC_Head *young, PyGC_Head *unreachable)
{
    PyGC_Head *gc = young->gc.gc_next;

    /* ... (code omitted) ... */
            if (PyTuple_CheckExact(op)) {
                _PyTuple_MaybeUntrack(op);
            }

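Note the call to PyTuple_CheckExact. Like PyDict_CheckExact, it is true only for objects whose type is exactly the built-in type; unlike PyTuple_Check and PyDict_Check, it rejects subclasses, so namedtuple instances are never even considered for untracking.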
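From Python, the difference between the two kinds of check roughly corresponds to `type(obj) is tuple` versus `isinstance(obj, tuple)`. The sketch below is an analogy, not the C code itself, and `Point` is just an example name. It shows that a namedtuple instance passes the subclass-friendly check but not the exact one, which is why a collection untracks a plain tuple of ints but leaves an equivalent namedtuple tracked:

```python
import gc
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)        # tuple subclass instance holding only ints
t = tuple([1, 2])      # exact tuple with the same contents

# Rough Python analogues of the C checks (an approximation):
print(isinstance(p, tuple))  # True:  would pass PyTuple_Check
print(type(p) is tuple)      # False: fails PyTuple_CheckExact

gc.collect()
print(gc.is_tracked(t))  # False: exact tuple, examined and untracked
print(gc.is_tracked(p))  # True: subclass instance, never examined
```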

Also note that a subclass of tuple need not be immutable. This means that if you wanted to extend this mechanism beyond tuple and dict, you'd need a generic is_immutable function. That would be really expensive, if possible at all, due to Python's dynamism (e.g. the methods of a class may change at runtime, which is not possible for tuple because it is a built-in type). Hence the devs chose to special-case only a few well-known built-ins.
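To make the mutability point concrete, here is a sketch with a hypothetical tuple subclass `Tagged`. Because it does not declare `__slots__ = ()`, each instance gets a `__dict__`, which can hold anything, including a reference back to the instance itself. Untracking such an object just because its items are immutable would hide that reference cycle from the collector:

```python
import gc

class Tagged(tuple):
    """Hypothetical example: a tuple subclass without __slots__,
    so every instance gets its own __dict__."""

t = Tagged([1, 2])  # the items themselves are immutable ints
t.me = t            # but the instance dict can point back at t: a cycle

print(t.__dict__["me"] is t)  # True
gc.collect()
print(gc.is_tracked(t))       # True: the GC must keep tracking it
```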


That said, I believe they could special-case namedtuples too, since they are pretty simple classes. There would be some issues: for example, each call to namedtuple creates a new class, so the GC would have to check for a subclass. And that might be a problem with code like:

from collections import namedtuple

class MyTuple(namedtuple('A', 'a b')):
    # whatever code you want
    pass

Because MyTuple need not be immutable, to be safe the GC would have to check that the object's class was created directly by namedtuple, not derived from such a class. However, I'm pretty sure there are workarounds for this situation.
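One possible shape for such a workaround, sketched in pure Python for illustration only: the helper `is_plain_namedtuple_untrackable` is hypothetical and CPython contains nothing like it. It assumes that a class produced directly by `namedtuple()` derives straight from `tuple` and carries `_fields` and `__slots__ = ()` in its own namespace (so its instances have no `__dict__`), and then applies the same "all items untracked" rule used for plain tuples:

```python
import gc
from collections import namedtuple

def is_plain_namedtuple_untrackable(obj):
    """Hypothetical check, not part of CPython: could this namedtuple
    instance be untracked the way an exact tuple is?"""
    cls = type(obj)
    return (
        # Must derive directly from tuple: rules out subclasses like MyTuple.
        cls.__bases__ == (tuple,)
        # namedtuple() puts _fields and __slots__ = () in the class namespace,
        # so instances have no __dict__ that could hold mutable state.
        and "_fields" in vars(cls)
        and vars(cls).get("__slots__") == ()
        # Same rule as for exact tuples: every item must be untracked.
        and not any(gc.is_tracked(item) for item in obj)
    )

class MyTuple(namedtuple("A", "a b")):
    # whatever code you want
    pass

Foo = namedtuple("Foo", ["x", "y"])
print(is_plain_namedtuple_untrackable(Foo(1, 2)))      # True
print(is_plain_namedtuple_untrackable(Foo(1, [])))     # False: holds a list
print(is_plain_namedtuple_untrackable(MyTuple(1, 2)))  # False: subclass
print(is_plain_namedtuple_untrackable((1, 2)))         # False: plain tuple
```

Even this check can be fooled: any hand-written class can define `_fields` and `__slots__ = ()`, which is exactly the generic-immutability problem mentioned above, and a C version would pay for these extra tests on every collection.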

They probably didn't because namedtuples are part of the standard library, not the Python core. Maybe the devs didn't want to make the core dependent on a standard-library module.

So, to answer your question:

  • No, there is nothing in their implementation that inherently prevents untracking namedtuples.
  • No, I don't believe they "simply overlooked" this. However, only the Python devs could give a definitive answer as to why they chose not to include them. My guess is that they didn't think the benefit would be big enough to justify the change, and they didn't want to make the core dependent on the standard library.
Bakuriu

@Bakuriu gave an excellent answer - accept it :-)

A gloss here: No untracking gimmick is "free": there are real costs, in both runtime and explosion of tricky code to maintain. The base tuple and dict types are very heavily used, both by user programs and by the CPython implementation, and it's very often possible to untrack them. So special-casing them is worth some pain, and benefits "almost all" programs. While it's certainly possible to find examples of programs that would benefit from untracking namedtuples (or ...) too, it wouldn't benefit the CPython implementation or most user programs. But it would impose costs on all programs (more conditionals in the gc code to ask "is this a namedtuple?", etc).

Note that all container objects benefit from CPython's "generational" cyclic gc gimmicks: the more collections a given container survives, the less often that container is scanned (because the container is moved to an "older generation", which is scanned less often). So there's little potential gain unless a container type occurs in great numbers (often true of tuples, rarely true of dicts) or a container contains a great many objects (often true of dicts, rarely true of tuples).
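The generational effect can be watched with the documented `gc.callbacks` hook. This sketch (the names `record` and `survivors` are only illustrative) counts which generation each automatic collection targets while many long-lived lists are allocated. Exact numbers depend on the CPython version and `gc.get_threshold()`, but on the traditional three-generation collector, collections of generation 0 should far outnumber those of the older generations:

```python
import gc
from collections import Counter

collections_per_generation = Counter()

def record(phase, info):
    # gc calls every callback with phase "start" and again with "stop";
    # count each collection once, keyed by the generation it targeted.
    if phase == "start":
        collections_per_generation[info["generation"]] += 1

gc.callbacks.append(record)
try:
    # Keep many containers alive so automatic collections run and the
    # surviving lists get promoted to older generations.
    survivors = [[i] for i in range(200_000)]
finally:
    gc.callbacks.remove(record)

print(dict(collections_per_generation))
```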

Tim Peters
  • I did accept @Bakuriu's answer, but thank you for the additional insights into the gc :) – moben Nov 04 '13 at 19:56