I am subclassing dict and would appreciate help understanding the behavior below (Python 3.11.3):

class Xdict(dict):
    def __init__(self, d):
        super().__init__(d)
        self._x = {k: f"x{v}" for k, v in d.items()}

    def __getitem__(self, key):
        print("in __getitem__")
        return self._x[key]

    def __str__(self):
        return str(self._x)

    def __iter__(self):
        print("in __iter__")

d = Xdict({"a": 1, "b": 2})
print(d)
print(dict(d))

Produces this output:

{'a': 'x1', 'b': 'x2'}
in __getitem__
in __getitem__
{'a': 'x1', 'b': 'x2'}

If I comment out the __iter__ method the output changes like so:

{'a': 'x1', 'b': 'x2'}
{'a': 1, 'b': 2}

Obviously the __iter__ method is not being called, yet its presence affects the behavior.

I am just interested in why this happens. I am not looking for alternative solutions to prevent it.

Thanks, Paul.

pauleohare
  • [Relevant thread](https://stackoverflow.com/questions/61657240/iterating-over-dictionary-using-getitem-in-python), [relevant thread](https://stackoverflow.com/questions/38736872/equivalent-code-of-getitem-in-iter) and finally another [relevant thread](https://stackoverflow.com/questions/20551042/whats-the-difference-between-iter-and-getitem), which, while a bit dated due to Python 2.7, is still relevant. – metatoaster Apr 26 '23 at 11:24
  • Great question. Seems like the presence of `__iter__` is what is used to test if the subclass is iterable, and that if it doesn't exist the parent class methods will be used, but when it is present the new dictionary isn't built using `__iter__` but rather by iteration over `d.keys()` using the overridden `__getitem__` to obtain the values. In fact I'm confident `d.keys` is used, because if you implement a `keys` method in `Xdict` doing something sufficiently stupid then `dict(d)` will give a `KeyError`. But I can't find where this is specified. – Matthew Towers Apr 26 '23 at 13:15
  • @metatoaster The first link is interesting but doesn't address the behavior, the second and third are not relevant. – pauleohare Apr 26 '23 at 14:33
  • @Matthew Towers That is interesting about keys. I didn't spot that. – pauleohare Apr 26 '23 at 14:36
  • They are _definitely_ relevant; at the very least the threads cover the relevant dunder methods. This also provides the reverse linkage for those other threads (the relevancy linkage is bi-directional: see the linked threads in the right sidebar). – metatoaster Apr 27 '23 at 00:40

1 Answer

Python's internals often directly invoke the C-level implementations of built-in class functionality, even in cases where a subclass may have overridden that functionality, leading to a lot of weird bugs where method overrides aren't invoked where you'd expect them to be.

This is the case for a lot of the dict implementation, but when dicts became order-preserving in Python 3.6, one of those bugs hit the standard library: when x is an OrderedDict, dict(x) would copy the underlying dict implementation's order, instead of the OrderedDict order (which is tracked separately).
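A minimal sketch of that ordering issue, assuming a CPython version that already contains the fix: `move_to_end` changes only the order tracked by the OrderedDict itself, so a copy that followed the underlying dict's order would put the keys back in insertion order. The variable names are illustrative.

```python
from collections import OrderedDict

od = OrderedDict(a=1, b=2, c=3)
od.move_to_end("a")  # reorders the OrderedDict's own bookkeeping only

od_order = list(od)          # order as the OrderedDict reports it
copy_order = list(dict(od))  # order of a plain dict built from it

print(od_order)
print(copy_order)  # should match od_order on a fixed interpreter
```

On an interpreter with the bug, the second line would be expected to show the original insertion order instead.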

To fix this bug, they added a check to dict_merge, in the code where it decides whether to use the fast path:

if (PyDict_Check(b) && (Py_TYPE(b)->tp_iter == (getiterfunc)dict_iter)) {

dict_merge is the underlying routine responsible for copying the contents of another mapping into a dict. Previously, this line just said if (PyDict_Check(b)) {, which would use the fast path if the other mapping was any dict instance. Now, it also checks that the instance doesn't have an overridden __iter__.
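The effect of that check can be observed from Python. This is a rough sketch, assuming CPython behaves as described above; `OnlyGetitem` and `WithIter` are hypothetical classes made up for the demonstration.

```python
class OnlyGetitem(dict):
    # __iter__ is inherited from dict, so the fast path should still apply
    def __getitem__(self, key):
        return "overridden"


class WithIter(OnlyGetitem):
    # A well-behaved __iter__ override; its mere presence changes tp_iter
    def __iter__(self):
        return super().__iter__()


fast = dict(OnlyGetitem(a=1))  # copies stored values, ignores __getitem__
slow = dict(WithIter(a=1))     # goes through the overridden __getitem__

print(fast)
print(slow)
```

The two classes store identical data; only the presence of an `__iter__` override decides which values end up in the copy.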

If the instance has an overridden __iter__, dict_merge will use the slow path, hence the difference you saw. However, the slow path doesn't actually use __iter__. It uses keys, which is why your code worked even though your __iter__ doesn't return an iterator.
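To see which methods the slow path touches, one option is to record the calls. The following is a sketch under the assumption that CPython's slow path works as described; `Traced` and its `calls` list are hypothetical names used only for this illustration.

```python
class Traced(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []  # log of overridden methods that get invoked

    def keys(self):
        self.calls.append("keys")
        return super().keys()

    def __getitem__(self, key):
        self.calls.append(f"getitem:{key}")
        return super().__getitem__(key)

    def __iter__(self):
        # Overriding __iter__ disables the fast path, even if never called
        self.calls.append("iter")
        return super().__iter__()


t = Traced(a=1, b=2)
plain = dict(t)

print(t.calls)
print(plain)
```

On CPython 3.11 the log should contain one `keys` entry followed by one `getitem` entry per key, and no `iter` entry.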

user2357112