13

Deleting an item from a dict while iterating over it will usually result in a `RuntimeError: dictionary changed size during iteration` exception:

d = {1: 2}
# exception raised
for k in d:
  del d[k]

To be more precise, the deletion itself will succeed. However, to enter the next round of iteration, the interpreter has to call next(it), where `it` is the iterator over the dictionary that it obtained earlier. At that point, next() will notice that the dictionary size has changed, and complain.
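
To see exactly where the check fires, you can drive the iterator by hand; this is just the sequence above made explicit:

d = {1: 2}
it = iter(d)  # the iterator records the dict's current size here
k = next(it)  # yields 1
del d[k]      # the deletion itself succeeds
next(it)      # RuntimeError: dictionary changed size during iteration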

So far so good. But what if we both delete and add an item to a dictionary:

d = {1: 1}
# no exception raised
for k in d:
  # order of next two lines doesn't matter
  d[k*10] = k*10
  del d[k]

I am almost sure that this is not safe (the docs imply neither insert nor delete is allowed during iteration). Why is the interpreter allowing this code to run without error?
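
For what it's worth, the absence of an exception doesn't make the loop well-defined. On the CPython build I tried, the loop kept visiting the newly inserted keys until an internal resize ended the iteration (a capped version, to be safe):

d = {1: 1}
seen = []
for k in d:
    seen.append(k)
    if len(seen) > 20:  # safety cap in case iteration never terminates
        break
    d[k * 10] = k * 10
    del d[k]
print(seen)  # e.g. [1, 10, 100, 1000, 10000]; the exact result isn't guaranteed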

My only guess is that it's too expensive to check which iterators are invalidated whenever an insert or delete method is called. So dict doesn't attempt to be perfect about raising this exception. Instead, it just keeps track of the size of the dictionary inside each iterator, and checks that it hasn't changed whenever the iterator is actually asked to move to the next item. Is there no approach that would enable full validation at a low cost?

Dimitris Fasarakis Hilliard
max
  • Are you looking for something to make your loop more robust or do you want to discuss the Python implementation details? – Klaus D. Dec 04 '16 at 05:51
  • Looks like you want dictionary keys immutable while in the loop. I do not think it's doable. – DYZ Dec 04 '16 at 05:54
  • @KlausD. Hmm I guess both? If there's a technique that can do that, I'd consider using it myself. But in order to understand its costs (run-time, code complexity, etc.), it would be important for me to know why CPython doesn't use it. – max Dec 04 '16 at 07:13
  • @DYZ Immutable keys would be a lot stronger than what I was asking for. I don't want an exception raised just because the value for a given key was changed - that is completely safe to do in a loop (indeed, if it weren't, Python would be a broken language!) – max Dec 04 '16 at 07:29
  • @DYZ Btw, making an immutable version of a `dict` is available as a built-in feature in Python: `immutable_d = types.MappingProxyType(d)`. Using this in the loop would be safe if one doesn't need any modifications at all. Of course, one can still mess up by using the original (mutable) `d`. Anyway, that's a lot stronger restriction than I needed. – max Dec 04 '16 at 07:30

3 Answers

5

One approach to ensure that an exception is raised whenever an attempt is made to insert or delete a key during a loop is to maintain a count of the modifications made to the dictionary. The iterators can then check, in their __next__ method, that this count hasn't changed since the iterator was created (instead of verifying that the dictionary size hasn't changed).

The following code does that. With SafeDict or its keys() / items() / values() proxies, loops become safe against accidental insertion or deletion:

class SafeKeyIter:
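    # raises in __next__ as soon as the container's modification counter
    # differs from the snapshot taken when this iterator was created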
    def __init__(self, iterator, container):
        self.iterator = iterator
        self.container = container
        try:
            self.n_modifications = container.n_modifications
        except AttributeError:
            raise RuntimeError('container does not support safe iteration')

    def __next__(self):
        if self.n_modifications != self.container.n_modifications:
            raise RuntimeError('container modified during iteration')
        return next(self.iterator)

    def __iter__(self):
        return self


class SafeView:
    def __init__(self, view, container):
        self.view = view
        self.container = container

    def __iter__(self):
        return SafeKeyIter(self.view.__iter__(), self.container)

class SafeDict(dict):
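    # caveat: only __setitem__ and __delitem__ are guarded here; pop(),
    # popitem(), update() and clear() call the C implementation directly
    # and would bypass the counter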
    def __init__(self, *args, **kwargs):
        self.n_modifications = 0
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
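        # overwriting the value of an existing key is safe during
        # iteration, so only new keys count as modifications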
        if key not in self:
            self.n_modifications += 1
        super().__setitem__(key, value)

    def __delitem__(self, key):
        self.n_modifications += 1
        super().__delitem__(key)

    def __iter__(self):
        return SafeKeyIter(super().__iter__(), self)

    def keys(self):
        return SafeView(super().keys(), self)

    def values(self):
        return SafeView(super().values(), self)

    def items(self):
        return SafeView(super().items(), self)

# this now raises RuntimeError:
d = SafeDict({1: 2})
for k in d:
    d[k * 100] = 100
    del d[k]
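
The keys() / items() / values() views are guarded in the same way, e.g.:

d = SafeDict({1: 2})
for k, v in d.items():
    d[k * 100] = 100  # bumps n_modifications...
    del d[k]          # ...so the loop's next step raises RuntimeError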

This doesn't seem too expensive, so I'm not sure why it isn't implemented in CPython's dict. Perhaps the extra cost of updating n_modifications on every dictionary mutation was judged too high.

max
  • This is interesting, so I ran a few benchmarks on it. Creating a `SafeDict` only seemed to add about 5% overhead vs. a regular dict (and if implemented in C, probably less). Iterating over and updating each value in a 10000-item `SafeDict` was a full order of magnitude slower than with a 10000-item dict, though. [I put that benchmark here](https://trinket.io/python3/a891539584) – Gerrat Dec 04 '16 at 17:47
  • @Gerrat hmm, you're comparing my pure-Python implementation to a C implementation. The moment there's even a single line of pure Python inside `__next__`, you would see an order-of-magnitude hit. For meaningful benchmarking, this needs to be rewritten in C. – max Dec 04 '16 at 18:44
  • A C implementation would certainly be faster. Without having the C implementation, it's a bit of a guess to say how much faster. I do find your implementation interesting - it might be worthwhile posting your proof-of-concept on the [Dev's mailing list](https://mail.python.org/mailman/listinfo/python-dev) – Gerrat Dec 04 '16 at 18:59
2

The simplest answer is: because you delete one item and add one item, the size never actually changes, so nothing gets caught. The RuntimeError is raised when there is a difference between the size the iterator recorded and the current size of the dictionary it iterates over:

if (di->di_used != d->ma_used) {
    PyErr_SetString(PyExc_RuntimeError,
                    "dictionary changed size during iteration");
    di->di_used = -1; /* Make this state sticky */
    return NULL;
} 

When you add one item and remove one, di->di_used stays equal to d->ma_used (which gets incremented by one and then decremented by one). The operations (the del and the key addition) are performed on the dict object d and, because they balance out, no mismatch is found by the if clause shown above.
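
You can watch the balance with an explicit iterator; no RuntimeError occurs at the second next() call (what it returns afterwards is a CPython implementation detail, not a guarantee):

d = {1: 1}
it = iter(d)  # di_used is recorded as 1 here
next(it)      # yields 1
d[10] = 10    # ma_used: 1 -> 2
del d[1]      # ma_used: 2 -> 1, back in balance
next(it)      # no RuntimeError; happens to yield 10 on current CPython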

But, if you add two keys, for example, you'll get the same error:

d = {1: 1}
for k in d:
  del d[k]
  d[1] = 1
  d[2] = 2

RuntimeError                              Traceback (most recent call last)
<ipython-input-113-462571d7e0df> in <module>()
      1 d = {1: 1}
----> 2 for k in d:
      3   del d[k]
      4   d[1] = 1
      5   d[2] = 2

RuntimeError: dictionary changed size during iteration

because the change in size is now caught here. If, of course, you also delete a second key, the operations balance out again and, as before, no error is raised.
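
For example, adding a second deletion to the snippet above restores the balance and no exception is raised (the loop happens to revisit key 1 a few times before an internal resize ends the iteration; that part, again, is a CPython implementation detail):

d = {1: 1}
for k in d:
    del d[k]
    d[1] = 1
    d[2] = 2
    del d[2]  # second deletion rebalances the size: no RuntimeError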

As I noted in the comment section, a check that detects whether insertions and deletions have occurred in a balanced way is not as trivial as checking whether the size has simply changed. It also wouldn't make sense to me, for two further reasons:

  • If people do indeed choose to alter a dictionary during iteration, odds are they won't be doing it in a balanced way, so the check in place should suffice for the most common cases.
  • If you do decide to add more checks, you'd be affecting the performance of pretty much everything in Python (dicts being ubiquitous).

Overall, I doubt adding this check would be of benefit; it is pretty well established for most people that iterating over a collection while changing it is not the best idea.

Like grown-ups, we should realize that Python shouldn't check everything for us; instead, we should simply not do things that have known unwanted effects.

Dimitris Fasarakis Hilliard
  • Well, technically speaking, yes. But I meant: why is `dict` designed so that it only complains when the number of insertions doesn't equal the number of deletions? When they are equal (and non-zero), the code is just as unsafe. – max Dec 04 '16 at 15:45
  • @max Because that is a requirement that can't be solved as trivially as the most common case of unbalanced insertions/deletions can. In the end, Python isn't *really* strict about what you can and cannot do; if you want to do something silly, go ahead, but face the consequences. – Dimitris Fasarakis Hilliard Dec 04 '16 at 17:46
  • my proposed solution in the answer below would be too slow, I guess? – max Dec 04 '16 at 18:47
  • @max yup, I'm not quite sure if protecting someone from adding/removing elements from a dictionary is really worth the additional overhead (which would exist even if re-implemented in `C` to a smaller degree); especially on a built-in collection that is practically the *backbone* of Python (and for a scenario that is quite unlikely). – Dimitris Fasarakis Hilliard Dec 04 '16 at 19:05
0

Is there no approach that would enable full validation at a low cost?

Here is a relevant comment from Alex Martelli on the topic.

because a container doesn't even keep track of the iterators that are out on it, much less hook every altering method to loop over every such iterator and somehow magically let each iterator know about the alterations. It would be a lot of subtle, complex code, and checks slowing down very frequent operations

So, at least according to a core Python dev, we can't have full validation at a low cost.
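
To make the cost concrete, here is a rough sketch (all names hypothetical, not a real API) of what "keeping track of iterators" would entail: the container registers every live iterator, and every mutating method has to visit all of them:

import weakref

class TrackingDict(dict):
    # hypothetical container that registers its live iterators and
    # invalidates every one of them on each structural change
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._iterators = weakref.WeakSet()

    def __iter__(self):
        it = TrackedIterator(super().__iter__())
        self._iterators.add(it)
        return it

    def _invalidate_iterators(self):
        for it in self._iterators:  # extra work on *every* mutation,
            it.valid = False        # proportional to live iterators

    def __setitem__(self, key, value):
        if key not in self:  # overwriting a value stays legal
            self._invalidate_iterators()
        super().__setitem__(key, value)

    def __delitem__(self, key):
        self._invalidate_iterators()
        super().__delitem__(key)

class TrackedIterator:
    def __init__(self, it):
        self.it = it
        self.valid = True

    def __iter__(self):
        return self

    def __next__(self):
        if not self.valid:
            raise RuntimeError('dictionary changed during iteration')
        return next(self.it)

Compared to the counter idea in the other answer, this shifts the cost from __next__ to every insert and delete, which is exactly the "checks slowing down very frequent operations" the quote warns about.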

Gerrat
  • Hmm, I think Alex Martelli was referring to the difficulty of *allowing* modifications to the dictionary while iterating. This is far more difficult than *detecting* modifications. – max Dec 04 '16 at 14:18