3

I know that in the general case, it’s unsafe to mutate a dict while iterating over it:

d = {1: 2, 3: 4}
for k in d.keys():
    d[k + 1] = d[k] + 1

This yields a RuntimeError as expected. However, while reading the documentation for dict, I found the following (emphasis mine):

Iterating views while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.

So I tried the following code:

d = {1: 2, 3: 4}
for k in d.keys():
    d[k] = d[k] + 1

Note: I did not add or remove any entries during the iteration; I only updated existing entries. I've tried it with a few examples and so far have not received a RuntimeError, and the loop works as expected.

Is this guaranteed to work by the language? Or have I just been lucky so far?

Note: I am using Python 3, so d.keys() returns a dynamic view rather than a list).

Andrew Sun
  • 4,101
  • 6
  • 36
  • 53
  • 2
    Alex Martelli says [here](https://stackoverflow.com/questions/2315520/in-python-how-do-i-loop-through-the-dictionary-and-change-the-value-if-it-equal/2315529#2315529) that *assigning a different value at a given existing index does not incur any problem*, so I believe it should be fine ;) – rafaelc Sep 14 '18 at 21:11
  • As far as I know that's perfectly fine (and indeed something you see quite often). The key point is that you're not actually altering the content of `d.keys()`, which you are when you add or delete a key - and this is almost certainly what the warning in the docs is about. – Robin Zigmond Sep 14 '18 at 21:12
  • I think you should be ok. Note that you can use `d[k] += 1` for the loop body (more concise and probably a little faster since it only needs to hash once per element). – Tom Karzes Sep 14 '18 at 21:16
  • 1
    As an aside, `d.keys()` is redundant to iterate over the keys, just use `for k in d`. – juanpa.arrivillaga Sep 14 '18 at 21:37
  • 1
    @TomKarzes This is not true in Python. Python does not know if the value is immutable or not, and so has to load the value, and then unconditionally store the result of the addition. – Dunes Sep 14 '18 at 21:37
  • @Dunes it should not have to invoke the hash function on `k` more than once. That should speed it up. – Tom Karzes Sep 14 '18 at 21:39
  • 1
    @Dunes - I wouldn't call the left-hand-side a "value", but OK. It doesn't matter whether it's immutable or not. With the `d[k] += 1`, Python only has to evaluate `d[k]` once. With `d[k] = d[k] + 1`, Python has to evaluate `d[k]` twice. – John Y Sep 14 '18 at 21:44
  • @TomKarzes That is an optimisation that Python does not make (for whatever reason). You can try it yourself by overriding the `__hash__` function of the key class and see how many times it gets called when doing `d[k] += 1`. Probably because there is the possibility of there being many many different opcodes being invoked between the load and the store opcodes. I believe Python does optimise hashing of strings by getting them to store their hash once computed. This is useful since the vast majority of keys will be strings (think of how attributes are stored). – Dunes Sep 14 '18 at 21:50
  • @Dunes Wow, it looks like you're right. That just seems sad to me. There is no reason for it to be implemented like that. I still prefer the `+=` syntax, but given that hash functions can be arbitrarily expensive, it seems pitiful that it doesn't save the result. I suspect it may be the result of the internal implementation not allowing the hash key to be passed, but instead reevaluating it on each access. Hopefully that will change in a future implementation. Note that if `k` is an expression, the *expression* is only evaluated once, even though it is hashed twice. – Tom Karzes Sep 14 '18 at 22:00
  • @TomKarzes Oh, I prefer `+=`, just didn't want you thinking there was a performance benefit. I suspect the real reason that python doesn't cache the result is that because it is dynamically typed. So when it is doing `a[b] += c` it doesn't know if `a` is a dict or a list or a tuple or whatever. And that any performance benefit from checking is in the specific case of a dict outweighed by the performance cost of checking in the general case of lists and tuples. The vast majority of any benefit is already gained by having strings cache their hash value. User classes can cache too if they want. – Dunes Sep 14 '18 at 22:35

1 Answers1

3

The internal structure of the dictionary is determined by the keys, and not the values. At least currently. This means that you can modify the value associated with a key, but adding or removing keys (which the first example's d[k + 1] = d[k] + 1 does) will cause problems.