0

Code 1:

kv = {'a': 'A', 'b': 'B', 'c': 'C'}
d = {'a': 1, 'b': 2, 'c': 3}
for key in d.keys():
    d[kv[key]] = d.pop(key)

print(d)

Output:

{'A': 1, 'B': 2, 'C': 3}

The above example works fine
But, when we increase the number of elements in the dictionary,
it raises KeyError as shown in example below Code 2:

kv = {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
for key in d.keys():
    d[kv[key]] = d.pop(key)
print(d)

Output:

Traceback (most recent call last):
  File "C:\Users\user\OneDrive\Documents\VSCode_Python\new.py", line 11, in <module>
    d[kv[key]] = d.pop(key)
KeyError: 'A'
smitkpatel
  • 722
  • 6
  • 21
  • 1
    You're modifying the list you are iterating through. – Jacques Gaudin Feb 14 '22 at 17:01
  • Does [this](https://stackoverflow.com/questions/6777485/modifying-a-python-dict-while-iterating-over-it) answer your question? – kcsquared Feb 14 '22 at 17:04
  • @JacquesGaudin Yes, I understand that, why do I not get such an error when the dict size is 3 but error is raised when the size is 4 – smitkpatel Feb 14 '22 at 17:11
  • @kcsquared, not really, NO, what I am looking for, is an explanation on why does it **NOT happen** for a dict of size 3 – smitkpatel Feb 14 '22 at 17:13
  • 1
    The iteration order is somewhat random, controlled by internal implementation details, that’s why. – deceze Feb 14 '22 at 17:30
  • @deceze this above code has been reproduced and tested on 4 different python versions across RHEL, Windows, so random does not really make sense. – smitkpatel Feb 14 '22 at 17:38
  • 3
    'Arbitrary' is a better descriptor than 'random'. The behavior of iteration failing after precisely 6 additions to the dictionary have occurred has been [answered and explained before](https://stackoverflow.com/a/44763926/16757174). It's due to the default minimum size of dictionaries being 8 and their 'usable fraction' being 2/3, which are both implementation details. – kcsquared Feb 14 '22 at 17:48

2 Answers2

1

The simplest answer can be found here in the Python docs on "Dictionary view objects": https://docs.python.org/3/library/stdtypes.html#dict-views:

Iterating views while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.

It says it may fail to iterate over all entries. In your example with a dict of length 4, it appears that iterating the keys() dictionary view object works (i.e., behaves the same way it would if you were not mutating the dictionary inside the loop) for the first 3 entries (just before you pop each entry in turn) and then gets to the newly set key 'A' before getting to key 'd'. In other words, iterating this view has indeed failed to iterate over all entries for your dictionary of length 4.

The question as to why it did not fail for a dictionary of length 3 could perhaps be answered with reference to implementation details that allow us to understand how the language internals give rise to this luck-of-the-draw behavior, but ultimately, the answer is that the Python language specification does not require your code to fail for a view on a dictionary of any length, and it just so happens that it does not fail for a dict of length 3.

Please see comments by @kcsquared and answers by others for insight into implementation-specific details as to the arbitrary point of failure of the/a current Python language implementation.

constantstranger
  • 9,176
  • 2
  • 5
  • 19
1

The actual reason why this happens is due to the implementation of dictionaries in Python. This article describes it very well.

In essence, Python calculates the hash of the key and applies a mask to find a slot to store the value.

For a small dict you have 8 slots, so the function to determine in which slot you put the value is:

def find_slot(key):
    return hash(key) & 7

You'll notice that there is no collision in the original dict:

list(map(find_slot, ('a', 'b', 'c', 'd')))
>>>[5, 0, 6, 1]

But when the 'A' key is added there is a collision:

list(map(find_slot, ('a', 'b', 'c', 'd', 'A')))
>>>[5, 0, 6, 1, 5]

The same slot is given for 'a' and 'A', so the dictionary has to expand to have more slots.

Until the dictionary is expanded, the keys hash are stored in cache. When it expands, the cache has to be renewed. You can see that happening with some print statements:

from sys import getsizeof

kv = {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
for key in d.keys():
    print(d)
    print(getsizeof(d))
    print(d.keys(), key)
    d[kv[key]] = d.pop(key)
print(d)

This is the principle at play. Not sure the detail is 100% correct but happy to fix it if anyone has any comments.

Jacques Gaudin
  • 15,779
  • 10
  • 54
  • 75