TL;DR: It's because the dict is being resized when the size of the key table exceeds PyDict_MINSIZE
, which makes the interpreter realize that the iterator has moved past the point it should've stopped on.
Read on to understand how you could've found this out on your own.
It's not possible to answer such question completely, so I'll try to explain what I've found, and, at the same time, try to equip you with the tools needed to explore on your own.
While it does come down to implementation-specific undefined behavior, it isn't hard to find out what is going on, if you know how to navigate CPython code. Here is the list of suspects based on your code:
First, notice that the dictionary iterator only stops when goto fail
is used in dictiter_iternextkey()
. This can only happen if the iterator position goes past the number of entries in the dictionary keys table (di->di_pos >= di->di_dict->ma_keys->dk_nentries
, written as i >= n
in the code).
Let's use GDB to see what's actually going on. First, compile CPython 3.6.10 (see the devguide for complete instructions). Run CPython under GDB, set a breakpoint in dictiter_iternextkey()
, run your script, and print the di_pos
and dk_nentries
with every iteration:
git clone https://github.com/python/cpython
cd cpython
git checkout v3.6.10
./configure --with-pydebug
make -j 16 -s
# Put your code into weird.py
gdb ./python
(gdb) b Objects/dictobject.c:3480
(gdb) run weird.py
# Iterate these commands until process exits
(gdb) p di->di_pos
(gdb) p di->di_dict->ma_keys->dk_nentries
(gdb) c
What you'll see, is that on every iteration of your loop, di_pos
and dk_nentries
get incremented by one, except on the last one, where dk_nentries
is being reset to 1.
Now, we'll just need to find out what is resetting the dk_nentries
counter. There are two other lines in your code that could do that: del x[i]
and x[i+1] = None
. You could find out which one is it by reading the code, but let's use a watchpoint for that instead:
(gdb) b Objects/dictobject.c:3480
(gdb) run weird.py
(gdb) watch -l di->di_dict->ma_keys->dk_nentries
# 'c'-ontinue until the following output appears:
(gdb) c
Continuing.
Hardware watchpoint 3: -location di->di_dict->ma_keys->dk_nentries
Old value = 5
New value = -2604246222170760229
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:204
204 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
We're now in the memory management code. The new value looks as if the old key table was freed - it's now garbage. Let's look at the backtrace, to see what code issued the free()
-ing:
(gdb) bt
...
#5 0x00005555556206e6 in dictresize (mp=0x7ffff72ffaa8, minsize=<optimized out>) at Objects/dictobject.c:1314
#6 0x0000555555620751 in insertion_resize (mp=<optimized out>) at Objects/dictobject.c:1103
#7 0x0000555555620e6d in insertdict (mp=0x7ffff72ffaa8, key=5, hash=5, value=None)
#8 0x0000555555623e4a in PyDict_SetItem (op={}, key=5, value=None) at Objects/dictobject.c:1576
...
It happens when you add the key. The interpreter figures out how many entries are actually there when resizing the dictionary, and refreshes the table, including its counter. But why didn't it happen earlier?
If you look at the code that called insertion_resize()
, you'll see the following branch:
if (mp->ma_keys->dk_usable <= 0) {
/* Need to resize. */
if (insertion_resize(mp) < 0)
goto Fail;
find_empty_slot(mp, key, hash, &value_addr, &hashpos);
}
As you can see, the PyDictKeysObject
struct has a dk_usable
field. As an optimization, the key table is initialized with a bit more space, so that when adding 2-3 keys, the interpreter doesn't have to resize the dictionary right away.
The amount of "free space" that is there at the beginning is controlled by PyDict_MINSIZE
in PyDict_New()
. This is actually mentioned in the macro section of the file. Finding out why setting it to 8 lets a dict have at most 5 entries is left as an exercise.
Check for yourself: if you recompile CPython with PyDict_MINSIZE
set to 32 (it must be a power of 2), it will make your code iterate up to 20.