Modifying size of iterable during for loop - how is looping determined?

Question

For loops are mentioned in two places in the python docs (that I have found). I did try to find the source code for for loops in cpython but to no avail.

Here's what I'm trying to understand: I had assumed for loops were a sort of while i <= len(iterable) then loop or if i <= len(iterable) then loop:. I'm not sure that's the case, and here's why:

y = [1, 2, 3, 4]
for x in y:
  print(y)
  print(y.pop(0))

Output:
[1, 2, 3, 4]
1
[2, 3, 4]
2

I know you shouldn't modify an iterable while you're looping through it. I know that. But still, this isn't a random result - it happens every time this code is run: 2 loops. You also get 2 loops if you run pop() instead.

Maybe even curiouser, it seems like you reliably get len(y)+1//2 loops (at least using .pop(), I haven't tried much other testing):

if y = [1, 2] there is one loop
if y = [1, 2, 3] there are two loops
if y = [1, 2, 3, 4] there are still two loops
if y = [1, 2, 3, 4, 5] there are three loops
if y = [1, 2, 3, 4, 5, 6] there are still three loops
if y = [1, 2, 3, 4, 5, 6, 7] there are four loops

According to the Python docs:

Note

There is a subtlety when the sequence is being modified by the loop (this can only occur for mutable sequences, e.g. lists). An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if the suite deletes the current (or a previous) item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if the suite inserts an item in the sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence, e.g.,

for x in a[:]:
    if x < 0: a.remove(x)

Can anyone explain the logic Python uses when it is looping through an iterable that is modified during the loop? How do iter and StopIteration, and __getitem__(i) and IndexError factor in? What about iterators that aren't lists? And most importantly, is this / where is this in the docs?

As @Yang K suggested:

y = [1, 2, 3, 4, 5, 6, 7]
for x in y:
  print("y: {}, y.pop(0): {}".format(y, y.pop(0)))
  print("x: {}".format(x))

# Output
y: [2, 3, 4, 5, 6, 7], y.pop(0): 1
x: 1
y: [3, 4, 5, 6, 7], y.pop(0): 2
x: 3
y: [4, 5, 6, 7], y.pop(0): 3
x: 5
y: [5, 6, 7], y.pop(0): 4
x: 7

Just print `x` in the `for` loop and you'll understand the logic behind it. First iteration: It does everything normally. Second Iteration : `y[0]` has been removed in the first iteration. normally it would go to `y[1]`, but since you just removed something it skips it and goes to `y[2]` instead. Third iteration: You have removed what was originally `y[2]` and the link shrinks. Normally, it would go to `y[3]`, but since you shrinked it, it goes to `y[4]`. But since `y[4]` is the end of the list and doesn't exist, the loop stops running. — Yang K, Nov 16 '18 at 04:16
Python uses iterator-based for-loops. A for loop will result in a call to `iter` on the value being looped over, and the resulting iterator will have `next` called on it until it raises `StopIteration`. Now, there is some baggage left over from before this was formalized, there is an alternative iterator protocol that calls `__getitem__(i)` (i.e. the indexing operator `[i]`) with `i` starting from 0 until the indexing operation throws an `IndexError` — juanpa.arrivillaga, Nov 16 '18 at 04:16
Note, that `len` is never involved, python for-loops are not equivalent to C-like for-loops. See [this](https://stackoverflow.com/a/47348171/5014455) answer I made a while back to another question where I go into some details about this — juanpa.arrivillaga, Nov 16 '18 at 04:17
@juanpa.arrivillaga I found your first comment hella opaque, but the second one (which I understood) is really interesting, thanks! The answer you linked says the loop terminates when the iterator is "exhausted," so that has to do with the nature of how lists are managed in memory in Python (I think someone called this a "linked" list before)? According to this [this](https://www.quora.com/How-are-Python-lists-implemented-internally) Python doesn't use linked lists but instead stores the length of a list as metadata. — NotAnAmbiTurner, Nov 16 '18 at 04:24
Different iterators are implemented differently. List iterators go strictly by index, and are exhausted when the index is no longer a valid index. — Amadan, Nov 16 '18 at 04:29
@NotAnAmbiTurner no, it has nothing to do with memory management or how lists are implemented, it is how *list iterators* are implemented. For what it is worth, Python lists are implemented as resizable arrays, not linked lists. And that first post explained it all, but you have to understand the [iterator protocol](https://docs.python.org/3/library/stdtypes.html#typeiter) Here's a [good question on the subject too](https://stackoverflow.com/questions/19151/build-a-basic-python-iterator). Note, in the Python 2 to Python 3 transition, `next` became `__next__` — juanpa.arrivillaga, Nov 16 '18 at 04:56
Note, **lists are not iterators**. Lists are *iterable* (because their `iter` method *returns an iterator*). — juanpa.arrivillaga, Nov 16 '18 at 04:59

Amadan · Accepted Answer · 2018-11-16T05:13:56.567

The loop executes till iterable says it has no more elements. After two cycles, the iterable has gone through two elements, and has lost two elements, which means it is at its end, and the loop terminates.

Your code is equivalent to this:

y = [1, 2, 3, 4]
i = iter(y)
while True:
    try:
        x=next(i)
    except StopIteration:
        break
    print(y)
    print(y.pop(0))

The list iterator holds the index that is up to be read next. In the third cycle, the list is [3, 4], and next(i) would be needing to read y[2], which is not possible, so next raises StopIteration, which ends the loop.

EDIT As to your other questions:

How do iter and StopIteration, and __getitem__(i) and IndexError factor in?

The first two are as described above: it is what defines a for loop. Or, if you will, it is the contract of iter: it will yield stuff till it stops with StopIteration.

The latter two, I don't think participate at all, since the list iterator is implemented in C; for example, the check for whether the iterator is exhausted directly compares the current index with PyList_GET_SIZE, which directly looks at ->ob_size field; it doesn't pass through Python any more. Obviously, you could make a list iterator that would be fully in pure Python, and you'd likely be either using len to perform the check, or catching IndexError and again letting the underlying C code perform the check against ->ob_size.

What about iterators that aren't lists?

You can define any object to be iterable. When you call iter(obj), it is the same as calling obj.__iter__(). This is expected to return an iterator, which knows what to do with i.__next__() (which is what next(i) translates to). I believe dicts iterate (I think, haven't checked) by having an index into the list of its keys. You can make an iterator that will do anything you want, if you code it. For example:

class AlwaysEmpty:
    def __iter__(self):
        return self
    def __next__(self):
        raise StopIteration

for x in AlwaysEmpty():
    print("there was something")

will, predictably, print nothing.

And most importantly, is this / where is this in the docs?

Iterator Types

Great, thank you! For reference: [iter()](https://docs.python.org/3/library/functions.html#iter) and [next()](https://docs.python.org/3/library/functions.html#next) — NotAnAmbiTurner, Nov 16 '18 at 04:34
Well, I was mistaken, list's use the actual iterator protocol, not the "old style", legacy sequence iteration protocol (which I can't find a good reference for in the docs, although it is alluded to [here](https://docs.python.org/3/reference/expressions.html#membership-test-details)). Here is a [question that addresses it](https://stackoverflow.com/questions/20551042/whats-the-difference-between-iter-and-getitem). I think by now, all the built-in types use the "real" protocol, or at least wrap their other implementation, but I remember `str` and `list` used to use it, at one time — juanpa.arrivillaga, Nov 16 '18 at 05:06

Modifying size of iterable during for loop - how is looping determined?

1 Answers1