19

I'm trying to learn Python, and I started to play with some code:

a = [3,4,5,6,7]
for b in a:
    print(a)
    a.pop(0)

And the output is:

[3, 4, 5, 6, 7]
[4, 5, 6, 7]
[5, 6, 7]

I know that's not a good practice change data structures while I'm looping on it, but I want to understand how Python manage the iterators in this case.

The principal question is: How does it know that it has to finish the loop if I'm changing the state of a?

vezeli
  • 336
  • 2
  • 8
Pau Trepat
  • 697
  • 1
  • 6
  • 24
  • 5
    you should print `b` as well, that would give you more clues – Jean-François Fabre Apr 04 '17 at 12:05
  • 1
    Python loops over the list on demand. Means that in each iteration it assigns the respective item to the throwaway variable and continue the iteration until it encounters the end of the list or raise an `IndexError`. Answering to how python knows it has to finish the loop if you're changing list, it's all based on encountering the last item or raising an `IndexError`. By changing the list you can just shorten or extend the loop. – Mazdak Apr 04 '17 at 12:07
  • 1
    This may help: http://stackoverflow.com/questions/13939341/why-does-a-for-loop-with-pop-method-or-del-statement-not-iterate-over-all-list – GLR Apr 04 '17 at 12:07
  • http://stackoverflow.com/questions/9884132/what-exactly-are-pythons-iterator-iterable-and-iteration-protocols – bruno desthuilliers Apr 04 '17 at 12:16
  • Python does not promise this behavior. A future version might give you a dict-like `RuntimeError: list changed size during iteration`. – user2357112 Apr 04 '17 at 16:23
  • Do you have a particular reason for learning Python 2? If not, you should be learning Python 3. – John Y Apr 04 '17 at 21:21
  • So list contents are re evaluated at every iteration? – variable Nov 08 '19 at 18:18

4 Answers4

14

The reason why you shouldn't do that is precisely so you don't have to rely on how the iteration is implemented.

But back to the question. Lists in Python are array lists. They represent a continuous chunk of allocated memory, as opposed to linked lists in which each element in allocated independently. Thus, Python's lists, like arrays in C, are optimized for random access. In other words, the most efficient way to get from element n to element n+1 is by accessing to the element n+1 directly (by calling mylist.__getitem__(n+1) or mylist[n+1]).

So, the implementation of __next__ (the method called on each iteration) for lists is just like you would expect: the index of the current element is first set at 0 and then increased after each iteration.

In your code, if you also print b, you will see that happening:

a = [3,4,5,6,7]
for b in a:
    print a, b
    a.pop(0)

Result :

[3, 4, 5, 6, 7] 3
[4, 5, 6, 7] 5
[5, 6, 7] 7

Because :

  • At iteration 0, a[0] == 3.
  • At iteration 1, a[1] == 5.
  • At iteration 2, a[2] == 7.
  • At iteration 3, the loop is over (len(a) < 3)
kjaquier
  • 824
  • 4
  • 11
9

kjaquier and Felix have talked about the iterator protocol, and we can see it in action in your case:

>>> L = [1, 2, 3]
>>> iterator = iter(L)
>>> iterator
<list_iterator object at 0x101231f28>
>>> next(iterator)
1
>>> L.pop()
3
>>> L
[1, 2]
>>> next(iterator)
2
>>> next(iterator)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
StopIteration

From this we can infer that list_iterator.__next__ has code that behaves something like:

if self.i < len(self.list):
    return self.list[i]
raise StopIteration

It does not naively get the item. That would raise an IndexError which would bubble to the top:

class FakeList(object):
    def __iter__(self):
        return self

    def __next__(self):
        raise IndexError

for i in FakeList():  # Raises `IndexError` immediately with a traceback and all
    print(i)

Indeed, looking at listiter_next in the CPython source (thanks Brian Rodriguez):

if (it->it_index < PyList_GET_SIZE(seq)) {
    item = PyList_GET_ITEM(seq, it->it_index);
    ++it->it_index;
    Py_INCREF(item);
    return item;
}

Py_DECREF(seq);
it->it_seq = NULL;
return NULL;

Although I don't know how return NULL; eventually translates into a StopIteration.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Alex Hall
  • 34,833
  • 5
  • 57
  • 89
  • 1
    "Although I don't know how return NULL; eventually translates into a StopIteration" It doesn't. `StopIteration` becomes `NULL` in `PyIter_Next` (and possibly in other places that may implement the iterator protocol themselves. Haven't checked further) – 3Doubloons Apr 04 '17 at 15:59
  • @3Doubloons I'm not sure what you mean, `NULL` *does* translate to `StopIteration`. It is explicitly tested for in the callee of `listiter_next` and the exception is set if no other error has occurred. – Dimitris Fasarakis Hilliard Apr 04 '17 at 16:45
  • @JimFasarakisHilliard: What I'm reading in PyIter_Next (and builtins.all and list.extend for a second opinion), is that when tp_iternext returns NULL, the C code checks if there was an exception. If there was and it was a StopIteration, it clears it (because that's normal). Based on that (and only that), it seems to me that a C implementation of `__next__` simply returns NULL without setting an exception to stop. It's only Python code that needs to raise StopIteration, because it can't return NULL (None is not NULL, it is Py_None) – 3Doubloons Apr 04 '17 at 17:18
  • @3Doubloons The slot for `tp_iternext` is filled by a descriptor that sets `PyExc_StopIteration` so, even though the actual function that's set to `tp_iternext` isn't setting the exception, it's wrapped by something that does. That's with what I disagreed when you said `NULL` doesn't translate to `StopIteration`. :-) – Dimitris Fasarakis Hilliard Apr 04 '17 at 17:25
  • 1
    Ah, yes. I see it now. Hidden deep inside typeobject.c, `wrap_next` raises StopIteration if the actual tp_iternext returned NULL without an exception. Mystery solved – 3Doubloons Apr 04 '17 at 18:45
2

We can easily see the sequence of events by using a little helper function foo:

def foo():
    for i in l:
        l.pop()

and dis.dis(foo) to see the Python byte-code generated. Snipping away the not-so-relevant opcodes, your loop does the following:

          2 LOAD_GLOBAL              0 (l)
          4 GET_ITER
    >>    6 FOR_ITER                12 (to 20)
          8 STORE_FAST               0 (i)

         10 LOAD_GLOBAL              0 (l)
         12 LOAD_ATTR                1 (pop)
         14 CALL_FUNCTION            0
         16 POP_TOP
         18 JUMP_ABSOLUTE            6

That is, it get's the iter for the given object (iter(l) a specialized iterator object for lists) and loops until FOR_ITER signals that it's time to stop. Adding the juicy parts, here's what FOR_ITER does:

PyObject *next = (*iter->ob_type->tp_iternext)(iter);

which essentially is:

list_iterator.__next__()

this (finally*) goes through to listiter_next which performs the index check as @Alex using the original sequence l during the check.

if (it->it_index < PyList_GET_SIZE(seq))

when this fails, NULL is returned which signals that the iteration has finished. In the meantime a StopIteration exception is set which is silently suppressed in the FOR_ITER op-code code:

if (!PyErr_ExceptionMatches(PyExc_StopIteration))
    goto error;
else if (tstate->c_tracefunc != NULL)
    call_exc_trace(tstate->c_tracefunc, tstate->c_traceobj, tstate, f);
PyErr_Clear();  /* My comment: Suppress it! */

so whether you change the list or not, the check in listiter_next will ultimately fail and do the same thing.

*For anyone wondering, listiter_next is a descriptor so there's a little function wrapping it. In this specific case, that function is wrap_next which makes sure to set PyExc_StopIteration as an exception when listiter_next returns NULL.

Dimitris Fasarakis Hilliard
  • 150,925
  • 31
  • 268
  • 253
1

AFAIK, the for loop uses the iterator protocol. You can manually create and use the iterator as follows:

In [16]: a = [3,4,5,6,7]
    ...: it = iter(a)
    ...: while(True):
    ...:     b = next(it)
    ...:     print(b)
    ...:     print(a)
    ...:     a.pop(0)
    ...:
3
[3, 4, 5, 6, 7]
5
[4, 5, 6, 7]
7
[5, 6, 7]
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-16-116cdcc742c1> in <module>()
      2 it = iter(a)
      3 while(True):
----> 4     b = next(it)
      5     print(b)
      6     print(a)

The for loop stops if the iterator is exhausted (raises StopIteration).

Felix
  • 6,131
  • 4
  • 24
  • 44