1

In python, what containers properly support mutation during iteration?

For example:

container = [1, 2, 3, 4]
for i in container:
    print(i)
    if i == 2:
        container.append(8)

Outputs 1 2 3 4 8 (lists can be appended during iteration).

However, if I replace .append(8) with .remove(1) the output becomes 1 2 4 (i.e., element 3 gets skipped). It seems as if list iteration is over indices not elements, and therefore only subsequent list items (not previous list items) can be safely removed during iteration.

Is there any container in the standard library that does permit elements to be added and removed during iteration, which the behaviour that:

  1. New elements do get iterated (as for list.append),
  2. Removed elements do not subsequently get iterated,
  3. Whether an element gets iterated over (or not) is never affected by additions/removals of other elements.

The application I have in mind is a registry of event callbacks. When triggered, I would like the callbacks to have the ability to eagerly register or deregister other callbacks for the same event. (If for example I iterated over a temporary copy of the container, I would need to wait for the event to get triggered a second time before changes start taking effect.)

blhsing
  • 91,368
  • 6
  • 71
  • 106
benjimin
  • 4,043
  • 29
  • 48

2 Answers2

3

You can customize the behavior of list by subclassing it with an appropriate implementation of the remove method that decrements the index the iterator points to when the index being removed is less than the current iterator index:

from weakref import WeakSet

class IterList:
    def __init__(self, lst):
        self.list = lst
        self.index = 0

    def __next__(self):
        if self.index == len(self.list):
            raise StopIteration
        value = self.list[self.index]
        self.index += 1
        return value

class List(list):
    iterators = WeakSet()

    def __iter__(self):
        iterator = IterList(self)
        self.iterators.add(iterator)
        return iterator

    def remove(self, item):
        index = super().index(item)
        for iterator in self.iterators:
            if index < iterator.index:
                iterator.index -= 1
        del self[index]

so that:

container = List((1, 2, 3, 4))
for i in container:
    if i == 2:
        container.remove(1)
    for j in container:
        print(i, j)

outputs:

1 1
1 2
1 3
1 4
2 2
2 3
2 4
3 2
3 3
3 4
4 2
4 3
4 4
blhsing
  • 91,368
  • 6
  • 71
  • 106
  • By turning the list into its own iterator, you are making it impossible to iterate over the list more than once at the same time: `for i in container: for j in container: print(i, j)` will run forever, since the outer loop always resets at the same time the inner one does. – Blckknght Oct 13 '19 at 16:35
  • I was waiting for someone to point that out since I was hesitant to make the solution more complicated than it needs to be when the OP does not appear to have a use case for multiple iterators on the same list. But now that someone has pointed it out, I've gone ahead and updated the answer accordingly. :-) – blhsing Oct 13 '19 at 17:27
  • 1
    Looks good to me! My only further suggestion (again, coming at the cost of even more perhaps unneeded complexity) would be to use a `weakref.WeakSet` so that you don't keep the references to all the iterators alive forever. – Blckknght Oct 13 '19 at 17:32
  • Good suggestion. Although the line `self.list.iterators.remove(self)` was there to do exactly that, using `weakref.WeakSet` would make the explicit reference removal unnecessary. Updated as suggested. Thanks! – blhsing Oct 13 '19 at 17:43
  • 2
    Yeah, the peril with relying upon the iterator removing itself from the list is that it won't work if the iterator is not run to completion. For instance, if there's a `break` statement that stops a `for` loop early. – Blckknght Oct 13 '19 at 17:50
  • Ah, did not think about that scenario. Thanks! – blhsing Oct 13 '19 at 17:52
1

The behavior you're asking about is an implementation detail of the iterator involved. As you noticed, the list_iterator type uses an internal index, so removing an already-visited element causes problems because it changes the indexes of all the later values in the list.

What I suggest is that you don't actually removing any values from the list. Rather, add them to another container, perhaps a set (if they're hashable). This assumes that values are unique. But if they're not, you'll probably have problems removing them from the list with any approach.

container = [1, 2, 3, 4]
removed = set()
for i in container:
    if i not in removed:         # skip values that have been "removed"
        print(i)
        if i == 2:
            removed.add(1)       # since we've already visited 1, this has no real effect
            removed.add(3)       # this does work though, we won't print the 3
            container.append(8)  # additions of new elements work as normal

As the comments suggest, that loop with print out 1, 2, 4, and 8.

Blckknght
  • 100,903
  • 11
  • 120
  • 169