
If I have a class with the interface:

class AnIterable(object):

  def __init__(self):
    #initialize data structure

  def add(self, obj):
    # add object to data structure

  def __iter__(self):
    #return the iterator

  def next(self):
    # return next object

...how would I set things up so that if add() is called mid-iteration an exception is thrown, similar to:

In [14]: foo = {'a': 1}

In [15]: for k in foo:
   ....:     foo[k + k] = 'ohnoes'
   ....:     
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-15-2e1d338a456b> in <module>()
----> 1 for k in foo:
      2     foo[k + k] = 'ohnoes'
      3 

RuntimeError: dictionary changed size during iteration

Update: If the interface needs more methods, feel free to add them. I've also removed the implementation of __iter__().

Update #2 Based on kindall's answer I mocked up the following pseudo-implementation. Note that _datastructure and the associated methods indexing into it are abstractions, and the class writer would have to supply his/her own data structure traversal and location pointer mechanisms.

class AnIterable(object):

  def __init__(self):
    self._itercount = 0
    self._datastructure = init_data_structure() #@UndefinedVariable
    # _datastructure, and the methods called on it, are abstractions.

  def add(self, obj):
    if self._itercount:
      raise RuntimeError('Attempt to change object while iterating')
    # add object to data structure

  def __iter__(self):
    self._itercount += 1
    return self.AnIterator(self)

  class AnIterator(object):

    def __init__(self, aniterable):
      self._iterable = aniterable
      self._currentIndex = -1 #abstraction
      self._notExhausted = True

    def __iter__(self):
      return self  # an iterator must return itself from __iter__()

    def next(self):
      if self._iterable._datastructure.hasNext(self._currentIndex):
        self._currentIndex += 1
        return self._iterable._datastructure.next(self._currentIndex)
      else:
        if self._notExhausted:
          self._iterable._itercount -= 1
        self._notExhausted = False
        raise StopIteration

    def __next__(self):
      return self.next()

    # will be called when there are no more references to this object
    def __del__(self): 
      if self._notExhausted:
        self._iterable._itercount -= 1

Update 3 After reading some more, it seems __del__ is probably not the right way to go. The following might be a better solution, although it requires the user to explicitly release a non-exhausted iterator.

    def next(self):
      if (self._notExhausted and
              self._iterable._datastructure.hasNext(self._currentIndex)):
        # same as above from here

    def discard(self):
      if self._notExhausted:
        self._iterable._itercount -= 1
      self._notExhausted = False
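
A caller that abandons the iterator before exhausting it would then have to release it explicitly, along the lines of this usage sketch (looks_interesting() is purely illustrative, not part of the interface):

it = iter(an_iterable)  # an_iterable is an AnIterable instance
try:
  for obj in it:
    if looks_interesting(obj):  # hypothetical early-exit condition
      break
finally:
  it.discard()  # safe even if the iterator was exhausted
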
– elhefe

2 Answers


You shouldn't mix the iterator with the instance. Otherwise what happens when you want to iterate over the instance more than once at a time?

Think about where you are storing the position of the iterator.

Split the iterator into a separate class. Store the size of the object when you create the iterator instance. Check the size whenever next() is called.
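
A minimal sketch of that idea (the class and attribute names are illustrative, not a fixed API):

class SizeCheckingIterator(object):
    """Iterate an indexable container, failing fast if its length changes."""

    def __init__(self, container):
        self._container = container
        self._size = len(container)  # snapshot the size at creation time
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if len(self._container) != self._size:
            raise RuntimeError("container changed size during iteration")
        if self._index >= len(self._container):
            raise StopIteration
        item = self._container[self._index]
        self._index += 1
        return item

    next = __next__  # Python 2 spelling

The container's own __iter__() would then simply return SizeCheckingIterator(self).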

dicts aren't foolproof either. You can add and remove a key, which will screw up the iteration but won't throw the error:

Python 2.7.3 (default, Aug  1 2012, 05:14:39) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d = {i:i for i in range(3)}
>>> d
{0: 0, 1: 1, 2: 2}
>>> for k in d:
...     d[k+3] = d.pop(k)
...     print d
... 
{1: 1, 2: 2, 3: 0}
{2: 2, 3: 0, 4: 1}
{3: 0, 4: 1, 5: 2}
{4: 1, 5: 2, 6: 0}
{5: 2, 6: 0, 7: 1}
{6: 0, 7: 1, 8: 2}
{7: 1, 8: 2, 9: 0}
{8: 2, 9: 0, 10: 1}
{9: 0, 10: 1, 11: 2}
{10: 1, 11: 2, 12: 0}
{11: 2, 12: 0, 13: 1}
{12: 0, 13: 1, 14: 2}
{13: 1, 14: 2, 15: 0}
{16: 1, 14: 2, 15: 0}
{16: 1, 17: 2, 15: 0}
{16: 1, 17: 2, 18: 0}

Way more than 3 iterations!

– John La Rooy

If the item is indexable and has a length, you can do something like this, which is similar to how dict does it:

class AnIterable(list):

    def __iter__(self):
        n = len(self)
        i = 0
        while i < len(self):
            if len(self) != n:
                raise RuntimeError("object changed size during iteration")
            yield self[i]
            i += 1
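
For example, mutating the list mid-loop then fails much like the dict example in the question (interactive session, traceback abridged):

>>> lst = AnIterable([1, 2, 3])
>>> for x in lst:
...     lst.append(x)
... 
Traceback (most recent call last):
  ...
RuntimeError: object changed size during iteration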

A downside is that if the caller makes multiple changes that result in no net change to the length (for example, adding and then deleting an element), it won't be caught. Of course, you could use a revision counter (incremented whenever some other method makes a change) rather than just checking the length:

class AnIterable(object):

    def __init__(self, iterable=()):
        self._content = list(iterable)
        self._rev = 0

    def __iter__(self):
        r = self._rev
        for x in self._content:
            if self._rev != r:
                 raise RuntimeError("object changed during iteration")
            yield x

    def add(self, item):
        self._content.append(item)
        self._rev += 1

This gets messy since you must increment the revision counter in each method that can modify the list. You could write a metaclass or class decorator to automatically write such wrapper methods for a list, I guess.
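
Such a decorator might look roughly like this (a sketch only; the decorator name and the set of wrapped methods are assumptions):

def counts_revisions(*method_names):
    """Class decorator: wrap each named method so calling it bumps _rev."""
    def decorate(cls):
        def make_wrapper(method):
            def wrapper(self, *args, **kwargs):
                result = method(self, *args, **kwargs)
                self._rev += 1  # record the mutation
                return result
            wrapper.__name__ = method.__name__
            return wrapper
        for name in method_names:
            setattr(cls, name, make_wrapper(getattr(cls, name)))
        return cls
    return decorate

@counts_revisions('add')
class AnIterable(object):
    def __init__(self, iterable=()):
        self._content = list(iterable)
        self._rev = 0

    def __iter__(self):
        r = self._rev
        for x in self._content:
            if self._rev != r:
                raise RuntimeError("object changed during iteration")
            yield x

    def add(self, item):
        self._content.append(item)  # no manual _rev bookkeeping needed

With this, each mutating method only has to be listed once in the decorator call instead of remembering the increment in every method body.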

Another approach would be to keep a count of "live" iterators, incrementing an instance attribute when an iterator is created and decrementing it when it is exhausted. Then in add(), you check to make sure this attribute is zero and raise an exception if not.

class AnIterable(object):

    def __init__(self, iterable=()):
        self._itercount = 0
        self._content   = list(iterable)

    def __iter__(self):
        self._itercount += 1
        try:
            for x in self._content:
                yield x
        finally:
            self._itercount -= 1

    def add(self, obj):
        if self._itercount:
            raise RuntimeError("cannot change object while iterating")
        self._content.append(obj)

For bonus points, implement __del__() on the iterator so the count is also decremented when an object goes out of scope without being exhausted. (Watch out for double-decrementing!) This will require defining your own custom iterator class, rather than using the one Python gives you when you use yield in a function, and there's of course no guarantee as to when __del__() will be called in any case.
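
Such a custom iterator might look roughly like this (a sketch only; as discussed in the comments below, there is no guarantee about when, or whether, __del__() actually runs):

class CountedIterator(object):
    """Decrements its owner's _itercount exactly once: when exhausted,
    or (best effort) when garbage-collected before exhaustion."""

    def __init__(self, owner):
        self._owner = owner
        self._index = 0
        self._released = False  # guards against double-decrementing

    def __iter__(self):
        return self

    def __next__(self):
        if not self._released and self._index < len(self._owner._content):
            item = self._owner._content[self._index]
            self._index += 1
            return item
        self._release()
        raise StopIteration

    next = __next__  # Python 2 spelling

    def _release(self):
        if not self._released:
            self._released = True
            self._owner._itercount -= 1

    def __del__(self):
        self._release()  # no-op if the iterator was already exhausted

The owner's __iter__() would then increment _itercount and return CountedIterator(self) instead of using a generator.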

Alas, you can't really stop someone from getting around whatever "protection" you add. We're all consenting adults here.

What you can't do in any case is just use self as your iterator.

Finally, here's an example of a different, more or less opposite, approach: you let the caller make the changes, but defer actually applying them until iteration completes. A context manager is used to finalize the changes explicitly.

To make sure callers use the context manager, you could refuse to iterate if you're not in a context (e.g., check in __iter__() a flag set in __enter__()), then store a list of the iterator objects and invalidate them when exiting the context (e.g. set a flag in each iterator so it raises an exception on the next iteration).
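
Here is one possible shape for that (a sketch; every name is illustrative, only add() is deferred, and the iterator-invalidation bookkeeping is omitted):

class DeferringIterable(object):
    """Queues changes made during the context and applies them on exit."""

    def __init__(self, iterable=()):
        self._content = list(iterable)
        self._pending = []
        self._in_context = False

    def __enter__(self):
        self._in_context = True
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        for item in self._pending:  # apply the deferred adds now
            self._content.append(item)
        del self._pending[:]
        self._in_context = False
        return False  # never suppress exceptions

    def __iter__(self):
        if not self._in_context:
            raise RuntimeError("iterate only inside a 'with' block")
        return iter(self._content)  # safe: adds are deferred, not applied

    def add(self, item):
        if self._in_context:
            self._pending.append(item)  # queued until __exit__()
        else:
            self._content.append(item)

Used like this, the loop sees only the original elements, and the additions appear once the with block ends:

with DeferringIterable([1, 2]) as d:
    for x in d:
        d.add(x * 10)  # queued; applied after the block exits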

– kindall
  • I especially like the fact that this approach would detect _any_ changes to the length of a container object (i.e. deletions even though the OP doesn't have an interface for that operation in their sample code). Only problem might be that it appears to assume the container is indexable. – martineau Sep 18 '12 at 02:22
  • Of course if you add and delete an item in between iterations, this won't catch that... – kindall Sep 18 '12 at 04:53
  • Regarding the implementation of the live iterator idea: I think you're making the same mistake that I did in an answer (which I've since deleted) in that you're assuming that the generator will be consumed until the end. I'm not totally sure, but it seems that in the case of an iterator being abandoned, `_itercount` wouldn't be decremented properly. – martineau Sep 18 '12 at 08:44
  • Yes, the example has that flaw. That's why I said "for bonus points, implement `__del__()`". You'd need to write your own iterator class, which I didn't feel like debugging. :-) – kindall Sep 18 '12 at 13:11
  • Yeah, I guess that's basically what @gnibbler is saying in his answer (create a separate iterator class). If I have some time and feel so motivated, I might give doing that a shot...it does after all seem like a fundamental iterator issue that needs a recipe or pattern to follow. – martineau Sep 18 '12 at 17:24
  • Actually, now that I think about it some more, implementing the `__del__()` method probably wouldn't work reliably for signaling that an iterator has gone out of scope without being exhausted, because Python makes no guarantees about when, if ever, it'll actually call it, even if you delete the object [explicitly](http://docs.python.org/reference/datamodel.html?highlight=object.__del__#object.__del__) with a `del x`. – martineau Sep 18 '12 at 17:35
  • @kindall what do you think of the pseudo-implementation in update #2 in the OP? – elhefe Sep 18 '12 at 19:13
  • @martineau `__del__` always gets called when there are no more references to an object. `del x` won't necessarily cause `x.__del__` to be called because `del` just decrements the reference count. The situations where `__del__` never gets called are probably(?) unlikely to happen for this case, or if they occur it doesn't really matter (e.g. on program exit, because there's a reference in the stack trace). – elhefe Sep 18 '12 at 19:16
  • @elhefe: `__del__()` is called when the instance is about to be destroyed, which is not necessarily right after its reference count goes to zero. My experience has been that this happens when the garbage collector is called (if ever) -- in other words just going out of scope or having a `del obj` is no guarantee that `x.__del__()` is going to be called any time soon. This is different than the way things are with C++ for example, and part of the reason they invented the `with` statement. – martineau Sep 18 '12 at 19:38
  • @martineau Hmm, based on my naive reading of the [language spec](http://docs.python.org/reference/datamodel.html), it sounds like `__del__` is called when the ref count = 0. Going out of scope / `del` wouldn't necessarily call `__del__` if there are more references hanging around. If you're right, though, there doesn't really seem to be any way to make an iterator that can detect changes to the data structure it's iterating over other than by directly querying the data structure, which could be prohibitive in some cases. – elhefe Sep 18 '12 at 20:41
  • @elhefe: I have first hand experience about it after learning how it works the hard way. The language ref isn't wrong, just a little misleading (or perhaps not specific enough) about exactly when things happen. – martineau Sep 18 '12 at 21:20
  • @martineau: Well, that sucks. What kind of situation causes a delay/disconnect between the ref count reaching zero and `__del__` being called? I just did some simple tests and it was always called immediately. – elhefe Sep 18 '12 at 21:46
  • @martineau: never mind, just read [this](http://stackoverflow.com/questions/1481488/how-to-call-the-del-method). Not having a reliable destructor really makes it hard to make safe iterators, it seems. – elhefe Sep 18 '12 at 22:38
  • @elhefe: Yeah, I've read that question and ilya n.'s answer to it before (and am one of the many who have upvoted his answer). I noticed I had also upvoted Charles Merriam's answer to the question -- the one that opines "Don't use `__del__`." Some things in Python are like that and just require that certain rules be followed in their usage or they won't work properly (i.e. the behavior can't be automatically or completely enforced for every case). – martineau Sep 19 '12 at 07:30
  • Hmmm, deferring deletes isn't quite the same thing as outright disallowing them. It could also get a little messy to extend to handle (when deferring) all the possible content-changing methods. In fact, I'm not sure that even just deferring deletes would work logically in all cases, since checks might be made later during the same iteration to, say, test whether something that was deleted is still there. No, I'm beginning to think not being able to do this is a big hole (aka wart) in Python. – martineau Sep 20 '12 at 01:46
  • @martineau Yeah. I don't really think any of the answers are ideal at this point. For my implementation I'm using the code in my OP update #3 for now, although I don't like adding non-standard methods to the iterator interface. – elhefe Sep 20 '12 at 16:30
  • @elhefe: You might consider making your update #3 into a context manager which ought to make it unnecessary to remember to call `discard()` -- although in general I greatly prefer anything (of course) that "just worked" over one that requires users to follow special access rules, so the closer to that the better. – martineau Sep 20 '12 at 16:56