
TL;DR: is what I'm trying to do too complicated for a yield-based generator?

I have a python application where I need to repeat an expensive test on a list of objects, one at a time, and then mangle those that pass. I expect several objects to pass, but I do not want to create a list of all those that pass, as mangle will alter the state of some of the other objects. There is no requirement to test in any particular order. Then rinse and repeat until some stop condition.

My first simple implementation was this, which behaves correctly:

while not stop_condition:
    for object in object_list:
        if test(object):
            mangle(object)
            break
    else:
        handle_no_tests_passed()

Unfortunately, `for object in object_list:` always restarts at the beginning of the list, where the objects probably haven't been changed, while there are objects at the end of the list still ready to test. Picking objects at random would be slightly better, but I would rather carry on where I left off from the previous for/in call. I still want the for/in loop to terminate once it has traversed the entire list.

This sounded like a job for `yield`, but I tied my brain in knots failing to make it do what I wanted. I can use it in the simple cases, such as iterating over a range or returning filtered records from some source, but I couldn't work out how to make it save its state and resume reading from its source.

I can often do things the long, wordy way with classes, but fail to understand how to use the alleged simplifications like `yield`. Here is a solution that does exactly what I want.

class CyclicSource:
    def __init__(self, source):
        self.source = source
        self.pointer = 0

    def __iter__(self):
        # reset how many we've done, but not where we are
        self.done_this_call = 0
        return self

    def __next__(self):
        ret_val = self.source[self.pointer]
        if self.done_this_call >= len(self.source):
            raise StopIteration
        self.done_this_call += 1
        self.pointer += 1
        self.pointer %= len(self.source)
        return ret_val

source = list(range(5))
q = CyclicSource(source)

print('calling once, aborted early')
count = 0
for i in q:
    count += 1
    print(i)
    if count >= 2:
        break
else:
    print('ran off first for/in')

print('calling again')
for i in q:
    print(i)
else:
    print('ran off second for/in')

which demonstrates the desired behaviour:

calling once, aborted early
0
1
calling again
2
3
4
0
1
ran off second for/in

Finally, the question: is it possible to do what I want with the simplified generator syntax using `yield`, or does maintaining state between successive for/in calls require the full class syntax?

Neil_UK
  • I am not sure I follow you. Is there any reason why you can't just do `for object in random.shuffle(object_list)` to get different ordering of objects every time the loop goes off? https://stackoverflow.com/questions/976882/shuffling-a-list-of-objects – Gnudiff Dec 12 '17 at 12:32
  • ie. together with not breaking the loop after mangling. – Gnudiff Dec 12 '17 at 12:49

1 Answer


Your use of the `__iter__` method causes your iterator to be reset. This runs quite counter to the regular behaviour of an iterator; the `__iter__` method should just return `self`, nothing more. You rely on a side effect of `for` applying `iter()` to your iterator each time you start a `for i in q:` loop. This makes your iterator work, but the behaviour is surprising and will trip up future maintainers. I'd prefer that effect to be split out into a separate `.reset()` method, for example.
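As a sketch of that idea (illustrative only, not code from the question), the explicit reset could look like this:

class CyclicSource:
    def __init__(self, source):
        self.source = source
        self.pointer = 0
        self.done_this_call = 0

    def reset(self):
        # explicitly start a fresh pass, continuing from the current position
        self.done_this_call = 0

    def __iter__(self):
        # no side effects here; an iterator's __iter__ just returns itself
        return self

    def __next__(self):
        if self.done_this_call >= len(self.source):
            raise StopIteration
        ret_val = self.source[self.pointer]
        self.done_this_call += 1
        self.pointer = (self.pointer + 1) % len(self.source)
        return ret_val

You would then call q.reset() before each new pass, instead of relying on the for statement calling iter() for you.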

You can reset a generator too, using generator.send() to signal it to reset:

def cyclic_source(source):
    pointer = 0
    done_this_call = 0

    while done_this_call < len(source):
        ret_val = source[pointer]
        done_this_call += 1
        pointer = (pointer + 1) % len(source)
        reset = yield ret_val
        if reset is not None:
            done_this_call = 0
            yield  # pause again for next iteration sequence

Now you can 'reset' your count back to zero:

q = cyclic_source(source)
for count, i in enumerate(q):
    print(i)
    if count == 1:
        break
else:
    print('ran off first for/in')

print('explicitly resetting the generator')
q.send(True)
for i in q:
    print(i)
else:
    print('ran off second for/in')
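With the same source = list(range(5)) as in the question, this should print something like:

0
1
explicitly resetting the generator
2
3
4
0
1
ran off second for/in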

This is, however, rather counter to readability. I'd instead use an infinite iterator created with `itertools.cycle()`, limited in the number of iterations with `itertools.islice()`:

from itertools import cycle, islice

q = cycle(source)
for count, i in enumerate(islice(q, len(source))):
    print(i)
    if count == 1:
        break
else:
    print('ran off first for/in')

for i in islice(q, len(source)):
    print(i)
else:
    print('ran off second for/in')

`q` will produce values from `source` in an endless loop. `islice()` cuts off iteration after `len(source)` elements, but because `q` is reused it still maintains the iteration state between loops.
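To make that concrete (continuing with the same q and source from the code above, purely as an illustration), a third limited pass simply carries on from wherever the previous one stopped:

for i in islice(q, 3):
    print(i)  # with source = list(range(5)) this should print 2, 3, 4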

If you must have a dedicated class, make it an iterable rather than an iterator: have it return a new iterator each time `__iter__` is called:

from itertools import cycle, islice

class CyclicSource:
    def __init__(self, source):
        self.length = len(source)
        self.source = cycle(source)

    def __iter__(self):
        return islice(self.source, self.length)

This still keeps the state in the `cycle()` iterator, but simply creates a new `islice()` object each time you create an iterator for the instance. It essentially encapsulates the `islice()` approach above.
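A usage sketch (assuming the same source = list(range(5)) from the question): each for loop gets a fresh length-limited iterator, but picks up wherever the shared cycle() left off.

source = list(range(5))
q = CyclicSource(source)

for count, i in enumerate(q):
    print(i)            # 0, 1
    if count == 1:
        break

for i in q:
    print(i)            # should continue with 2, 3, 4, 0, 1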

Martijn Pieters
  • Thx for the reply, more food for thought. I'm working at the very limits of my python_foo here, and I'm not clear why, or even that, my (apparent) misuse of the side effects of for is surprising. The prototype for that generator class came from 'somewhere on the net', so it might be of low quality. However, I thought that calling `__init__` on object creation, `__iter__` on the for/in call and `__next__` each subsequent time round the loop was well defined. It sounds like the itertools constructs should be more sanitary. I'd still like to make something that can be used in a simple for/in, rather than wrapping an enumerate round it. – Neil_UK Dec 12 '17 at 14:21
  • @Neil_UK: doing anything else but `return self` in an iterator `__iter__` method is surprising, because that's not something any other iterator does. `iter(iterator)` should produce `iterator`, nothing else, and certainly not have altered the state of the iterator. – Martijn Pieters Dec 12 '17 at 14:28
  • @Neil_UK: I'd much rather see an explicit `.reset()` method on the iterator that you call to set the counter back to 0. That is better than either the side-effect in `__iter__` or the generator function using sending (which, having to add an extra `yield` to pause again, is also a bit obscure). – Martijn Pieters Dec 12 '17 at 14:30
  • @Neil_UK: actually, I've added another option; one that properly creates a new iterator for each `iter()` invocation, one that starts a new limited loop over `cycle`. – Martijn Pieters Dec 12 '17 at 14:33
  • Thx, much tidier, works fine. Handy to get a leg up into itertools. – Neil_UK Dec 12 '17 at 15:57
  • Followed up your link on what iter() should do. Not trying to argue, just understand, I'm on a steep learning curve. It just requires the return of an iterator, doesn't appear to exclude other operations. Your final class implementation doesn't (appear to?) return self from the iter. Both yours and mine work, and appear fairly obvious and self-documenting, but I'll use yours as it's shorter and tidier. Is it only through precedent/style that mine is surprising, or is there a PEP/other documentation that positively prohibits anything other than a simple return of self from iter? – Neil_UK Dec 13 '17 at 06:26
  • @Neil_UK: my final implementation is an *iterable*, not an iterator. The difference is that the former returns a *new* iterator each time you call `__iter__`, and it doesn't itself have a `__next__` method. My `__iter__` method returns a new generator, which is an iterator, each time you call it. Yours is surprising through precedent / style, and by experience. – Martijn Pieters Dec 13 '17 at 18:45
  • Hmmmm, thanks, I'll have to let that sink in for a couple of years. – Neil_UK Dec 13 '17 at 19:03