
This question goes in the direction of this question: How to join two generators (or other iterables) in Python?

But I'm searching for a solution that keeps the original instance of the iterator.

Something that does the following:

iterator1 = iter(range(10))
iterator2 = iter(range(10))

iterator_chained = keep_instance_chain_left(iterator1, iterator2)
assert iterator2 is iterator_chained  # !!!!

The items of iterator1 should be prepended (extendleft-style) to the original iterator2.

Can anybody give a replacement for keep_instance_chain_left()?

The obvious solution,

iterator_chained = itertools.chain(iterator1, iterator2)

delivers just a new iterator instance:

assert iterator_chained is not iterator2
B.R.
  • It is not possible to extend an existing iterator using `itertools.chain()`. An iterator in Python is essentially a pointer to a specific location in a sequence, and once that pointer has moved beyond a certain point, it cannot be moved back. Maybe write a new custom class that keeps track of both iterators – dincer.unal Mar 31 '23 at 08:16
  • By _keeps the original instance of an iterator_ you mean keep the same object (i.e. keep same id)? Is that right? Or do you mean something else - e.g. not consume the original iterator? And if yes - why? – buran Mar 31 '23 at 08:17
  • this `assert iterator2 is iterator_chained` makes no sense – RomanPerekhrest Mar 31 '23 at 08:19
  • @buran Yes, I mean preserving the same object (i.e., same id) for the iterator. For example, if the original iterator is being tracked by some other data structure or is being passed around to other functions, those structures or functions may not work correctly if the iterator's identity changes. The iterator may be used in multiple other places in the code, and if you change its identity you can break other parts of the code that depend on the original identity. – dincer.unal Mar 31 '23 at 08:24
  • An iterator (not an iterable) is for one-time use only. You cannot consume a cake/iterator and still have it as before. Maybe the related `itertools.tee` can help - it depends on what you are trying to achieve. – VPfB Mar 31 '23 at 08:28
  • @dincer.unal, my question was pointed at OP, I quoted part of their question - _But I'm searching for a solution that **keeps the original instance of an iterator**?_. IMHO, this smells very much like XY problem – buran Mar 31 '23 at 08:28
  • @buran I understand, yes, the OP is asking about preserving the identity of the iterator as part of an XY problem. Preserving the identity of an iterator can be important in some cases, as I mentioned in my previous response, but it's not always necessary or the best approach. It's possible that there is another way to achieve the underlying goal the OP has in mind, without needing to preserve the iterator's identity. – dincer.unal Mar 31 '23 at 08:39
  • Note that `itertools.tee` is strictly worse than just using `list` if you're going to fully exhaust one tee iterator before using the other - it only helps if the two tees are going to remain fairly close together in the data stream. – user2357112 Mar 31 '23 at 12:55
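
For reference, a minimal sketch of the `itertools.tee` idea mentioned in the comments above (note that it duplicates a stream, but does not preserve the original iterator's identity either):

import itertools

source = iter(range(5))
a, b = itertools.tee(source)  # two independent iterators over one shared source
# after tee(), the original `source` should not be used directly anymore
print(list(a))  # [0, 1, 2, 3, 4]
print(list(b))  # [0, 1, 2, 3, 4]  (replayed from tee's internal buffer)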

4 Answers


Thank you for your comments and ideas. I created the following solution, which fits my use case.

import itertools

class MutableIterator():

    __slots__=('_chain','_iterator','_depth')

    def __init__(self,*iterators):
        self._chain=itertools.chain
        self._depth=0
        s=len(iterators)
        if s==0:
            self._iterator=iter(()) #empty iterator
        elif s==1:
            self._iterator =iter(iterators[0])
        else:
            self._iterator=self._chain(*iterators)

    def __next__(self):
        yield next(self._iterator)

    def __iter__(self):
        try:
            while 1:
                yield next(self._iterator)
        except StopIteration:
            pass

    def append(self,iterator):
        if self._depth>20000:
            # maximum C-level recursion depth reached; we must consume the iterator here
            self._iterator=self._chain(list(self._iterator),iter(iterator))
            self._depth=0
        else:
            self._iterator = self._chain(self._iterator, iter(iterator))
            self._depth +=1

    def appendleft(self,iterator):
        if self._depth>20000:
            # maximum C-level recursion depth reached; we must consume the iterator here
            self._iterator = self._chain(iterator, list(self._iterator))
            self._depth=0
        else:
            self._iterator = self._chain(iterator, self._iterator)
            self._depth +=1

E.g. it delivers the output:

a=[[1,2,3],[4,5,6]]
my_iterator=MutableIterator(a)
for i in my_iterator:
    if type(i) is list:
        my_iterator.appendleft(iter(i))
    print(i,end=', ')


[1, 2, 3], 1, 2, 3, [4, 5, 6], 4, 5, 6, 

Even though it works, I'm not 100% satisfied:

  1. It would be nice to have a built-in solution for this problem.
  2. In fact, in __iter__() I replaced the for loop with a while loop, which is not so nice from my point of view.
  3. From time to time the iterator must be consumed internally to avoid recursion errors at the C level of Python (I must remark that in my use case a depth > 1000 will not come up). But with this code the depth is unlimited; see the sketch below.
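
To illustrate point 3, here is a minimal sketch (my illustration, not part of the measurements below) of how each append() wraps the current iterator in one more chain object:

import itertools

it = iter(())
for _ in range(5):  # each append() adds one nesting level
    it = itertools.chain(it, iter((1, 2)))

# it is now chain(chain(chain(...), ...), ...); advancing it steps through
# every nesting level via nested C calls, so hundreds of thousands of levels
# can overflow the C stack and crash the interpreter
print(list(it))  # [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]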

I made a test against a solution based on collections.deque:

from collections import deque

class MutableIteratorDeque():

    __slots__=('_iterator',)

    def __init__(self,*iterators):
        s=len(iterators)
        if s==0:
            self._iterator=deque(()) #empty iterator
        elif s==1:
            self._iterator =deque((iter(iterators[0]),))
        else:
            self._iterator=deque(iter(i) for i in iterators)

    def __next__(self):
        while 1:
            try:
                yield next(self._iterator[0])
            except StopIteration:
                try:
                    self._iterator.popleft()
                except IndexError:
                    raise StopIteration

    def __iter__(self):
        while 1:
            try:
                yield next(self._iterator[0])
            except StopIteration:
                self._iterator.popleft()
            except IndexError:
                break


    def append(self,iterator):
        self._iterator.append(iter(iterator))

    def appendleft(self,iterator):
        self._iterator.appendleft(iter(iterator))

Here are my testing functions:

import timeit

item_number=10000000
deeplist=[]
sub=deeplist
for i in range(item_number):
    sub.append([1])
    sub=sub[-1]

flatlist=list(range(item_number))

def iter1():
    global deeplist
    my_iterator=MutableIterator(deeplist)
    for i in my_iterator:
        if type(i) is list:
            my_iterator.appendleft(iter(i))

def iter1b():
    global flatlist
    my_iterator=MutableIterator(flatlist)
    for i in my_iterator:
        if type(i) is list:
            my_iterator.appendleft(iter(i))


def iter2():
    global deeplist
    my_iterator=MutableIteratorDeque(deeplist)
    for i in my_iterator:
        if type(i) is list:
            my_iterator.appendleft(iter(i))

def iter2b():
    global flatlist
    my_iterator=MutableIteratorDeque(flatlist)
    for i in my_iterator:
        if type(i) is list:
            my_iterator.appendleft(iter(i))

print('Used size of list: %i'%item_number)
print('MutableIterator via chain() deeplist: %fs'%timeit.timeit(iter1,number=1))
print('MutableIterator via deque() deeplist: %fs'%timeit.timeit(iter2,number=1))

print('MutableIterator via chain() flatlist: %fs'%timeit.timeit(iter1b,number=1))
print('MutableIterator via deque() flatlist: %fs'%timeit.timeit(iter2b,number=1))

I got the following results with Python 3.9 (64-bit):

Used size of list: 10000000
MutableIterator via chain() deeplist: 4.338570s
MutableIterator via deque() deeplist: 4.664340s
MutableIterator via chain() flatlist: 0.802791s
MutableIterator via deque() flatlist: 0.912932s

This means that, yes, iteration time increases for deeper structures. Here itertools.chain behaves a bit faster than the solution based on collections.deque, even though the iterator must be consumed internally from time to time to avoid recursion errors.

But we can also say the difference between the two solutions is not really large.

After the input from @Kelly Bundy we can see that there are nested structures where the creation of the chain() objects seems to be too costly, so overall it might be recommended to use the deque solution:

deeplist = []
for i in range(10000):
    deeplist = [deeplist]
deeplist += [0] * 10000


print('MutableIterator via chain() deeplist: %fs'%timeit.timeit(iter1,number=1))
print('MutableIterator via deque() deeplist: %fs'%timeit.timeit(iter2,number=1))

Result:

MutableIterator via chain() deeplist: 0.907174s
MutableIterator via deque() deeplist: 0.004194s
B.R.
  • Such nested chains could end up being quite slow if nested deeply. – Kelly Bundy Mar 31 '23 at 23:45
  • I made some tests, and even for lists with items nested up to 1000 levels deep the performance is quite good. I also like to mention here that the solution is not recursive at the Python level, which means we do not get RecursionErrors even for deeply nested situations. @Kelly Bundy do you have a better solution? – B.R. Apr 02 '23 at 18:59
  • Iteration with depth 1000 is over a hundred times slower than depth 0 in my test. It's not Python recursion but C recursion, for me it segfaults if I append ~270000 times and then iterate. I changed it to use a deque of iterators instead, but then realized it's not a proper iterator and wasn't in the mood to fix that. Maybe if you fix yours, I'll adjust equally. – Kelly Bundy Apr 02 '23 at 19:20
  • What times do you get with [this](https://tio.run/##K6gsycjPM7YoKPr/PyU1tSAns7hEwVYhOpYrLb9IIVMhM0@hKDEvPVXD0AAINK24FIAAWSGMHcsFF9UGChvEKmgpgPX8/w8A)? – Kelly Bundy Apr 02 '23 at 22:51
  • Since you just misled some other answerer, let me be clearer. These are *not* iterators. The `__iter__` would need to return `self`, and `__next__` would have to `return` elements, not `yield` them. You just don't notice the latter issue because your `__next__` is never used, due to your improper `__iter__`. You should decide whether you want an iterator or a non-iterator iterable, and implement your choice properly. Right now it's a non-iterator iterable with a wrong name and an unused `__next__`. – Kelly Bundy Apr 03 '23 at 13:11

I'd like to add the following improved solution based on collections.deque. In __iter__() it uses an internal for loop, which improves speed in the case of unmodified iterations.

Besides this, I have now adapted the __next__() method so that it works (I did not focus on it before). It should be clear that the class mimics an iterator; it is not a real iterator.

from collections import deque

class MutableIteratorDeque():

    __slots__=('_iterator',)

    def __init__(self,*iterators):
        self._iterator=deque((iter(i) for i in iterators))

    def __next__(self):
        iterator=self._iterator
        while iterator:
            try:
                return next(iterator[0])
            except StopIteration:
                iterator.popleft()
        raise StopIteration

    def __iter__(self):
        iterator=self._iterator # make local
        while iterator:
            it=iterator[0]
            for i in it:
                yield i
                if it is not iterator[0]:
                    # an appendleft() happened during iteration: switch to the new head
                    break
            else:
                # current iterator is exhausted and was not modified: drop it
                iterator.popleft()

    def append(self,iterator):
        self._iterator.append(iter(iterator))

    def appendleft(self,iterator):
        self._iterator.appendleft(iter(iterator))
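
Usage is unchanged compared to the chain-based class; as a quick sketch, reusing the example from my previous answer:

a = [[1, 2, 3], [4, 5, 6]]
my_iterator = MutableIteratorDeque(a)
for i in my_iterator:
    if type(i) is list:
        my_iterator.appendleft(iter(i))
    print(i, end=', ')

# prints: [1, 2, 3], 1, 2, 3, [4, 5, 6], 4, 5, 6,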

This code delivers the following speed results compared to the "chained" solution (see my previous answer):

Used size of list: 10000000
MutableIterator via chain() deeplist: 4.588935s
MutableIterator via deque() deeplist: 3.647782s
MutableIterator via chain() flatlist: 0.806678s
MutableIterator via deque() flatlist: 0.756402s

And with the last nested structure:

MutableIterator via chain() deeplist: 0.889283s
MutableIterator via deque() deeplist: 0.003150s   

For me this is the best solution I have found at the moment. I would still like to see a built-in solution (maybe in the future).

Thank you again for your help and the discussion.

B.R.

This example takes two iterators and keeps track of both of them. When both iterators are exhausted, iteration stops.

class ChainedIterator:
    def __init__(self, iter1, iter2):
        self.iter1 = iter1
        self.iter2 = iter2
        self.current_iter = iter1

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.current_iter)
        except StopIteration:
            if self.current_iter is self.iter1:
                self.current_iter = self.iter2
                return next(self.current_iter)
            else:
                raise StopIteration


iterator1 = iter(range(10))
iterator2 = iter(range(10))

iterator_chained = ChainedIterator(iterator2, iterator1)
assert iterator_chained.current_iter is iterator2

dincer.unal

Unfortunately, it's not possible to achieve what you're looking for directly in Python. Once an iterator is exhausted, there's no way to "rewind" it to its original state.

However, there is a workaround that involves creating a custom iterator class that keeps track of the original iterator and the new iterator created by itertools.chain(). Here's an example implementation:

import itertools

class ChainedIterator:
    def __init__(self, iterable1, iterable2):
        self.original_iterable = iterable1
        self.chained_iterable = itertools.chain(iterable1, iterable2)
        self.current_iterable = self.original_iterable

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.current_iterable)
        except StopIteration:
            if self.current_iterable is self.original_iterable:
                self.current_iterable = self.chained_iterable
                return next(self.current_iterable)
            else:
                raise StopIteration

    def __getattr__(self, attr):
        return getattr(self.current_iterable, attr)

With this implementation, you can create a ChainedIterator instance that wraps around your original iterators:

iterator1 = iter(range(10))
iterator2 = iter(range(10))
iterator_chained = ChainedIterator(iterator1, iterator2)

Now, iterator_chained behaves just like a regular iterator, except that it switches to the chained iterator once the original iterator is exhausted:

assert iterator_chained is not iterator2  # This will pass
for i in iterator_chained:
   print(i)

This will output:

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9

B.R.