Does anyone know of anything equivalent to itertools.tee but with the ability to dynamically add iterators?

The itertools.tee function does exactly what I want, except that the number of iterators must be fixed when the function is called. I'd like something with equivalent functionality, but which permits new iterators to be added later, potentially even after some of the iterators have started iterating.

I'd like to avoid calling itertools.tee multiple times, as this will potentially use a lot of memory (I could have millions of iterators). From the Python docs for itertools.tee:

The following Python code helps explain what tee does (although the actual implementation is more complex and uses only a single underlying FIFO queue).

...

If new iterators are added later, they only need to see newly-arriving data.

Here's a code example of what I'd like to have (may not be syntactically correct):

# EnhancedTee is the hypothetical class I'm looking for
input_iterator = iter([1, 2, 3, 4, 5, 6])
tee = EnhancedTee(input_iterator)
it1 = iter(tee)
val = next(it1)
# val == 1
it2 = iter(tee)
val = next(it2)
# val == 2
val = next(it1)
# val == 2
Neil
  • Show us an example, because as far as I have read and know, you can still use `tee`. – Netwave Aug 30 '17 at 15:36
  • You will have to store all of the elements in the iterator to make this work anyway, so simply convert it to a list. You can create as many iterators as you want for a list using `iter(my_list)`. – Sven Marnach Aug 30 '17 at 15:36
  • *potentially even after some of the iterators have started iterating*, so why not keep a reference iterator and make new copies from it with `tee` when you need them? – Moses Koledoye Aug 30 '17 at 15:37
  • @MosesKoledoye: Calling tee multiple times will create multiple FIFO queues internally, which will use a lot of memory if I have millions of iterators. From the Python docs for itertools.tee: "The following Python code helps explain what tee does (although the actual implementation is more complex and uses only a single underlying FIFO queue)." – Neil Aug 30 '17 at 15:47
  • @SvenMarnach: Unfortunately converting the iterator to a list will take an infinite amount of time in my case! The iterator is consuming from a queue, which will block if there is no data available. – Neil Aug 30 '17 at 15:49
  • @Neil If new iterators are added later, are they supposed to see the data that the previous iterators have already "spent", or only newly arriving data? For example, imagine starting with `enhanced_tee([10, 20, 30, 40, 50], 2)` and advancing both iterators past 10 and 20. Now you add a third iterator. Does it begin with 10 or with 30? – user4815162342 Aug 30 '17 at 15:51
  • @user4815162342: good question; the newly-added iterators only need to see newly-arriving data. So in your example the third iterator would start at 30. – Neil Aug 30 '17 at 15:52
  • You might want to mention that in the question, along with a mock code sample of how you'd want your class to behave. But I don't see how this would be implemented without some kind of deque-like structure per iterator. If you really have millions of iterators, and each needs to implement the child side of `tee` contract, it will be hard to avoid spending a lot of memory. – user4815162342 Aug 30 '17 at 15:54
  • @user4815162342: I'll update the question; however, I do believe that it's possible to implement what I want without a deque-like structure per iterator, since this is what `itertools.tee` does, according to the docs. There will certainly be some memory overhead per iterator, but I don't believe it needs to be as much as a deque of all the outstanding values. – Neil Aug 30 '17 at 15:58
  • @DanielSanchez: I've added a code example to the question. – Neil Aug 30 '17 at 16:03
  • You can copy tee iterators with `copy.copy`. The copies will be positioned at whatever point the one you copied was positioned at. – user2357112 Aug 30 '17 at 16:07
  • @user2357112: thanks, that's great. I don't think this is an exact duplicate of the other question, because the answer to the other question starts each copied iterator at the start of the input iterator, but that's not exactly what I want. I was able to adapt the answer to do what I want though, by teeing only a single iterator at the start, and copying all subsequent iterators from that one. – Neil Aug 30 '17 at 16:33
  • @Neil I've now reopened the question so that you can post the answer you've arrived at. – user4815162342 Aug 31 '17 at 10:44
  • @user4815162342: great, thanks – Neil Aug 31 '17 at 15:56

1 Answer

This has almost been answered already in another question: In python, can I lazily generate copies of an iterator using tee?. However, the answer I needed is slightly different from the one given there. I only require the newly created iterators to see new data, so I only need to tee a single iterator initially and then make copies of it with `copy.copy`; each copy starts at the position of the iterator it was copied from. So the following works well for me:

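import itertools
from copy import copy

input_iterator = iter([1, 2, 3, 4, 5, 6])
# Tee a single iterator; all later iterators are copies of it
it1 = itertools.tee(input_iterator, 1)[0]
print(next(it1))
# prints 1
it2 = copy(it1)  # it2 starts at it1's current position
print(next(it2))
# prints 2
print(next(it1))
# prints 2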
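
If you want the `EnhancedTee` interface from the question, the same trick can be wrapped in a small class. This is only a rough sketch: `EnhancedTee`, `_TeeChild`, and the position counters are names I made up for illustration, not anything provided by `itertools`. The idea is to keep one private "head" tee iterator that is always moved up to the furthest-ahead reader, so that copying it gives a new iterator that sees only data nobody has consumed yet. It is not thread-safe as written.

```python
import copy
import itertools


class EnhancedTee:
    """Hypothetical wrapper: every iter() call returns a new independent
    iterator that starts at the furthest position read so far."""

    def __init__(self, iterable):
        # A single tee iterator acts as the shared head of the stream.
        self._head = itertools.tee(iterable, 1)[0]
        self._pos = 0  # items consumed by the furthest-ahead child

    def __iter__(self):
        # Copying a tee iterator shares its buffer and its current position.
        return _TeeChild(self, copy.copy(self._head), self._pos)


class _TeeChild:
    def __init__(self, parent, it, pos):
        self._parent = parent
        self._it = it
        self._pos = pos

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self._it)  # StopIteration propagates when exhausted
        self._pos += 1
        if self._pos > self._parent._pos:
            # This child is now in front: move the head past the same item
            # (served from tee's shared buffer) so it never lags behind.
            next(self._parent._head)
            self._parent._pos = self._pos
        return value


tee = EnhancedTee(iter([1, 2, 3, 4, 5, 6]))
it1 = iter(tee)
print(next(it1))  # prints 1
it2 = iter(tee)
print(next(it2))  # prints 2
print(next(it1))  # prints 2
```

Because the head always sits at the front, it should not pin any extra items in memory; the shared buffer only has to hold whatever lies between the slowest and the fastest live iterator.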
Neil