31

I have two iterators, a list and an itertools.count object (i.e. an infinite value generator). I would like to merge these two into a resulting iterator that will alternate yielding values between the two:

>>> import itertools
>>> c = itertools.count(1)
>>> items = ['foo', 'bar']
>>> merged = imerge(items, c)  # the mythical "imerge"
>>> merged.next()
'foo'
>>> merged.next()
1
>>> merged.next()
'bar'
>>> merged.next()
2
>>> merged.next()
Traceback (most recent call last):
    ...
StopIteration

What is the simplest, most concise way to do this?

David Eyk
    Don't use this one folks: `list((yield next(c)) or i for i in items)` – Chris_Rands Jul 12 '17 at 12:21
    This isn't what OP is looking for, but it's the first result upon googling "merge iterators python," so I figured I would comment: If you're looking for a mergesort-type function that merges two sorted iterators into one longer sorted iterator, use [`heapq.merge`](https://docs.python.org/3.8/library/heapq.html#heapq.merge). – Dennis Dec 09 '19 at 02:01

13 Answers

46

A generator will solve your problem nicely.

def imerge(a, b):
    for i, j in itertools.izip(a,b):
        yield i
        yield j
Pramod
    You should add a disclaimer - this will only work if list a is finite. – Claudiu Oct 28 '08 at 16:17
  • `import itertools; def imerge(a, b): for i, j in zip(a, b): yield i; yield j` — I'm trying this with `c = itertools.count(1)` and `items = ['foo', 'bar']`, and it still works. `len(zipped list) == min(l1, l2)`, so the finite-length restriction is not required. – Pramod Oct 28 '08 at 16:22
    Claudiu is correct. Try zipping two infinite generators--you will run out of memory eventually. I would prefer using itertools.izip instead of zip. Then you build the zip as you go, instead of all at once. You still have to watch out for infinite loops, but hey. – David Eyk Oct 28 '08 at 16:40
    It will still only work if one of the arguments is a finite iterable. If they are both infinite, zip() won't work. Use itertools.izip() instead. – Thomas Wouters Oct 28 '08 at 16:40
  • Hm. I should say, Claudiu is partially correct. If either a *or* b is finite, this will work. The problem only enters in when both a *and* b are infinite, and you get an infinite loop. This is just something we have to watch out for when using infinite generators. – David Eyk Oct 28 '08 at 16:43
    This would be my accepted answer, as I like the concise clarity of it. However, I balk at the use of zip instead of itertools.izip. – David Eyk Oct 28 '08 at 16:49
  • Right. I concede that it's not ideal for two infinite-length lists. I thought the question was about only a being finite. I'm changing it to use itertools.izip. – Pramod Oct 28 '08 at 16:57
    If you're writing any generator that calls zip() and iterates over its result, the odds are pretty good that you really want to be calling itertools.izip(). Whether or not the iterables you're zipping are finite. – Robert Rossney Oct 28 '08 at 18:28
    In Python 3.0 zip() behaves like itertools.izip(). – jfs Dec 01 '08 at 21:47
    Can somebody clarify for the noobs like me that we will be able to handle reading a finite number of elements out of two infinite generators if we use `izip`? e.g. This is the primary reason for `izip` to exist, yes? – Steven Lu Jun 05 '13 at 15:18
16

You can do something that is almost exactly what @Pramod first suggested.

def izipmerge(a, b):
  for i, j in itertools.izip(a,b):
    yield i
    yield j

The advantage of this approach is that you won't run out of memory if both a and b are infinite.

Constantin
David Locke
15

I also agree that itertools is not needed.

But why stop at 2?

def tmerge(*iterators):
  for values in zip(*iterators):
    for value in values:
      yield value

handles any number of iterators from 0 on upwards.

UPDATE: DOH! A commenter pointed out that this will silently drop trailing values unless all the iterators are the same length.

The correct code is:

def tmerge(*iterators):
  empty = {}
  for values in itertools.zip_longest(*iterators, fillvalue=empty):
    for value in values:
      if value is not empty:
        yield value

and yes, I just tried it with lists of unequal length, and a list containing {}.

Tom Swirly
12

I'd do something like this. It's time- and space-efficient, since you avoid the overhead of building zipped tuples, and it also works if both a and b are infinite.

def imerge(a, b):
    i1 = iter(a)
    i2 = iter(b)
    while True:
        try:
            yield i1.next()
            yield i2.next()
        except StopIteration:
            return
Claudiu
  • The try/except here breaks the iterator protocol by muffling the StopIteration, doesn't it? – David Eyk Oct 28 '08 at 16:46
  • @David Eyk: it's OK, because returning from a generator raises StopIteration anyway. The try statement in this case is superfluous. – efotinis Oct 28 '08 at 18:00
11

You can use zip as well as itertools.chain. This will only work if the first list is finite:

merge = itertools.chain(*[iter(i) for i in zip(['foo', 'bar'], itertools.count(1))])
Claudiu
6

I prefer this other way which is much more concise:

merged = reduce(lambda x, y: itertools.chain(x, y), iters)
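For illustration (this snippet is not in the original answer), note that reducing over chain concatenates the iterables end to end, yielding everything from the first before anything from the second, rather than alternating:

```python
import itertools
from functools import reduce  # reduce moved to functools in Python 3

iters = [['foo', 'bar'], [1, 2]]
merged = reduce(lambda x, y: itertools.chain(x, y), iters)
print(list(merged))  # ['foo', 'bar', 1, 2] -- concatenated, not alternated
```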
vampolo
4

One of the less well known features of Python is that you can have multiple for clauses in a generator expression. That's very useful for flattening nested lists, like those you get from zip()/izip().

def imerge(*iterators):
    return (value for row in itertools.izip(*iterators) for value in row)
Petr Viktorin
    Definitely would work, though I find nested generator expressions less than readable. I'd use this style if I were worried about performance. – David Eyk Mar 30 '11 at 17:07
  • It is really concise, as Python often is, but how does one begin to see what this code does? What is the effect of `value for row in ...` followed by `for value in row`? Isn't this a nested list-comprehension-generator? shouldn't it end with something like `for rowvalue in row` or is `value` shadowed? – Steven Lu Jun 05 '13 at 15:20
  • @StevenLu Basically it's two nested loops, like this: `for row in itertools.izip(*iterators): for value in row: yield value` – Petr Viktorin Jun 06 '13 at 11:31
3

I'm not sure what your application is, but you might find the enumerate() function more useful.

>>> items = ['foo', 'bar', 'baz']
>>> for i, item in enumerate(items):
...  print item
...  print i
... 
foo
0
bar
1
baz
2
John Fouhy
  • I always forget about enumerate! What a useful little tool, though it won't work in my particular application. Thanks! – David Eyk Oct 29 '08 at 03:54
3

Here is an elegant solution:

def alternate(*iterators):
    while len(iterators) > 0:
        try:
            yield next(iterators[0])
            # Move this iterator to the back of the queue
            iterators = iterators[1:] + iterators[:1]
        except StopIteration:
            # Remove this iterator from the queue completely
            iterators = iterators[1:]

Using an actual queue for better performance (as suggested by David):

from collections import deque

def alternate(*iterators):
    queue = deque(iterators)
    while len(queue) > 0:
        iterator = queue.popleft()
        try:
            yield next(iterator)
            queue.append(iterator)
        except StopIteration:
            pass

It works even when some iterators are finite and others are infinite:

from itertools import count

for n in alternate(count(), iter(range(3)), count(100)):
    input(n)

Prints:

0
0
100
1
1
101
2
2
102
3
103
4
104
5
105
6
106

It also correctly stops if/when all iterators have been exhausted.

If you want to handle non-iterator iterables, like lists, you can use

def alternate(*iterables):
    queue = deque(map(iter, iterables))
    ...
user76284
  • An interesting approach. :) So many ways to do this. I wonder if a rotating `deque()` would be more efficient than rebuilding the tuple on every iteration? – David Eyk Nov 09 '16 at 01:12
1

Use izip and chain together:

>>> list(itertools.chain.from_iterable(itertools.izip(items, c))) # 2.6 only
['foo', 1, 'bar', 2]

>>> list(itertools.chain(*itertools.izip(items, c)))
['foo', 1, 'bar', 2]
A. Coady
1

A concise method is to use a generator expression with itertools.cycle(). It avoids creating a long chain() of tuples.

generator = (it.next() for it in itertools.cycle([i1, i2]))
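As a Python 3 sketch of the same idea (note that since PEP 479, an exhausted iterator inside the generator expression surfaces as RuntimeError rather than quietly ending iteration, so this form is safest when both inputs are infinite or consumption is bounded):

```python
import itertools

i1 = itertools.count(1)
i2 = itertools.count(100)

# cycle() alternates between the two iterators; next() pulls one value per turn
gen = (next(it) for it in itertools.cycle([i1, i2]))

print(list(itertools.islice(gen, 6)))  # [1, 100, 2, 101, 3, 102]
```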
0

Why is itertools needed?

def imerge(a,b):
    for i,j in zip(a,b):
        yield i
        yield j

In this case at least one of a or b must be of finite length, because zip will return a list, not an iterator. If you need an iterator as output then you can go for Claudiu's solution.

Andrea Ambu
  • I prefer an iterator, as I'm reading values from files of arbitrary size. I'm sure there are cases where zip is superior. – David Eyk Oct 29 '08 at 03:53
0

Using itertools.izip(), instead of zip() as in some of the other answers, will improve performance:

As "pydoc itertools.izip" shows:

Works like the zip() function but consumes less memory by returning an iterator instead of a list.

itertools.izip will also work properly even if one of the iterators is infinite.
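In Python 3, zip() is already lazy, so the izip-based merge can be sketched like this (the imerge helper name is just illustrative):

```python
import itertools

def imerge(a, b):
    # zip() in Python 3 returns an iterator, like izip() in Python 2
    return itertools.chain.from_iterable(zip(a, b))

# Both inputs are infinite; islice takes just the first six merged values
merged = imerge(itertools.count(0), itertools.count(100))
print(list(itertools.islice(merged, 6)))  # [0, 100, 1, 101, 2, 102]
```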

J. M. Arnold
user26294