25

I have multiple iterators which return items in a sorted manner according to some sorting criterion. Now, I would like to merge (multiplex) the iterators into one, combined iterator. I know how to do it in Java style, with e.g. tree-map, but I was wondering if there is a more functional approach? I want to preserve the laziness of the iterators as much as possible.

therealrootuser
  • 10,215
  • 7
  • 31
  • 46
Bober02
  • 15,034
  • 31
  • 92
  • 178
  • possible duplicate of [How to combine 2 Iterators in Scala?](http://stackoverflow.com/questions/9047856/how-to-combine-2-iterators-in-scala) – om-nom-nom May 01 '13 at 15:47

3 Answers3

43

You can just do:

val it = iter1 ++ iter2

It creates another iterator and does not evaluate the elements, but wraps the two existing iterators. It is fully lazy, so you are not supposed to use iter1 or iter2 once you do this.

In general, if you have more iterators to merge, you can use folding:

val iterators: Seq[Iterator[T]] = ???
val it = iterators.foldLeft(Iterator[T]())(_ ++ _)

If you have some ordering on the elements that you would like to maintain in the resulting iterator but you want lazyness, you can convert them to streams:

def merge[T: Ordering](iter1: Iterator[T], iter2: Iterator[T]): Iterator[T] = {
  val s1 = iter1.toStream
  val s2 = iter2.toStream

  def mergeStreams(s1: Stream[T], s2: Stream[T]): Stream[T] = {
    if (s1.isEmpty) s2
    else if (s2.isEmpty) s1
    else if (s1.head < s2.head) s1.head #:: mergeStreams(s1.tail, s2)
    else s2.head #:: mergeStreams(s1, s2.tail)
  }

  mergeStreams(s1, s2).iterator
}

Not necessarily faster though, you should microbenchmark this.

A possible alternative is to use buffered iterators to achieve the same effect.

axel22
  • 32,045
  • 9
  • 125
  • 137
  • OK, how do I make sure that the relative ordering according to the same sorting criteria is maintained? Let's say I have an object that has a timestamp in a form of `DateTime`. I would like these two iterators to be merged accroding to timestamps, not one after the other (in Java I would use comparator) – Bober02 May 01 '13 at 09:14
  • Thanks, but I defeinitely dont want to use streams as they cache the elements. Besides, can I provide Ordering on the actual elements e.g. like in Java Comparator that you can pass to collections as an argument? – Bober02 May 01 '13 at 10:38
  • This was done in the example by using the `Ordering` context bound on `T`. The `<` comes from an implicit conversion to a value class, added in 2.10., but you could still use the `compare` method of the `Ordering` otherwise. – axel22 May 01 '13 at 11:08
  • The memory footprint complexity and the asymptotic running time complexity stay the same with streams - it's only the absolute performance that could be worse. – axel22 May 01 '13 at 11:09
  • "The memory footprint complexity and the asymptotic running time complexity stay the same with streams" - http://stackoverflow.com/questions/5159000/stream-vs-views-vs-iterators this begs to differ in terms of memory complexity – Bober02 May 01 '13 at 11:40
  • The values are cached with streams in general, indeed, but if you drop the reference to the `head` of the stream, then those values can be reclaimed by the garbage collector. If you take a look at the stream iterator implementation you will see that this is exactly what is being done (https://github.com/scala/scala/blob/v2.10.1/src/library/scala/collection/immutable/Stream.scala#L961). – axel22 May 01 '13 at 11:51
  • Right, but if I merge the streams, the final stream will have already precached every element from one stream and the next... I cannot afford that as the memory would blow up my machine. I need sth to lazily generate another iterator, without having to scan the two iterators and generate the final iterator – Bober02 May 01 '13 at 12:47
  • No, it won't -- unless there is a bug in my implementation above that I am not seeing, the streams should avoid evaluating the elements that you have not yet iterated over. In particular, calling `toStream` above on the `iter1` and `iter2` is fully lazy. – axel22 May 01 '13 at 12:51
  • Hmm... What about mergeStreams? This ieterates through streams, building a new one, by iterating through all elements. when you call `mergeStreams()` does that not get evaluated to one stream in order to provide, later on, iterator over it? SOrry for far too many comments... – Bober02 May 01 '13 at 14:09
  • The `mergeStreams` only evaluates the heads of `s1` and `s2` -- their tails are never traversed. When you construct a new stream using `#::` the `mergeStreams` method is not immediately called recursively -- it is called only when somebody calls `tail` on the resulting stream. – axel22 May 01 '13 at 14:13
  • Hmmm, any idea how to functionally extend it to N iterators? – Bober02 May 02 '13 at 17:08
  • Not that it would be particularly efficient, but you could always fold using the `mergeStreams` method. A custom `Iterator` implementation might be much more efficient, though. – axel22 May 02 '13 at 18:19
4

Like @axel22 mentioned, you can do this with BufferedIterators. Here's one Stream-free solution:

def combine[T](rawIterators: List[Iterator[T]])(implicit cmp: Ordering[T]): Iterator[T] = {
  new Iterator[T] {
    private val iterators: List[BufferedIterator[T]] = rawIterators.map(_.buffered)

    def hasNext: Boolean = iterators.exists(_.hasNext)

    def next(): T = if (hasNext) {
      iterators.filter(_.hasNext).map(x => (x.head, x)).minBy(_._1)(cmp)._2.next()
    } else {
      throw new UnsupportedOperationException("Cannot call next on an exhausted iterator!")
    }
}
sralmai
  • 197
  • 1
  • 5
3

You could try:

(iterA ++ iterB).toStream.sorted.toIterator

For example:

val i1 = (1 to 100 by 3).toIterator
val i2 = (2 to 100 by 3).toIterator
val i3 = (3 to 100 by 3).toIterator

val merged = (i1 ++ i2 ++ i3).toStream.sorted.toIterator

merged.next  // results in: 1
merged.next  // results in: 2
merged.next  // results in: 3
Keith Pinson
  • 1,725
  • 1
  • 14
  • 17