I have very large Iterators that I want to split into pieces. I have a predicate that looks at an item and returns true if it is the start of a new piece. I need the pieces to be Iterators, because even the pieces will not fit into memory. There are so many pieces that I would be wary of a recursive solution blowing out your stack. The situation is similar to this question, but I need Iterators instead of Lists, and the "sentinels" (items for which the predicate is true) occur (and should be included) at the beginning of a piece. The resulting iterators will only be used in order, though some may not be used at all, and they should only use O(1) memory. I imagine this means they should all share the same underlying iterator. Performance is important.
If I were to take a stab at a function signature, it would be this:
def groupby[T](iter: Iterator[T])(startsGroup: T => Boolean): Iterator[Iterator[T]] = ...
I would have loved to use takeWhile
, but it loses the last element. I investigated span
, but it buffers results. My current best idea involves BufferedIterator
, but maybe there is a better way.
You'll know you've got it right because something like this doesn't crash your JVM:
groupby((1 to Int.MaxValue).iterator)(_ % (Int.MaxValue / 2) == 0).foreach(group => println(group.sum))
groupby((1 to Int.MaxValue).iterator)(_ % 10 == 0).foreach(group => println(group.sum))