
I have an iterator of lines from a very large file, and the lines need to be put into groups as I move along. I know where each group ends because there is a sentinel value on the last line of each group. So basically I want to write a function that takes an iterator and a sentinel value, and returns an iterator of groups, each terminated by the sentinel value. Something like:

scala> groups("abc.defg.hi.jklmn.".iterator, '.')
res1: Iterator[Seq[Char]] = non-empty iterator

scala> groups("abc.defg.hi.jklmn.".iterator, '.').toList
res19: List[Seq[Char]] = List(List(a, b, c, .), List(d, e, f, g, .), List(h, i, .), List(j, k, l, m, n, .))

Note that I want the sentinel items included at the end of each of the groups. Here's my current solution:

def groups[T](iter: Iterator[T], sentinel: T) = new Iterator[Seq[T]] {
  def hasNext = iter.hasNext
  // takeWhile consumes the sentinel but leaves it out of its result, so it is appended again here
  def next() = iter.takeWhile(_ != sentinel).toList ++ List(sentinel)
}

I think this works, and I guess it is fine, but having to re-add the sentinel every time smells to me. Is there a better way to do this?

Steve
  • Did you want a sentinel added to the last group if it didn't contain it? (e.g "abc.def" -> ["abc.","def."]) – Mitch Blevins Jul 12 '10 at 19:48
  • Ideally no, though practically I think it doesn't matter. – Steve Jul 12 '10 at 20:37
  • It so happens that I have wanted, and asked for, a `takeTo` (plus `dropTo` and `spanTo`), which would act just like `takeWhile`, but return one more element -- the first one for which the predicate is true. If you feel like me, you could drop a note here: https://lampsvn.epfl.ch/trac/scala/ticket/2963 – Daniel C. Sobral Jul 13 '10 at 02:21
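The proposed `takeTo` is not in the Scala standard library, so the following is only a minimal sketch of what it could look like as a standalone helper, assuming Scala 2.12 or later. The name `takeTo`, its curried signature, and the wrapper object `TakeToSketch` are all hypothetical. The helper consumes elements up to and including the first one that matches the predicate, leaves the iterator positioned just after it, and turns `groups` into a one-liner that never has to re-add the sentinel:

```scala
import scala.collection.mutable.ListBuffer

object TakeToSketch {
  // Hypothetical helper (not part of the standard library): consumes elements
  // from `it` up to and including the first one for which `p` is true, and
  // leaves `it` positioned just after that element.
  def takeTo[T](it: Iterator[T])(p: T => Boolean): List[T] = {
    val buf = ListBuffer.empty[T]
    var done = false
    while (!done && it.hasNext) {
      val x = it.next()
      buf += x
      done = p(x)
    }
    buf.toList
  }

  // Keep pulling groups until one comes back empty, which only happens once
  // `iter` is exhausted, so no input element is ever discarded.
  def groups[T](iter: Iterator[T], sentinel: T): Iterator[List[T]] =
    Iterator.continually(takeTo(iter)(_ == sentinel)).takeWhile(_.nonEmpty)
}
```

With this definition, `TakeToSketch.groups("abc.def".iterator, '.')` yields `List(a, b, c, .)` and then `List(d, e, f)`: an unterminated last group is left as-is rather than getting an extra sentinel.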

2 Answers

Less readable than yours, but more "correct" when the final group doesn't have a terminating sentinel value:

def groups[T](iter: Iterator[T], sentinel: T) = new Iterator[Seq[T]] {
  def hasNext = iter.hasNext
  def next(): Seq[T] = {
    val builder = scala.collection.mutable.ListBuffer[T]()
    while (iter.hasNext) {
      val x = iter.next()
      builder.append(x)
      if (x == sentinel) return builder // the sentinel stays at the end of the group
    }
    builder // input ended without a sentinel: return the partial last group as-is
  }
}

Or, recursively:

import scala.collection.mutable.ListBuffer

def groups[T](iter: Iterator[T], sentinel: T) = new Iterator[Seq[T]] {
  def hasNext = iter.hasNext
  def next(): Seq[T] = {
    @scala.annotation.tailrec
    def build(accumulator: ListBuffer[T]): Seq[T] = {
      val v = iter.next()
      accumulator.append(v)
      // stop at the sentinel, or at the end of input for an unterminated last group
      if (v == sentinel || !iter.hasNext) accumulator
      else build(accumulator)
    }
    build(new ListBuffer[T]())
  }
}
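On Scala 2.13 or later, the same recursive idea can also be expressed with `Iterator.unfold` instead of a hand-written `Iterator` subclass. This is a minimal sketch under that version assumption; the helper `takeGroup` and the wrapper object `UnfoldSketch` are hypothetical names. Like the versions above, it keeps a trailing group that has no sentinel instead of inventing one:

```scala
import scala.annotation.tailrec

object UnfoldSketch {
  // Collects one group: everything up to and including the next sentinel,
  // or whatever is left if the input ends without one.
  @tailrec
  private def takeGroup[T](it: Iterator[T], sentinel: T, acc: List[T]): List[T] =
    if (!it.hasNext) acc.reverse
    else {
      val x = it.next()
      if (x == sentinel) (x :: acc).reverse
      else takeGroup(it, sentinel, x :: acc)
    }

  // Iterator.unfold (Scala 2.13+) produces one group per step and stops
  // as soon as the underlying iterator is exhausted.
  def groups[T](iter: Iterator[T], sentinel: T): Iterator[List[T]] =
    Iterator.unfold(iter) { it =>
      if (it.hasNext) Some((takeGroup(it, sentinel, List.empty[T]), it)) else None
    }
}
```

Building each group as a reversed immutable `List` avoids the shared mutable buffer; the cost is one `reverse` per group.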
Mitch Blevins

Ugly, but should be more performant than your solution:

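def groups[T](iter: Iterator[T], sentinel: T) = new Iterator[Seq[T]] {
  def hasNext = iter.hasNext
  // Caveat: in Scala 2.12/2.13, Iterator.takeWhile consumes the element that fails
  // the predicate (here, the one right after the sentinel) and discards it, so
  // every group after the first loses its first element.
  def next() = iter.takeWhile {
    // `last` is fresh for each group: an element is kept as long as the element
    // before it was not the sentinel, so the sentinel itself is kept
    var last = null.asInstanceOf[T]
    c => { val temp = last; last = c; temp != sentinel }
  }.toList
}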
Landei
  • Wow, that's ugly, but cool. =) You can move the "var last" out to a private variable, and then it looks a little less ugly. – Steve Jul 13 '10 at 15:28