For example, consider a small file.

one
two
three

four
five

six
seven
eight
nine

I would like to write code that takes a line iterator it: Iterator[String] and produces an iterator sectionIt: Iterator[Seq[String]] over the sections, where a section is a run of consecutive lines delimited by blank lines.

In C# and Ruby this is easily accomplished with the yield keyword. There's talk of how to add that keyword to Scala, but it depends on compiler plugins.

One way to create sectionIt would be to create an Iterator[Seq[String]] directly and override next and hasNext. This approach seems tedious and state-intensive for a higher-level language like Scala.

I realize there are other abstractions for streaming data, such as Iteratees, which may make this easier, but that's not an easy sell to someone who is learning a new language.

What is a good approach to writing the above code in Scala?

schmmd

2 Answers

You can accomplish most of what you would want with Ruby or C#'s yield using Stream:

def splitOnBlankLines(iter: Iterator[String]): Iterator[Seq[String]] = {
  def asStream(list: List[String]): Stream[List[String]] = {
    if (iter.hasNext) {
      val line = iter.next()
      if (line == "")
        list.reverse #:: asStream(Nil)
      else
        asStream(line :: list)
    } else {
      list.reverse #:: Stream.empty
    }
  }
  asStream(Nil).iterator
}

Whenever we would want to yield, we use #:: with the value we want to return (list.reverse in this case) and an expression representing the rest of the stream. #:: takes this expression as a by-name parameter, so it doesn't execute until the rest of the Stream is needed. When returning the last value, we use Stream.empty to signify that no more values will be produced.
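
To make the laziness concrete, here is a minimal sketch, assuming Scala 2.13 or later, where Stream is deprecated in favor of LazyList. The names LazySplitDemo, splitLazily and sections are invented for this illustration. Because the tail after #:: is only computed on demand, the splitter can be run against an endless line source and still return once enough sections have been taken.

```scala
object LazySplitDemo {
  // Same shape as splitOnBlankLines, with LazyList in place of Stream
  // (an assumption: this requires Scala 2.13+).
  def splitLazily(iter: Iterator[String]): Iterator[Seq[String]] = {
    def sections(acc: List[String]): LazyList[List[String]] =
      if (iter.hasNext) {
        val line = iter.next()
        // "Yield" the finished section; the tail is by-name and not run yet.
        if (line == "") acc.reverse #:: sections(Nil)
        else sections(line :: acc)
      } else {
        // Input exhausted: emit what is left and end the sequence.
        acc.reverse #:: LazyList.empty
      }
    sections(Nil).iterator
  }

  def main(args: Array[String]): Unit = {
    // An endless source: "a", "b", "", "a", "b", "", ...
    val endless = Iterator.continually(Seq("a", "b", "")).flatten
    // Terminates because later sections are only computed when requested.
    println(splitLazily(endless).take(2).toList)
    // prints List(List(a, b), List(a, b))
  }
}
```

An eager implementation that collected every section up front would never return on this input.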

It's possible to combine this behavior of Stream with the continuations plugin to get something syntactically equivalent to Ruby or C#'s yield (in all of twenty lines of code), but the continuations plugin is not likely to ever become stable.

However, manually writing an Iterator is almost as simple:

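import scala.annotation.tailrec

class BlankLineSplittingIterator(iter: Iterator[String]) extends Iterator[Seq[String]] {
  def hasNext: Boolean = iter.hasNext
  def next(): Seq[String] = {
    if (!iter.hasNext)
      throw new NoSuchElementException("next on empty iterator")
    // Collect lines until a blank line or the end of the input.
    // hasNext is checked before reading, so the final line is not dropped.
    @tailrec def untilBlank(list: List[String]): List[String] =
      if (!iter.hasNext) list.reverse
      else {
        val line = iter.next()
        if (line == "") list.reverse
        else untilBlank(line :: list)
      }
    untilBlank(Nil)
  }
}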
wingedsubmariner

A slightly different version of the other answer:

def section(it: Iterator[String]): Iterator[Seq[String]] = {
  def spanned(it: Iterator[String]): Stream[Seq[String]] =
    if (!it.hasNext) Stream.empty
    else {
      // a: the lines before the next blank line; b: the rest, starting at that blank line
      val (a, b) = it.span(_ != "")
      a.toSeq #:: spanned(b.drop(1))
    }
  spanned(it).iterator
}
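
As a small illustration of the span step on its own, the sketch below applies Iterator.span to a few lines. SpanDemo and firstSectionAndRest are hypothetical names used only here. span returns two iterators: one over the leading elements that satisfy the predicate and one over everything after them, starting with the blank separator itself, which is why the code drops one element before continuing.

```scala
object SpanDemo {
  // Hypothetical helper: the first section, and the lines that follow it.
  def firstSectionAndRest(lines: Iterator[String]): (List[String], List[String]) = {
    // After span, only the two returned iterators should be used.
    val (section, rest) = lines.span(_ != "")
    val first = section.toList
    // rest begins with the blank line that ended the section, so skip it.
    (first, rest.drop(1).toList)
  }

  def main(args: Array[String]): Unit = {
    println(firstSectionAndRest(Iterator("one", "two", "", "three")))
    // prints (List(one, two),List(three))
  }
}
```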

It's a bit lazier, and it treats consecutive blank lines differently. For an input of five empty lines, splitOnBlankLines also emits a final empty section, while section does not:

scala> lazysplit.Test.splitOnBlankLines(f"%n%n%n%n%n".lines).size
res0: Int = 6

scala> lazysplit.Test.section(f"%n%n%n%n%n".lines).size
res1: Int = 5
som-snytt
  • Awesome. Using span makes a lot of sense for this problem. I had been missing that I could construct iterators using streams. – schmmd May 21 '14 at 14:31