UPDATE #2: Travis Brown answered another question using scalaz-stream, an interesting library that might be helpful to you here. I am just starting to look at it, but was quickly able to use it to read data from a file containing this:
abc
def

ghi
jkl
mno

pqr
and produce another file that looked like this:
Vector(abc, def, )
Vector(ghi, jkl, mno, )
Vector(pqr)
Only the Vector currently being accumulated is held in memory, not the whole file. Here's my code (which should be considered dangerous, as I barely know anything about scalaz-stream):
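import scalaz.stream._

io.linesR("/tmp/a")                        // read the input file line by line
  .pipe( process1.chunkBy(_.nonEmpty) )    // group lines into Vectors, cut at each empty line
  .map( _.toString + "\n" )                // render each Vector as one line of text
  .pipe(text.utf8Encode)                   // encode the text as UTF-8 bytes
  .to( io.fileChunkW("/tmp/b") )           // write the bytes to the output file
  .run.run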
Key to your task is chunkBy(_.nonEmpty), which accumulates lines into a Vector until it hits an empty line; as the output shows, that empty line ends up as the last element of the chunk. I have no idea at this point why you have to say run twice.
Old stuff below.
UPDATE #1: Ah! I just discovered the new constraint that it not all be read into memory. This solution isn't for you, then; you'd want Iterators or Streams.
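For what it's worth, here is a minimal sketch of what an Iterator-based version could look like. LazySplit and its splitWhere are names I made up for illustration (they are not from any library), and I have only aimed for the same split semantics as the in-memory version further down. It holds just the group currently being built, so you could feed it something like scala.io.Source.fromFile(path).getLines().

```scala
// Hypothetical helper, not part of any library: lazily splits an Iterator
// into groups, keeping only the group currently being built in memory.
object LazySplit {
  def splitWhere[A](it: Iterator[A])(isSeparator: A => Boolean): Iterator[Vector[A]] =
    new Iterator[Vector[A]] {
      // True while there is still a (possibly empty) group left to emit.
      private var pending = true

      def hasNext: Boolean = pending

      def next(): Vector[A] = {
        if (!pending) throw new NoSuchElementException("next on empty iterator")
        val group = Vector.newBuilder[A]
        var sawSeparator = false
        // Consume elements until a separator or the end of the input.
        while (!sawSeparator && it.hasNext) {
          val x = it.next()
          if (isSeparator(x)) sawSeparator = true
          else group += x
        }
        // A separator means another group follows, even if it turns out empty.
        pending = sawSeparator
        group.result()
      }
    }

  def main(args: Array[String]): Unit = {
    val groups = splitWhere(Iterator("a", "b", "", "c", "d"))(_.isEmpty)
    groups.foreach(println)
  }
}
```

The groups are produced one at a time as the result is consumed, so nothing forces the whole input into memory unless you call something like toList on it. If everything does fit in memory, the simpler version below still applies.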
I'm guessing that you'd want to enrich Traversable. With the function in a separate argument list, the compiler can infer its types. For performance, you would probably want to make only one pass over the data. And to avoid stack overflows on large datasets (and for performance), you wouldn't want any recursion that isn't tail recursion. Given this enricher:
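import scala.annotation.tailrec

implicit class EnrichedTraversable[A]( val xs:Traversable[A] ) extends AnyVal {
  def splitWhere( f: A => Boolean ) = {
    // xs: elements still to process; group: the group being built;
    // groups: the groups completed so far
    @tailrec
    def loop( xs:Traversable[A], group:Seq[A], groups:Seq[Seq[A]] ):Seq[Seq[A]] =
      if ( xs.isEmpty ) {
        groups :+ group
      } else {
        val x = xs.head
        val rest = xs.tail
        if ( f(x) ) loop( rest, Vector(), groups :+ group )   // separator: close the current group
        else loop( rest, group :+ x, groups )                 // otherwise: extend the current group
      }
    loop( xs, Vector(), Vector() )
  }
}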
you can do this:
List("a","b","","c","d") splitWhere (_.isEmpty)
Here are some tests you might want to check out, to be sure the semantics are what you want (I personally like splits to behave this way):
val xs = List("a","b","","d","e","","f","g") //> xs : List[String] = List(a, b, "", d, e, "", f, g)
xs splitWhere (_.isEmpty) //> res0: Seq[Seq[String]] = Vector(Vector(a, b), Vector(d, e), Vector(f, g))
List("a","b","") splitWhere (_.isEmpty) //> res1: Seq[Seq[String]] = Vector(Vector(a, b), Vector())
List("") splitWhere (_.isEmpty) //> res2: Seq[Seq[String]] = Vector(Vector(), Vector())
List[String]() splitWhere (_.isEmpty) //> res3: Seq[Seq[String]] = Vector(Vector())
Vector("a","b","","c") splitWhere (_.isEmpty) //> res4: Seq[Seq[String]] = Vector(Vector(a, b), Vector(c))
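As an aside on the point about separate argument lists: the small sketch below shows the difference as I understand it for Scala 2 (Scala 3's inference is more forgiving here). InferenceDemo, countWhere and countWhere1 are hypothetical names used only for illustration.

```scala
object InferenceDemo {
  // Hypothetical helpers for illustration only.
  // Two parameter lists: A is inferred from `xs` before `f` is type-checked,
  // so the lambda's parameter type does not need an annotation.
  def countWhere[A](xs: Seq[A])(f: A => Boolean): Int = xs.count(f)

  // Single parameter list: on Scala 2 the call site has to annotate the lambda,
  // because A is not yet known when `_.isEmpty` would be type-checked.
  def countWhere1[A](xs: Seq[A], f: A => Boolean): Int = xs.count(f)

  val empties  = countWhere(List("a", "", "b"))(_.isEmpty)
  val empties1 = countWhere1(List("a", "", "b"), (s: String) => s.isEmpty)
}
```

With the enricher above, the collection being enriched plays the role of the first argument list, which is why splitWhere (_.isEmpty) works without any type annotation.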