14

In a text file I have data in the form:

1)
text
text
2)
more text
3)
even more text
more even text
even more text
...

I read it as a list of Strings using the following:

val input = io.Source.fromFile("filename.txt").getLines().toList

I want to break the list down into sub-lists starting with 1), 2) etc.

I've come up with:

val subLists =
  input.foldRight( List(List[String]()) ) {
    (x, acc) =>
      if (x.matches("""[0-9]+\)""")) List() :: (x :: acc.head) :: acc.tail
      else (x :: acc.head) :: acc.tail
  }.tail

Can this be achieved more simply? What would be really nice would be if there were a built-in method to split a collection on every element that satisfies a predicate (hint, hint, library designers :)).

kiritsuku
  • 52,967
  • 18
  • 114
  • 136
Luigi Plinge
  • 50,650
  • 20
  • 113
  • 180
  • 1
    Take a look at this question and the accepted answer: http://stackoverflow.com/questions/6800737/how-to-group-a-variable-length-repeating-sequence-in-scala – mpilquist Sep 03 '11 at 14:30
  • It's possible using Iterators as in that answer, but this case is more complex because each heading is different, so you'd need a second Iterator / List for the headings, and it stops being elegant. Recursion seems much cleaner. – Luigi Plinge Sep 04 '11 at 15:08

2 Answers2

26

foldRight with a complicated argument is usually an indication that you might as well write this using recursion, and factor it out to its own method, while you are at it. Here's what I came up with. First, let's generalize to a generic method, groupPrefix:

 /** Returns shortest possible list of lists xss such that
  *   - xss.flatten == xs
  *   - No sublist in xss contains an element matching p in its tail
  */
 def groupPrefix[T](xs: List[T])(p: T => Boolean): List[List[T]] = xs match {
   case List() => List()
   case x :: xs1 => 
     val (ys, zs) = xs1 span (!p(_))
     (x :: ys) :: groupPrefix(zs)(p)  
 }

Now you get the result simply by calling

 groupPrefix(input)(_ matches """\d+\)""")
Martin Odersky
  • 20,470
  • 9
  • 51
  • 49
  • 1
    One problem: this won't work for a large number of groups (stack overflow) – Jaka Jančar Mar 06 '13 at 16:21
  • list containing 476k elements with 10000 delimiters blows the stack – Conrad.Dean Apr 09 '15 at 19:37
  • Is this a tail recursive method? https://scastie.scala-lang.org/toshetah/hUQHUQjxSHqmUqLP8pCKcA/5 – Tomer Shetah Dec 06 '20 at 21:23
  • It's not tail-recursive because the function call depends on the `::` operation before returning its value (and therefore using more stack space at every iteration). However it is relatively straightforward to make it tail-recursive by adding an additional argument to the function, often called the accumulator. – 6infinity8 Dec 10 '20 at 19:34
1

I have the honor, to add an answer next to the great @MartinOdersky!

From Scala 2.13 we can use the List.unfold:

List.unfold(input) {
  case Nil =>
    None
  case x :: as =>
    as.span(!_.matches("""\d+\)""")) match {
      case (prefix, Nil) =>
        Some(x :: prefix, List.empty)
      case (prefix, suffix) =>
        Some(x :: prefix, suffix)
    }
}

Code run at Scastie.

Tomer Shetah
  • 8,413
  • 7
  • 27
  • 35