16

Title says it all, really; iterating over collection while preserving state between loops and finishing iteration based on termination condition in addition to simply running out of elements may be the most common pattern to accomplish anything in imperative programming. It seems to me however like it's something functional gentleprogrammers agreed to not talk about, or at least I never encountered an idiom for it or a semi-standarized name such as with map, fold, reduce, etc.

I often use the followinig code in scala:

implicit class FoldWhile[T](private val items :Iterable[T]) extends AnyVal {
    def foldWhile[A](start :A)(until :A=>Boolean)(op :(A, T)=>A) :A = {
        if (until(start)) start
        else {
            var accumulator = start
            items.find{ e => accumulator = op(accumulator, e); until(accumulator) }
            accumulator
        }

    }

}

But it's ugly. Whenever I try a more declarative approach, I come with even longer and almost surely slower code, akin to:

Iterator.iterate((start, items.iterator)){
    case (acc, i) if until(acc) => (acc, i)
    case (acc, i) if i.hasNext => (op(acc, i.next()), i)
    case x => x
}.dropWhile {
    case (acc, i) => !until(acc) && i.hasNext
}.next()._1

(A more functional variant would use Lists or Streams, but iterators have arguably lesser overhead than converting items to a Stream, as default implementation for the latter uses an iterator underneath anyway).

My questions are:

1) Does this concept have a name in functional programming, and if so, what is the pattern associated with its implementation?

2) What would be the best (i.e. concise, generic, lazy, and with least overhead) way to implememnt it in scala?

Turin
  • 2,208
  • 15
  • 23
  • 1
    I've never understood why this doesn't have a standard implementation. Yes. tail recursion is the way to do it but it's a bit ugly (and requires a helper function for which one needs to find a name, which always seems a bit of a code smell to me). .`mapUntil` and `foldLeftUntil` etc seem useful things to me... – The Archetypal Paul Apr 09 '16 at 17:48

4 Answers4

11

This is frowned upon by scala purists, but you can use a return statement like this:

 def foldWhile[A](zero: A)(until:A => Boolean)(op:  (A,T) => A): A = items.fold(zero) {
      case (a, b) if until(a) => return a
      case (a,b) => op(a, b)
}

Or, if you are one of those frowning, and would like a purely functional solution without dirty imperative tricks, you can use something lazy, like an iterator or a stream:

items
  .toStream // or .iterator - it doesn't really matter much in this case
  .scanLeft(zero)(op)
  .find(until)
Dima
  • 39,570
  • 6
  • 44
  • 70
  • 1
    Haha, I forgot scala has a return statement :) I think that in a two-line method it is excusable and actually both more efficient and readable than using a var. To cite a comment from source of one of scala collections next to a downcast "it's an imperfect world, but at least we can bottle up the imperfection". And the one liner with scan is certainly worth remembering, too - thanks! – Turin Apr 09 '16 at 23:14
6

The functional way of doing such things is via Tail Recursion:

implicit class FoldWhile[T](val items: Iterable[T]) extends AnyVal {

  def foldWhile[A](zero: A)(until: A => Boolean)(op: (A, T) => A): A = {
    @tailrec def loop(acc: A, remaining: Iterable[T]): A =
      if (remaining.isEmpty || !until(acc)) acc else loop(op(acc, remaining.head), remaining.tail)

    loop(zero, items)
  }
}

Using recursion you can decide at each step if you want to proceed or not without using break and without any overhead, because tail recursions are converted to iterations from the compiler.

Also, pattern matching is often used to decompose sequences. For example, if you had a List you could do:

implicit class FoldWhile[T](val items: List[T]) extends AnyVal {

  def foldWhile[A](zero: A)(until: A => Boolean)(op: (A, T) => A): A = {
    @tailrec def loop(acc: A, remaining: List[T]): A = remaining match {
      case Nil              => acc
      case _ if !until(acc) => acc
      case h :: t           => loop(op(acc, h), t)
    }

    loop(zero, items)
  }
}

Scala has the @scala.annotation.tailrec annotation to force compilation to fail if the function you're annotating is not tail recursive. I suggest you use it as much as you can because it helps both to avoid errors and document the code.

Community
  • 1
  • 1
Giovanni Caporaletti
  • 5,426
  • 2
  • 26
  • 39
  • In pure functional language that would be probably the most practical answer. I didn't use recursion however here in purpose, as `tail` can be a very slow operation with some collections (even with Vector its far from ideal) and I believe (maybe wrongly) that generally the prefered way to do any iteration over non-concrete collection type is to use in-built methods; any external iteration, including recursion, needs to go through an additional iterator/stream conversion layer and validation with every method call, which would be generally.skipped when executing a callback. – Turin Apr 09 '16 at 16:06
  • 1
    And frankly I secretly held hope to hear some term from Category Theory that would open new horizons (such as with Monoids and reduce) ;) – Turin Apr 09 '16 at 16:09
  • @Turin You're right in saying that the specific implementation of the recursion depends on the collection. The one I showed you is for list-like collection with fast head-tail decomposition. But generally, recursion is what lets you decide if you want to stop at each step, how you implement the step is up to you (e.g. head-tail decomposition, a mutable iterator, whatever). I'll wait for some "reduced monoid" as well, I'm definitely not a hardcore functional programmer :D – Giovanni Caporaletti Apr 09 '16 at 16:37
  • Also note that the "built-in" methods internally use procedural code. If you want the same degree of efficiency and zero overhead/intermediate collections you can always use while/break, but that is basically the same as using tail recursion. – Giovanni Caporaletti Apr 09 '16 at 16:52
2

A right fold, when done lazily, can do early termination. In Haskell, for example, you can write the find function (return first element of list that satisfies predicate) with foldr:

find :: (a -> Bool) -> [a] -> Maybe a
find p = foldr (\a r -> if p a then Just a else r) Nothing

-- For reference:
foldr :: (a -> r -> r) -> r -> [a] -> r
foldr _ z [] = []
foldr f z (a:as) = f a (foldr f z as)

What happens when you try, say, find even [1..]? (Note that this is an infinite list!)

find even [1..]
  = foldr (\a r -> if even a then Just a else r) Nothing [1..]
  = if even 1 
    then Just 1 
    else foldr (\a r -> if even a then Just a else r) Nothing ([2..])
  = if False 
    then Just 1 
    else foldr (\a r -> if even a then Just a else r) Nothing ([2..])
  = foldr (\a r -> if even a then Just a else r) Nothing ([2..])
  = if even 2 
    then Just 2 
    else foldr (\a r -> if even a then Just a else r) Nothing ([3..])
  = if True 
    then Just 2 
    else foldr (\a r -> if even a then Just a else r) Nothing ([3..])
  = Just 2

Laziness means that the function that we fold with (\a r -> if even a then Just a else r) gets to decide whether to force the r argument—the one whose evaluation requires us to recurse down the list—at all. So when even 2 evaluates to True, we pick the branch of the if ... then ... else ... that discards the result computed off the tail of the list—which means we never evaluate it. (It also runs in constant space as well. While programmers in eager functional languages learn to avoid foldr because of space and termination issues, those aren't always true in lazy languages!)

This of course hinges on the fact that Haskell is lazily evaluated, but it should nevertheless be possible to simulate this in an eager language like Scala—I do know it has a lazy val feature that might be usable for this. It looks like you'd need to write a lazyFold function that does a right fold, but the recursion happens inside a lazy value. You might still have problems with space usage, though.

Luis Casillas
  • 29,802
  • 7
  • 49
  • 102
  • Thanks, it certainly opened my eyes to the usefulness of foldr; out of the box it is much less useful than foldl in scala, lacking laziness of haskell but xoming with stackoverlow bomb. While it will certainly be more ugly than in Haskell, it certainly could be simulated in scala. – Turin Apr 16 '16 at 01:42
  • Ok, got back to it in the morning and it now seems to me it doesn't do what my initial script, as the stop predicate accepts an element of the collection rather than the accumulator. I don't think there is a way to both have a cake and eat it with foldr, i.e. both have the value of an accumulator at some element and avoid evaluating it for the rest. Going down the stack we have no info on the result of the accumulator, and going back we already traversed the whole collection and have to bactrace through the whole stack anyway. – Turin Apr 16 '16 at 13:56
2

The functional name for this is an Iteratee.

There are a bunch of references about this, but it's probably better to start from where the design ended up by reading the Pipes Tutorial and only if you're interested working backwards from there to see how it came from an early terminating left fold.

Kyle Butt
  • 9,340
  • 3
  • 22
  • 15