6

When building up a collection inside an Option, each attempt to make the next member of the collection might fail, making the collection as a whole a failure, too. Upon the first failure to make a member, I'd like to give up immediately and return None for the whole collection. What is an idiomatic way to do this in Scala?

Here's one approach I've come up with:

def findPartByName(name: String): Option[Part] = . . .

def allParts(names: Seq[String]): Option[Seq[Part]] =
  names.foldLeft(Some(Seq.empty): Option[Seq[Part]]) {
    (result, name) => result match {
      case Some(parts) =>
        findPartByName(name) flatMap { part => Some(parts :+ part) }
      case None => None
    }
  }

In other words, if any call to findPartByName returns None, allParts returns None. Otherwise, allParts returns a Some containing a collection of Parts, all of which are guaranteed to be valid. An empty collection is OK.

The above has the advantage that it stops calling findPartByName after the first failure. But the foldLeft still iterates once for each name, regardless.

Here's a version that bails out as soon as findPartByName returns a None:

def allParts2(names: Seq[String]): Option[Seq[Part]] = Some(
  for (name <- names) yield findPartByName(name) match {
    case Some(part) => part
    case None => return None
  }
)

I currently find the second version more readable, but (a) what seems most readable is likely to change as I get more experience with Scala, (b) I get the impression that early return is frowned upon in Scala, and (c) neither one seems to make what's going on especially obvious to me.

The combination of "all-or-nothing" and "give up on the first failure" seems like such a basic programming concept, I figure there must be a common Scala or functional idiom to express it.

Dan Getz
  • 8,774
  • 6
  • 30
  • 64
Ben Kovitz
  • 4,920
  • 1
  • 22
  • 50

5 Answers5

5

The return in your code is actually a couple levels deep in anonymous functions. As a result, it must be implemented by throwing an exception which is caught in the outer function. This isn't efficient or pretty, hence the frowning.

It is easiest and most efficient to write this with a while loop and an Iterator.

def allParts3(names: Seq[String]): Option[Seq[Part]] = {
  val iterator = names.iterator
  var accum = List.empty[Part]
  while (iterator.hasNext) {
    findPartByName(iterator.next) match {
      case Some(part) => accum +:= part
      case None => return None
    }
  }
  Some(accum.reverse)
}

Because we don't know what kind of Seq names is, we must create an iterator to loop over it efficiently rather than using tail or indexes. The while loop can be replaced with a tail-recursive inner function, but with the iterator a while loop is clearer.

wingedsubmariner
  • 13,350
  • 1
  • 27
  • 52
  • +1 However, I would like to point out that 'clearer' is subjective and many would find the recursive version far more clear, myself included. I would optimize using this approach when needed though. – drstevens May 28 '14 at 16:15
  • @som-snytt I don't know about all uses of return in `scala.collection`, but with javap I can confirm that `List`, `Vector`, and `Seq$` in `scala.collection.immutable` don't use the exception-throwing return. – wingedsubmariner May 28 '14 at 18:09
4

Scala collections have some options to use laziness to achieve that.

You can use view and takeWhile:

def allPartsWithView(names: Seq[String]): Option[Seq[Part]] = {
    val successes = names.view.map(findPartByName)
                              .takeWhile(!_.isEmpty)
                              .map(_.get)
                              .force
    if (!names.isDefinedAt(successes.size)) Some(successes)
    else None
}

Using ifDefinedAt avoids potentially traversing a long input names in the case of an early failure.

You could also use toStream and span to achieve the same thing:

def allPartsWithStream(names: Seq[String]): Option[Seq[Part]] = {
    val (good, bad) = names.toStream.map(findPartByName)
                                    .span(!_.isEmpty)
    if (bad.isEmpty) Some(good.map(_.get).toList)
    else None
}

I've found trying to mix view and span causes findPartByName to be evaluated twice per item in case of success.

The whole idea of returning an error condition if any error occurs does, however, sound more like a job ("the" job?) for throwing and catching exceptions. I suppose it depends on the context in your program.

Dan Getz
  • 8,774
  • 6
  • 30
  • 64
  • Interesting point that bailing out in the middle of a computation is sort of a non-functional idea. Now I'm thinking that it's got to be done with either an exception (non-functional), mutable state as in @som-snytt's answer (non-functional), or recursion where the bailing-out is done by non-strict evaluation as in Aldo Stracquodanio's answer (functional). So my question comes down to: what's a nice way in Scala to write code that expands into that recursive form—or what's an equivalent non-functional idiom that "goes with the grain" of Scala? – Ben Kovitz May 28 '14 at 19:33
  • Thanks @som-snytt; I changed it to not traverse the input more than necessary (and added a second way to achieve the same thing). However, don't forget that it's expected that the input *will* be traversed, because you can't return the first success until you're certain that there are no failures. Only if there happens to be a failure can you avoid traversing. – Dan Getz May 29 '14 at 09:57
  • @DanGetz Converting to `Stream` seems truest to the spirit of "exploit non-strict evaluation without explicitly spelling out the recursion." I wondered how much overhead this introduces, though. So I [benchmarked](http://j.mp/1mzVPo1) all of the answers. The results are interesting and surprising. `Stream` is by far the slowest: 8x slower than wingersubmariner's `Iterator` version, which is the fastest. Thoughtless but purely functional brute force outperformed several of the answers. Early `return` did pretty well. – Ben Kovitz May 30 '14 at 19:16
3

Combining the other answers, i.e., a mutable flag with the map and takeWhile we love.

Given an infinite stream:

scala> var count = 0
count: Int = 0

scala> val vs = Stream continually { println(s"Compute $count") ; count += 1 ; count }
Compute 0
vs: scala.collection.immutable.Stream[Int] = Stream(1, ?)

Take until a predicate fails:

scala> var failed = false
failed: Boolean = false

scala> vs map { case x if x < 5 => println(s"Yup $x"); Some(x) case x => println(s"Nope $x"); failed = true; None } takeWhile (_.nonEmpty) map (_.get)
Yup 1
res0: scala.collection.immutable.Stream[Int] = Stream(1, ?)

scala> .toList
Compute 1
Yup 2
Compute 2
Yup 3
Compute 3
Yup 4
Compute 4
Nope 5
res1: List[Int] = List(1, 2, 3, 4)

or more simply:

scala> var count = 0
count: Int = 0

scala> val vs = Stream continually { println(s"Compute $count") ; count += 1 ; count }
Compute 0
vs: scala.collection.immutable.Stream[Int] = Stream(1, ?)

scala> var failed = false
failed: Boolean = false

scala> vs map { case x if x < 5 => println(s"Yup $x"); x case x => println(s"Nope $x"); failed = true; -1 } takeWhile (_ => !failed)
Yup 1
res3: scala.collection.immutable.Stream[Int] = Stream(1, ?)

scala> .toList
Compute 1
Yup 2
Compute 2
Yup 3
Compute 3
Yup 4
Compute 4
Nope 5
res4: List[Int] = List(1, 2, 3, 4)
som-snytt
  • 39,429
  • 2
  • 47
  • 129
  • Is this solving the right problem? It looks like this is truncating a collection upon reaching failure rather than returning `None` for the whole collection upon reaching failure. – Ben Kovitz May 28 '14 at 17:56
  • What we really need is a lazy `foldRight` - `collectWhile` would be easy to write in terms of this. This whole question would be trivial in Haskell. – wingedsubmariner May 28 '14 at 18:15
  • 1
    There's free beer with Haskell, but its the side-effect-free kind you can't get drunk off of :) – wingedsubmariner May 28 '14 at 19:46
  • @wingedsubmariner How would you do it in Haskell? Maybe that will suggest a nice way to do it in Scala. – Ben Kovitz May 29 '14 at 10:02
  • @BenKovitz What som-snytt is doing is basically how you can fake the Haskell solution in Scala. Haskell's laziness means that `foldr` effectively works left-to-right, with the passed-in function receiving the current element and a continuation (thunk in Haskell-speak) representing the result of `foldr` on the rest of the sequence. You can choose whether to allow this continuation to be evaluated or not. – wingedsubmariner May 29 '14 at 17:54
2

I think your allParts2 function has a problem as one of the two branches of your match statement will perform a side effect. The return statement is the not-idiomatic bit, behaving as if you are doing an imperative jump.

The first function looks better, but if you are concerned with the sub-optimal iteration that foldLeft could produce you should probably go for a recursive solution as the following:

def allParts(names: Seq[String]): Option[Seq[Part]] = {
  @tailrec
  def allPartsRec(names: Seq[String], acc: Seq[String]): Option[Seq[String]] = names match {
    case Seq(x, xs@_*) => findPartByName(x) match {
      case Some(part) => allPartsRec(xs, acc +: part)
      case None => None
    }
    case _ => Some(acc)
  }

  allPartsRec(names, Seq.empty)
}

I didn't compile/run it but the idea should be there and I believe it is more idiomatic than using the return trick!

Aldo Stracquadanio
  • 6,167
  • 1
  • 23
  • 34
  • 1
    allPartsRec(xs, part::acc) might be slightly more performant, because adding an element at the head is O(1). – Ashalynd May 28 '14 at 15:11
  • Yes, I wasn't sure if it would be possible to use the cons operator on Seq. In case of List collections I would definitely use it. – Aldo Stracquadanio May 28 '14 at 15:13
  • What is the side-effect? – Ben Kovitz May 28 '14 at 15:18
  • The jump caused by the return is what I am referring to as a side-effect. In general a side-effect is anything that prevents from seeing your code as an evaluable expression; in this case I would define the jump this way as it introduce a control-flow rather than an expression evaluation by breaking out of the for expression. Probably my terminology is not the most accurate in this case, you can refer to some articles about this as the following on wikipedia: http://en.wikipedia.org/wiki/Side_effect_(computer_science) – Aldo Stracquadanio May 28 '14 at 15:24
  • The `return` is not a side-effect. It's actually implemented as an thrown exception, not a jump. In so far as an exception can be considered a return value (or functions that throw an exception can be trivially rewritten to return an `Either`) it is in fact still a pure function as well. – wingedsubmariner May 28 '14 at 15:30
1

I keep thinking that this has to be a one- or two-liner. I came up with one:

def allParts4(names: Seq[String]): Option[Seq[Part]] = Some(
  names.map(findPartByName(_) getOrElse { return None })
)

Advantage:

  • The intent is extremely clear. There's no clutter and there's no exotic or nonstandard Scala.

Disadvantages:

  • The early return violates referential transparency, as Aldo Stracquadanio pointed out. You can't put the body of allParts4 into its calling code without changing its meaning.

  • Possibly inefficient due to the internal throwing and catching of an exception, as wingedsubmariner pointed out.

Sure enough, I put this into some real code, and within ten minutes, I'd enclosed the expression inside something else, and predictably got surprising behavior. So now I understand a little better why early return is frowned upon.

This is such a common operation, so important in code that makes heavy use of Option, and Scala is normally so good at combining things, I can't believe there isn't a pretty natural idiom to do it correctly.

Aren't monads good for specifying how to combine actions? Is there a GiveUpAtTheFirstSignOfResistance monad?

Ben Kovitz
  • 4,920
  • 1
  • 22
  • 50
  • Could you get the best of both worlds with how clear this code is, vs. how it can't be copy-pasted into a function, by making it a more generic function that takes `findPartByName` as an argument? – Dan Getz May 29 '14 at 10:02
  • And I think throwing and catching exceptions / `Either` / `Try` are sort of like the `GiveUpAtTheFirstSignOfResistance` monad. – Dan Getz May 29 '14 at 10:06
  • Hiding the code in a generic function would certainly hide the early return, and naming that function something like `mapAllOrNone` would make the most readable code of all. I'd still like to see a single, clearly idiomatic way to do it, since there are always small tweaks for ordering, creating a `Set`, etc., and I don't really want to implement my own, custom library of these things. – Ben Kovitz May 30 '14 at 15:49