14

I have a list of possible input Values

val inputValues = List(1,2,3,4,5)

I have a really long to compute function that gives me a result

def reallyLongFunction( input: Int ) : Option[String] = { ..... }

Using scala parallel collections, I can easily do

inputValues.par.map( reallyLongFunction( _ ) )

To get what all the results are, in parallel. The problem is, I don't really want all the results, I only want the FIRST result. As soon as one of my input is a success, I want my output, and want to move on with my life. This did a lot of extra work.

So how do I get the best of both worlds? I want to

  1. Get the first result that returns something from my long function
  2. Stop all my other threads from useless work.

Edit - I solved it like a dumb java programmer by having

@volatile var done = false;

Which is set and checked inside my reallyLongFunction. This works, but does not feel very scala. Would like a better way to do this....

axel22
  • 32,045
  • 9
  • 125
  • 137
bwawok
  • 14,898
  • 7
  • 32
  • 43
  • 1
    Side note (not an answer to your question): this is IMHO simpler: `inputValues.par.map(reallyLongFunction)` – Tomasz Nurkiewicz Dec 11 '11 at 22:42
  • 1
    Similar: http://stackoverflow.com/questions/8073061/filtering-scalas-parallel-collections-with-early-abort-when-desired-number-of-r – Luigi Plinge Dec 12 '11 at 00:16
  • It doesn't look like to me parallel collections or the fork-join framework were designed to handle this case. If the computation is long because it's CPU intensive, it seems wasteful to want to compute all results or split the load between the cores versus putting all the cores working to computing a result. If the computation is long because it's waiting for some IO, it seems future or actors would be more appropriate. – huynhjl Dec 12 '11 at 03:24
  • Well for each input, its a purely single threaded computation that takes ~30 seconds of CPU time per input. About the perfect case to split up the work with fork join, IF there was a cleaner way to abort on first successful answer. – bwawok Dec 12 '11 at 03:29
  • @bwawok, I may be misunderstanding your use case. It feels it would amount to the same as this: given n cores, taking n inputs, kick off n computations and wait for the first one to finish. So the whole business about splitting tasks and stealing work from other queues does not come into play... – huynhjl Dec 12 '11 at 04:23
  • @huynhji not the first one to finish, the first one to successfully finish (most computations will return None. I want the first with Some) – bwawok Dec 12 '11 at 14:28

3 Answers3

4

(Updated: no, it doesn't work, doesn't do the map)

Would it work to do something like:

inputValues.par.find({ v => reallyLongFunction(v); true })

The implementation uses this:

  protected[this] class Find[U >: T](pred: T => Boolean, protected[this] val pit: IterableSplitter[T]) extends Accessor[Option[U], Find[U]] {
    @volatile var result: Option[U] = None
    def leaf(prev: Option[Option[U]]) = { if (!pit.isAborted) result = pit.find(pred); if (result != None) pit.abort }
    protected[this] def newSubtask(p: IterableSplitter[T]) = new Find(pred, p)
    override def merge(that: Find[U]) = if (this.result == None) result = that.result
  }

which looks pretty similar in spirit to your @volatile except you don't have to look at it ;-)

Havoc P
  • 8,365
  • 1
  • 31
  • 46
  • How do I get back the result of reallylongFunction? Not sure I understand this syntax, humm – bwawok Dec 12 '11 at 00:50
  • 1
    oh, I screwed it up; find of course returns the original value not the computed one. nevermind this answer! – Havoc P Dec 12 '11 at 00:58
  • @HavocP - I ran into that problem several times, too :( Why Scala does not have something like findMap[B](fn: A => (B, Boolean)) defined at it's collections? – Rogach Dec 12 '11 at 04:30
  • As I understand it, `view` is the reason a "findMap" function is not needed. But I just tried it mixed with `par` and it seems to lose its laziness, so maybe it wouldn't work here. – Luigi Plinge Dec 12 '11 at 05:27
3

I took interpreted your question in the same way as huynhjl, but if you just want to search and discardNones, you could do something like this to avoid the need to repeat the computation when a suitable outcome is found:

class Computation[A,B](value: A, function: A => B) {
  lazy val result = function(value)
}

def f(x: Int) = {          // your function here
  Thread.sleep(100 - x)
  if (x > 5) Some(x * 10)
  else None
}

val list = List.range(1, 20) map (i => new Computation(i, f))  
val found = list.par find (_.result.isDefined) 
  //found is Option[Computation[Int,Option[Int]]]
val result = found map (_.result.get)
  //result is Option[Int]

However find for parallel collections seems to do a lot of unnecessary work (see this question), so this might not work well, with current versions of Scala at least.

Volatile flags are used in the parallel collections (take a look at the source for find, exists, and forall), so I think your idea is a good one. It's actually better if you can include the flag in the function itself. It kills referential transparency on your function (i.e. for certain inputs your function now sometimes returns None rather than Some), but since you're discarding the stopped computations, this shouldn't matter.

Community
  • 1
  • 1
Luigi Plinge
  • 50,650
  • 20
  • 113
  • 180
  • I really like the idea of storing a lazy result in a find, and then pulling it out with a map. I can't quite make this compile, because my "f" function takes 2 other params in addition to the i param (not related to what I am splitting on, and constant across all invocations).. so need to figure that out from a syntax POV. Maybe I should curry it... – bwawok Dec 12 '11 at 19:16
  • @bwawok `new Computation((arg1,arg2,arg3), (f _).tupled)` will work without any modification to the `Computation` class, assuming `f` takes 3 arguments. Or, you could make computation classes of different arity. – Luigi Plinge Dec 12 '11 at 20:38
2

If you're willing to use a non-core library, I think Futures would be a good match for this task. For instance:

...both of which appear to enable the functionality you're looking for.

Greg Campbell
  • 15,182
  • 3
  • 44
  • 45
  • I don't want first completed, I want first completed with result – bwawok Dec 12 '11 at 18:37
  • "find" exists in the upcoming Akka 2.0, but until then it's fairly easy to implement: https://github.com/jboner/akka/blob/master/akka-actor/src/main/scala/akka/dispatch/Future.scala#L211 – Viktor Klang Dec 12 '11 at 21:22