2

in scala, i have a parallel Iterable of items and i want to iterate over them and aggregate the results in some way, but in order. i'll simplify my use case and say that we start with an Iterable of integers and want to concatenate the string representation of them in paralle, with the result in order.

is this possible with either fold or aggregate? it's unclear from the documentation which methods work parallelized but maintain order.

om-nom-nom
  • 62,329
  • 13
  • 183
  • 228
Heinrich Schmetterling
  • 6,614
  • 11
  • 40
  • 56

3 Answers3

4

Yes, order is gauranteed to be preserved for fold/aggregate/reduce operations on parallel collections. This is not very well documented. The trick is that the operation you which to fold over must be associative (and thus capable of being arbitrarily split up and recombined), but need not be commutative (and so not capable of being safely reordered). String concatenation is a perfect example of an associative, non-commutative operation, so the fold can be done in parallel.

val concat = myParallelList.map(_.toString).reduce(_+_)
Dave Griffith
  • 20,435
  • 3
  • 55
  • 76
  • 1
    Is there anything that could apply extra optimizations in case of an associative *and* commutative operation such as `+` or `*`? – Erik Kaplun Mar 31 '14 at 01:46
1

For folds: foldRight and foldLeft cannot be processed in parallel, you'll need to use the new fold method (more info there).

Like fold, aggregate can do its work in parallel: it “traverses the elements in different partitions sequentially” (Scaladoc), though it looks like you have no direct influence on how the partitions are chosen.

Community
  • 1
  • 1
Jean-Philippe Pellet
  • 59,296
  • 21
  • 173
  • 234
  • philippe thanks, i understand that fold or aggregate must be used - fold's documentation indicates that order is not guaranteed. i can't find that information for aggregate. – Heinrich Schmetterling Jun 10 '11 at 08:47
  • When the order is guaranteed, it wouldn't be very parallel, would it? – Jens Schauder Jun 10 '11 at 08:58
  • 2
    @Jens Imagine List("a","b","c","d") and concatenating it together with first doing "a"+"b" and "c"+"d" in parallel and finally do "ab"+"cd". You can do work in parallel and preserve the order of the collection. – ziggystar Jun 10 '11 at 09:48
  • Now I see what you mean by order. I thought adding c+d before adding "b+c" would be a violation of order. Which in turn would have to happen after "a+b" resulting in a sequential process. – Jens Schauder Jun 10 '11 at 11:26
0

I THINK the preservation of 'order' in the sense of the comment to Jean-Philippe Pellets answer is guaranteed due to the way parallel collections are implemented according to a publication of Odersky (http://infoscience.epfl.ch/record/150220/files/pc.pdf) IFF the part that splits your collection is behaving well with respect to order.

i.e. if you have elements a < b < c and a and c end up in one partition it follows that b is in the same partition as well.

I don't remember what exactly was the part responsible for the splitting, but if you find it, you might sufficient information in its documentation or source code in order to answer your question.

Jens Schauder
  • 77,657
  • 34
  • 181
  • 348