Let's say I have a fairly large list of strings (several million items or so). Is it a good idea to run something like this:

val updatedList = myList.par.map(someAction).toList

Or would it be better to group the list before calling par.map, like this:

val numberOfCores = Runtime.getRuntime.availableProcessors
val updatedList = 
  myList.grouped(numberOfCores).toList.par.map(_.map(someAction)).toList.flatten

UPDATE: Assume that someAction is quite expensive (compared to grouped, toList, etc.).

Vilius Normantas

2 Answers

Run par.map directly, as it already takes the number of cores into account. However, do not keep the data in a List, since converting a List into a parallel collection requires a full copy. Use a Vector instead.
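A minimal sketch of this suggestion, assuming Scala 2.12 or earlier, where .par is part of the standard library (on 2.13 and later it lives in the separate scala-parallel-collections module and needs an extra import). The someAction body and the sample data are hypothetical stand-ins for the question's code:

```scala
// Assumption: Scala 2.12 or earlier, where .par is built in.
// On Scala 2.13+, add the scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
object ParVectorSketch {
  // Hypothetical stand-in for the question's expensive someAction.
  def someAction(s: String): String = s.reverse.toUpperCase

  def main(args: Array[String]): Unit = {
    // Keep the data in a Vector from the start, rather than a List.
    val myVector: Vector[String] = Vector.tabulate(1000)(i => s"item-$i")

    // par on a Vector yields a ParVector; map then runs in parallel
    // and toVector converts the result back to a sequential Vector.
    val updated: Vector[String] = myVector.par.map(someAction).toVector

    // Element order is preserved, so the result matches the sequential map.
    assert(updated == myVector.map(someAction))
  }
}
```

No manual grouping by core count appears here; the parallel collection splits the work itself.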

Daniel C. Sobral

As suggested, avoid calling par on a List, since that entails copying the list into a collection that can be easily traversed in parallel. See the Parallel Collections Overview for an explanation.

As described in the section on concrete parallel collection classes, a ParVector may be less efficient for the map operation than a ParArray, so if you're really concerned about performance, it may make sense to use a parallel array.
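A parallel array version might look like the following sketch. It again assumes Scala 2.12 or earlier (see the 2.13 note in the code comment), and someAction and the sample data are hypothetical placeholders. Whether it actually beats ParVector for a given workload would need to be measured:

```scala
// Assumption: Scala 2.12 or earlier, where .par is built in.
// On Scala 2.13+, add the scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
import scala.collection.parallel.mutable.ParArray

object ParArraySketch {
  // Hypothetical stand-in for the question's expensive someAction.
  def someAction(s: String): String = s.reverse.toUpperCase

  def main(args: Array[String]): Unit = {
    val myArray: Array[String] = Array.tabulate(1000)(i => s"item-$i")

    // par on an Array yields a ParArray backed by the same array.
    val parArray: ParArray[String] = myArray.par

    // map runs in parallel; toArray returns a plain sequential Array.
    val updated: Array[String] = parArray.map(someAction).toArray

    // Same elements, in the same order, as the sequential map.
    assert(updated.sameElements(myArray.map(someAction)))
  }
}
```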

But, if someAction is expensive enough, then its computational cost will hide the sequential bottlenecks in toList and par.

axel22