
In a big data processing job, does the function `fold` have lower computational performance than the function `reduce`?

For instance, I have the following two functions:

    array1.indices.zip(array1).map(x => x._1 * x._2).reduce(_ + _)

    array1.indices.zip(array1).map(x => x._1 * x._2).fold(0.0) {_ + _}

array1 is a very large RDD. Which function has higher computational performance given the same cluster settings?

sparklearner
  • See [this](http://stackoverflow.com/a/7764875/42188) answer. There would be no difference in terms of performance. – muhuk Apr 29 '15 at 05:27
  • possible duplicate of [difference between foldLeft and reduceLeft in Scala](http://stackoverflow.com/questions/7764197/difference-between-foldleft-and-reduceleft-in-scala) – muhuk Apr 29 '15 at 05:28
  • 3
    This is not duplicate. This question is related to operations on Spark RDDs, not Scala collections. – Wildfire Apr 29 '15 at 06:49

1 Answer


This is indeed the same situation as the one pointed out by muhuk: the guts of the Spark implementation are merely calls into the underlying Scala iterator.

`fold` from the source:

    (iter: Iterator[T]) => iter.fold(zeroValue)(cleanOp)

`reduce` from the source:

    iter =>
      if (iter.hasNext) Some(iter.reduceLeft(cleanF))
      else None

So this primarily just calls into the Scala implementations, and there is no meaningful performance difference between the two.
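
For completeness, here is a minimal, self-contained sketch you can run to check this yourself. It assumes a local Spark context, and uses `zipWithIndex` as the RDD analogue of the question's `indices.zip`, since RDDs have no `indices` method; the names `array1` and `products` are just illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    object FoldVsReduce {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("fold-vs-reduce").setMaster("local[*]"))

        // zipWithIndex pairs each element with its index, analogous to
        // array1.indices.zip(array1) on a local collection (order swapped).
        val array1   = sc.parallelize(Array(1.0, 2.0, 3.0, 4.0))
        val products = array1.zipWithIndex.map { case (v, i) => i * v }

        // Both actions dispatch to the same per-partition iterator code,
        // so they should perform comparably on the same cluster.
        val viaReduce = products.reduce(_ + _)
        val viaFold   = products.fold(0.0)(_ + _)

        println(s"reduce = $viaReduce, fold = $viaFold") // identical results

        sc.stop()
      }
    }

Any timing difference you observe between the two actions on the same data and cluster should be noise rather than a systematic cost of `fold`.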

Justin Pihony