
In a big data processing job, does the function `fold` have lower computational performance than the function `reduce`?

For instance, I have the following two functions:

    array1.indices.zip(array1).map(x => x._1 * x._2).reduce(_ + _)

    array1.indices.zip(array1).map(x => x._1 * x._2).fold(0.0) {_ + _}

array1 is a very large RDD. Which function has higher computational performance given the same cluster settings?

sparklearner
  • See [this](http://stackoverflow.com/a/7764875/42188) answer. There would be no difference in terms of performance. – muhuk Apr 29 '15 at 05:27
  • possible duplicate of [difference between foldLeft and reduceLeft in Scala](http://stackoverflow.com/questions/7764197/difference-between-foldleft-and-reduceleft-in-scala) – muhuk Apr 29 '15 at 05:28
  • 3
    This is not duplicate. This question is related to operations on Spark RDDs, not Scala collections. – Wildfire Apr 29 '15 at 06:49

1 Answer


This is indeed the same situation as the one pointed out by muhuk: the guts of the Spark implementation are merely calls into the underlying Scala iterator.

`fold` from the source:

    (iter: Iterator[T]) => iter.fold(zeroValue)(cleanOp)

`reduce` from the source:

    iter =>
      if (iter.hasNext) Some(iter.reduceLeft(cleanF))
      else None

So this primarily just calls into the Scala implementations, and there is no meaningful performance difference between the two.
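
For completeness, here is a minimal, self-contained sketch you can run to check this yourself. It assumes a local Spark context, and uses `zipWithIndex` as the RDD analogue of the question's `indices.zip`, since RDDs have no `indices` method; the names `array1` and `products` are just illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    object FoldVsReduce {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("fold-vs-reduce").setMaster("local[*]"))

        // zipWithIndex pairs each element with its index, analogous to
        // array1.indices.zip(array1) on a local collection (order swapped).
        val array1   = sc.parallelize(Array(1.0, 2.0, 3.0, 4.0))
        val products = array1.zipWithIndex.map { case (v, i) => i * v }

        // Both actions dispatch to the same per-partition iterator code,
        // so they should perform comparably on the same cluster.
        val viaReduce = products.reduce(_ + _)
        val viaFold   = products.fold(0.0)(_ + _)

        println(s"reduce = $viaReduce, fold = $viaFold") // identical results

        sc.stop()
      }
    }

Any timing difference you observe between the two actions on the same data and cluster should be noise rather than a systematic cost of `fold`.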

Justin Pihony