The foldByKey, aggregateByKey, and combineByKey transformations in Spark require the user to provide an initial value. I have read some articles about them, and every article says that the initial value should not affect the final result, i.e. use 0 for an addition function, 1 for multiplication, and so on. Then what is the relevance of this value?
Take foldByKey as an example. As per the Spark documentation:
Merge the values for each key using an associative function and a neutral "zero value" which may be added to the result an arbitrary number of times, and must not change the result
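To make the "added to the result an arbitrary number of times" part concrete, here is a minimal sketch (the local setup, class name, and sample data are my own, and the printed numbers assume the stated partitioning): a non-neutral value gets folded in once per partition that holds the key, so the result would shift with the partition count.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ZeroValueDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ZeroValueDemo").setMaster("local[3]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Three values for key "a", spread across 3 partitions (one pair each).
        JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>("a", 1),
                new Tuple2<>("a", 2),
                new Tuple2<>("a", 3)), 3);

        // Neutral zero value for addition: 1 + 2 + 3 = 6, whatever the partitioning.
        System.out.println(pairs.foldByKey(0, (v1, v2) -> v1 + v2).collect()); // [(a,6)]

        // Non-neutral value: 5 is folded in once per partition holding "a",
        // giving 6 + 3*5 = 21 here; a different partitioning would change it.
        System.out.println(pairs.foldByKey(5, (v1, v2) -> v1 + v2).collect()); // [(a,21)]

        sc.stop();
    }
}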
Following is a code example of foldByKey using 0 as the initial value:

rdd.foldByKey(0, (v1, v2) -> v1 + v2);

where the value type of the RDD is Integer.
However, reduceByKey produces the same result without needing any initial value.
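For comparison, reusing the pairs RDD from the sketch above (again, only an illustrative sketch), both transformations agree when the neutral element is used:

// With the neutral element 0, foldByKey and reduceByKey agree,
// and reduceByKey needs no zero value at all.
JavaPairRDD<String, Integer> folded = pairs.foldByKey(0, (v1, v2) -> v1 + v2);
JavaPairRDD<String, Integer> reduced = pairs.reduceByKey((v1, v2) -> v1 + v2);

System.out.println(folded.collect());  // [(a,6)]
System.out.println(reduced.collect()); // [(a,6)]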
Can somebody please shed some light on why the initial value parameter is needed? Does it have any functional or performance benefits?