
Spark's RDDs and DStreams provide a 'reduce' function for collapsing an entire RDD into a single element. However, reduce takes a function of type (T, T) => T. When reducing a List in plain Scala, by contrast, we can use foldLeft or foldRight, which take type (B)((B, A) => B). This is very useful because you can start folding with a type other than the one contained in your list.

Is there a way in Spark to do something similar, where I can start with a value of a different type than the elements in the RDD itself?
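For reference, this is the plain-Scala foldLeft behavior the question refers to: the accumulator type (here String) differs from the element type (Int).

```scala
val xs = List(1, 2, 3)

// foldLeft: (B)((B, A) => B) -- the accumulator starts as a String
// even though the list holds Ints
val result = xs.foldLeft("")((acc, x) => acc + x.toString)
// result == "123"
```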

zunior
  • Do note that I updated the answer below. Since I see you are sort of new, do "accept" the answer if it solves your issue to help others looking for unanswered questions. Thanks! – Daniel Langdon Aug 04 '15 at 16:42

1 Answer


Use aggregate instead of reduce. It also lets you specify a "zero" value of type B and a function like the one you want: (B, A) => B. Note that you additionally need to merge the partial aggregations computed on separate executors, so a (B, B) => B function is required as well.
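A minimal sketch of aggregate, assuming a SparkContext named sc is already available; the RDD holds Ints while the result type B is a (sum, count) pair:

```scala
// Sketch only: assumes an existing SparkContext named sc.
val rdd = sc.parallelize(Seq(1, 2, 3, 4))

// Zero value of the result type B = (Long, Int), holding (sum, count).
val zero = (0L, 0)

val (sum, count) = rdd.aggregate(zero)(
  // seqOp: fold one element into a partial result -- (B, A) => B
  (acc, x) => (acc._1 + x, acc._2 + 1),
  // combOp: merge partial results from different partitions -- (B, B) => B
  (a, b) => (a._1 + b._1, a._2 + b._2)
)
// sum == 10L, count == 4
```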

Alternatively, if you want this aggregation as a side effect, an option is to use an accumulator. In particular, the Accumulable type allows the result type to be different from the type of the values being accumulated.

Also, if you ever need to do the same with a key-value RDD, use aggregateByKey.
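The same (sum, count) pattern per key, again as a sketch that assumes an existing SparkContext named sc:

```scala
// Sketch only: assumes an existing SparkContext named sc.
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

// Per-key aggregation where the result type (Long, Int) differs
// from the value type Int:
val sumCounts = pairs.aggregateByKey((0L, 0))(
  (acc, v) => (acc._1 + v, acc._2 + 1),   // within-partition fold
  (a, b)   => (a._1 + b._1, a._2 + b._2)  // merge across partitions
)
// e.g. "a" maps to (3L, 2) and "b" maps to (3L, 1)
```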

Daniel Langdon