
You can see the implementation here: https://github.com/apache/spark/blob/ffa05c84fe75663fc33f3d954d1cb1e084ab3280/python/pyspark/rdd.py#L804

How does it differ from the 'normal' reduce function?
What does depth = 2 mean?

I don't want the reduce function to pass over the partitions linearly. Instead, I want it to reduce each available pair of partitions first, and then keep iterating like that until only one pair remains, reducing it to a single result, as shown in the picture:

[image: partitions reduced pairwise in a binary tree, level by level, down to a single result]
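In other words, something like this plain-Python sketch of the pattern (`pairwise_tree_reduce` is a hypothetical helper written only to illustrate, not a Spark API):

```python
from operator import add

def pairwise_tree_reduce(values, f):
    # Reduce adjacent pairs in rounds; an odd leftover element is
    # carried unchanged into the next round.
    while len(values) > 1:
        values = [f(values[i], values[i + 1]) if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

pairwise_tree_reduce(list(range(8)), add)  # 28, computed in log2(8) = 3 rounds
```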

Does treeReduce achieve that?

member555

1 Answer


Standard reduce takes a wrapped version of the function and uses it to mapPartitions. After that, the partial results are collected and reduced locally on the driver. If the number of partitions is large and/or the function you use is expensive, this places a significant load on a single machine.
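Roughly, and ignoring details like Spark's wrapping of the function, the logic looks like this sketch (`naive_reduce` and its internals are illustrative names, not Spark's actual code):

```python
from functools import reduce as fold  # local, in-memory reduce

def naive_reduce(rdd, f):
    def reduce_partition(iterator):
        values = list(iterator)
        if values:
            yield fold(f, values)  # one partial result per non-empty partition
    # Every partial result is shipped to the driver and folded there sequentially.
    partials = rdd.mapPartitions(reduce_partition).collect()
    if not partials:
        raise ValueError("Cannot reduce() an empty RDD")
    return fold(f, partials)
```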

The first phase of treeReduce is pretty much the same as above, but after that the partial results are merged in parallel, and only the final aggregation is performed on the driver.
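A minimal usage sketch, assuming a running SparkContext named `sc`:

```python
from operator import add

rdd = sc.parallelize(range(10 ** 6), 100)  # 100 partitions

rdd.reduce(add)               # 100 partial results folded on the driver
rdd.treeReduce(add, depth=2)  # partials merged in parallel first; the driver
                              # only folds the few that survive the tree
```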

depth is the suggested depth of the tree. Since the depth of a node in a tree is defined as the number of edges between the root and the node, it should give you more or less the expected pattern, although it looks like the distributed aggregation can be stopped early in some cases.

It is worth noting that what you get with treeReduce is not a binary tree. The number of partitions is adjusted at each level, and most likely more than two partitions will be merged at once, as sketched below.
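Paraphrasing the level-sizing logic from the linked implementation (`merge_levels` is an illustrative helper, not part of the API): partitions are merged by a factor `scale` derived from `depth`, and the loop can stop before `depth` levels are actually built.

```python
from math import ceil

def merge_levels(num_partitions, depth=2):
    # Merge factor per level, roughly the depth-th root of the
    # partition count, but never less than 2.
    scale = max(int(ceil(num_partitions ** (1.0 / depth))), 2)
    levels = [num_partitions]
    # Stop early once another level would no longer pay for itself.
    while num_partitions > scale + num_partitions / scale:
        num_partitions = int(num_partitions / scale)
        levels.append(num_partitions)
    return levels

merge_levels(64, depth=2)  # [64, 8]: eight partitions merged at once, not two
```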

Compared to standard reduce, the tree-based version performs a reduceByKey on each iteration, which means a lot of data shuffling. If the number of partitions is relatively small, it will be much cheaper to use plain reduce. If you suspect that the final phase of the reduce is a bottleneck, the tree* version could be worth trying.
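If in doubt, a quick measurement on your own data settles it. A micro-benchmark sketch (again assuming `sc`; actual timings will vary with the cluster and the cost of the function):

```python
import time
from operator import add

rdd = sc.parallelize(range(10 ** 7), 1000).cache()
rdd.count()  # materialize the cache so both runs see the same input

for name, run in [("reduce", lambda: rdd.reduce(add)),
                  ("treeReduce", lambda: rdd.treeReduce(add, depth=2))]:
    start = time.time()
    run()
    print(name, time.time() - start)
```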

zero323
  • How can I implement something like in my picture? – member555 Aug 29 '15 at 00:58
  • If your picture represents partitions, then as far as I can tell `tree*` methods are the right choice. – zero323 Aug 29 '15 at 01:03
  • Yes, each block is a partition. Should I pick depth = 2? Because the overall depth is log(num_partitions) – member555 Aug 29 '15 at 01:04
  • Also, why not always use treeReduce instead of reduce? – member555 Aug 29 '15 at 01:10
  • Because it is much more expensive. Regarding the `depth` parameter, 2 should do the trick, although keep in mind that it is not a binary tree. I've edited the answer to provide more details. – zero323 Aug 29 '15 at 01:33
  • But how can I get this binary tree? How can I implement this? Maybe treeReduce() isn't the best option? – member555 Aug 29 '15 at 01:39
  • It simply doesn't make sense to use a binary tree. Why would you want something like this? – zero323 Aug 29 '15 at 01:42
  • Because in my algorithm, it would be the correct way. If I've reduced a partition, I don't want to reduce it again and again; I want to do that as little as possible. The binary tree would achieve that! – member555 Aug 29 '15 at 01:46
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/88239/discussion-between-zero323-and-member555). – zero323 Aug 29 '15 at 01:50
  • @zero323 Where did you learn all this stuff? Are you a contributor to the Apache Spark project? – Alberto Bonsanto Dec 04 '15 at 00:48
  • @AlbertoBonsanto Not really. I've made some PRs, but nothing serious. Some reading, some coding, and a lot of SO ;) BTW, care to vote? http://meta.stackoverflow.com/q/310265/1560062 – zero323 Dec 04 '15 at 17:10
  • @zero323 How do you determine whether the number of partitions is too large? – G_cy Mar 02 '17 at 15:26
  • @G_cy Monitoring, tuning, and common sense. Adding 1000s of integers on a single machine won't be a problem; a very-expensive-operation-on-a-very-large-object can throttle the driver even with dozens of collected tasks. – zero323 Mar 02 '17 at 15:53