Does the data combine in each partition? As we all know ,if use reduceByKey ,data is combined at each partition , only one output for one key at each partition to send over network. reduceByKey required combining all your values into another value with the exact same type. I mean, is it like reducebykey?
Asked
Active
Viewed 467 times
0
-
2Possible duplicate of [DataFrame / Dataset groupBy behaviour/optimization](https://stackoverflow.com/questions/32902982/dataframe-dataset-groupby-behaviour-optimization) – 10465355 Oct 31 '18 at 13:05
-
yes its optimized, but limited to the built-in aggregation functions (unless you implement an UDAF). – Raphael Roth Oct 31 '18 at 15:51