0

Does the data combine in each partition? As we all know ,if use reduceByKey ,data is combined at each partition , only one output for one key at each partition to send over network. reduceByKey required combining all your values into another value with the exact same type. I mean, is it like reducebykey?

Shaokai Li
  • 69
  • 1
  • 5
  • 2
    Possible duplicate of [DataFrame / Dataset groupBy behaviour/optimization](https://stackoverflow.com/questions/32902982/dataframe-dataset-groupby-behaviour-optimization) – 10465355 Oct 31 '18 at 13:05
  • yes its optimized, but limited to the built-in aggregation functions (unless you implement an UDAF). – Raphael Roth Oct 31 '18 at 15:51

0 Answers0