It's a well-known optimization to replace `groupByKey` with `reduceByKey`, since the latter reduces shuffling. Are there reverse cases in which code using `groupByKey` is faster than the equivalent code using `reduceByKey`?
I think the only case would be when the data size is really small. – Gaurang Shah Sep 06 '18 at 01:29
I don't think it can be faster, but depending on the output you want, `reduceByKey` will not always work. – Shaido Sep 06 '18 at 01:46
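To make the two comments concrete, here is a rough plain-Python simulation (not Spark itself) of why `reduceByKey` shuffles fewer records, and of a result it cannot produce directly. The `partitions` data and both helper functions are hypothetical names made up for illustration. The only Spark behavior assumed is that `reduceByKey` combines values within each partition before the shuffle, while `groupByKey` sends every record across.

```python
from collections import defaultdict
from statistics import median

# Hypothetical input: two partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("a", 2), ("b", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

def shuffle_group_by_key(parts):
    """groupByKey-style: every record crosses the shuffle unchanged."""
    shuffled = [pair for part in parts for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)

def shuffle_reduce_by_key(parts, func):
    """reduceByKey-style: combine within each partition, then shuffle."""
    shuffled = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    reduced = {}
    for key, value in shuffled:
        reduced[key] = func(reduced[key], value) if key in reduced else value
    return reduced, len(shuffled)

grouped, grouped_records = shuffle_group_by_key(partitions)
summed, reduced_records = shuffle_reduce_by_key(partitions, lambda x, y: x + y)

# A per-key median needs all values at once, so it is computed from the groups;
# it cannot be expressed as a pairwise reduce over the values themselves.
medians = {key: median(values) for key, values in grouped.items()}

print(grouped_records, reduced_records)
print(summed, medians)
```

With only a handful of records per key, as in this toy input, the saving is small, which fits the first comment. The median case is one example of the second comment: a result that needs every value for a key at once.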