In Spark, are there cases where groupByKey is preferred to reduceByKey?

Asked Sep 05 '18 at 22:00

Active Sep 05 '18 at 22:27

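It's a well-known optimization to replace `groupByKey` with `reduceByKey`, since the latter combines values within each partition before the shuffle and so sends less data across the network. I was wondering whether there are reverse cases, in which code using `groupByKey` is faster than the equivalent code using `reduceByKey`.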
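To make the shuffling difference concrete, here is a minimal sketch in plain Python. It is not Spark code: the partitions are ordinary lists, and the helper names `shuffle_group_by_key` and `shuffle_reduce_by_key` are hypothetical. It only counts how many records would have to cross a shuffle under each strategy in this toy model.

```python
from collections import defaultdict

def shuffle_group_by_key(partitions):
    """groupByKey-style: every (key, value) record crosses the shuffle."""
    shuffled = [record for part in partitions for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)

def shuffle_reduce_by_key(partitions, func):
    """reduceByKey-style: combine inside each partition, then shuffle."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())  # at most one record per key per partition
    merged = {}
    for key, value in shuffled:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(shuffled)

# Toy data (an assumption for illustration): two partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

grouped, grouped_records = shuffle_group_by_key(partitions)
sums_via_group = {key: sum(values) for key, values in grouped.items()}
sums_via_reduce, reduced_records = shuffle_reduce_by_key(partitions, lambda x, y: x + y)
```

In this model both approaches produce the same sums, but the grouping path moves 6 records and the reducing path moves 4. If every key appeared only once per partition, the local combine would have nothing to merge and both paths would move the same number of records.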

edited Sep 05 '18 at 22:27

Joel

asked Sep 05 '18 at 22:00

alexgbelov

I think the only case would be when the data size is really small. – Gaurang Shah Sep 06 '18 at 01:29

I don't think it can be faster, but depending on the output you want `reduceByKey` will not always work. – Shaido Sep 06 '18 at 01:46
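One possible illustration of that comment, as a sketch in plain Python rather than Spark: `reduceByKey` takes a function that merges two values into one value of the same type, so a result that needs all of a key's values at once, such as a median, does not fit a pairwise reduce on the raw values. The data and variable names below are assumptions made up for the example.

```python
from collections import defaultdict
from functools import reduce
from statistics import median

# Toy (key, value) records, invented for illustration.
records = [("a", 1), ("a", 9), ("a", 2), ("b", 4), ("b", 6)]

# Grouping first: collect every value per key, then take the median.
groups = defaultdict(list)
for key, value in records:
    groups[key].append(value)
medians = {key: median(values) for key, values in groups.items()}

# Naive pairwise attempt: "median of two" applied repeatedly is not the median.
naive = {
    key: reduce(lambda x, y: median([x, y]), values)
    for key, values in groups.items()
}
```

Here `medians["a"]` is 2, while the pairwise attempt gives 3.5 for the same key, so the two results disagree once a key has more than two values.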

0 Answers