Isn't the term shuffling in MapReduce misleading?

Asked Apr 13 '18 at 16:42

As I understand it, the term shuffle refers to randomly reordering the elements of a sequence [1]. So the first time I saw shuffling in MapReduce, I assumed it distributed the workload uniformly at random across nodes for load-balancing purposes. After reading the details, I realized that is not what it does. It is not random: every pair with the same key is routed to the same reducer, which makes it much more like GROUP BY in SQL.
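To make the contrast concrete, here is a minimal sketch of how I understand the two meanings. It is plain Python, not actual Hadoop code; the function names `random_shuffle` and `mapreduce_shuffle` are made up for illustration, and the toy partitioner (sum of the key's character codes modulo the number of reducers) is an assumption standing in for whatever hash-based partitioner a real framework uses.

```python
import random
from collections import defaultdict


def random_shuffle(items, seed=0):
    """The everyday meaning of 'shuffle': a random permutation of the items."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


def mapreduce_shuffle(pairs, num_reducers):
    """Route each (key, value) pair to a reducer by key, then group by key.

    Hypothetical toy partitioner: sum of the key's character codes modulo
    the reducer count. It is deterministic, so equal keys always land on
    the same reducer.
    """
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        reducer = sum(map(ord, key)) % num_reducers
        partitions[reducer][key].append(value)
    # Each reducer sees its keys in sorted order, each with all of its values.
    return [dict(sorted(p.items())) for p in partitions]


# Map output of a word count over the words b, a, b, c, a.
pairs = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]

print(random_shuffle(pairs))        # same pairs, arbitrary order
print(mapreduce_shuffle(pairs, 2))  # pairs grouped by key, one dict per reducer
```

The first call only permutes the pairs, while the second one deterministically groups all values of a key together on one reducer, which is why it reminds me of GROUP BY rather than of shuffling a deck of cards.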

So what is the motivation behind the term shuffling? I'm new to MapReduce, so most likely I have simply missed something. I'm all ears.

Lingxi

Possible duplicate of [What is the purpose of shuffling and sorting phase in the reducer in Map Reduce Programming?](https://stackoverflow.com/questions/22141631/what-is-the-purpose-of-shuffling-and-sorting-phase-in-the-reducer-in-map-reduce) – OneCricketeer Apr 13 '18 at 22:26
Even the "Shuffle sort" algorithm is not purely random. Both processes deterministically order the data, but the "random" part of the "shuffle" definition may be the order in which the chunks of data *begin to be sorted, or finish sorting* – OneCricketeer Apr 13 '18 at 22:28
