
I'm running a Hadoop job (via Hive, actually) that is supposed to deduplicate lines across many text files. In the reduce step it chooses the most recently timestamped record for each key.

Does Hadoop guarantee that every record with the same key, output by the map step, will go to a single reducer, even if many reducers are running across a cluster?

I worry that the mapper output might be split during the shuffle in the middle of a set of records sharing a key, so that those records end up on different reducers.
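To make the reduce step concrete, here is a minimal sketch of what I have in mind, written as a plain MapReduce reducer rather than the actual Hive query, and assuming a hypothetical value layout of "timestamp<TAB>payload". It only produces correct output if every value for a key reaches the same reducer, which is exactly what I'm asking about.

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Keeps only the most recently timestamped record for each key.
// Assumes each value arrives as "<epochMillis>\t<payload>" (a hypothetical
// layout for illustration); adjust the parsing to the real record format.
public class LatestRecordReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long newestTs = Long.MIN_VALUE;
        String newestPayload = null;
        for (Text value : values) {
            String[] parts = value.toString().split("\t", 2);
            long ts = Long.parseLong(parts[0]);
            if (ts > newestTs) {
                newestTs = ts;
                newestPayload = parts.length > 1 ? parts[1] : "";
            }
        }
        if (newestPayload != null) {
            context.write(key, new Text(newestTs + "\t" + newestPayload));
        }
    }
}
```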

Gyan Veda
samg

3 Answers


All values for a key are sent to the same reducer. See this Yahoo! tutorial for more discussion.

This behavior is determined by the partitioner, and might not be true if you use a partitioner other than the default.
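For reference, the default partitioner that determines this behavior is Hadoop's HashPartitioner, which is roughly equivalent to the sketch below: the partition is a pure function of the key's hash, so equal keys always map to the same reduce partition.

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Roughly what the default HashPartitioner does: the partition depends only
// on the key's hash, so every record with an equal key is routed to the same
// reduce partition, no matter which mapper emitted it.
public class DefaultStyleHashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit so the index is non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```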

Karl Anderson
  • Actually, I am not sure of this. See http://stackoverflow.com/questions/26693034/hadoop-strange-behaviour-reduce-function-doesnt-get-all-values-for-a-key. I didn't modify the partitioner in my program. – Madrugada Nov 01 '14 at 23:50

Actually, no! You could write a Partitioner that sends the same key to a different reducer each time getPartition is called, as in the sketch below. It's just not generally a good idea for most applications.
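For example, a deliberately key-ignoring partitioner along these lines (the class name is purely illustrative) would spread identical keys across reducers:

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative "bad" partitioner: it ignores the key entirely and cycles
// through the reducers, so two records with the same key can easily land on
// different reducers. Almost never what you want.
public class RoundRobinPartitioner<K, V> extends Partitioner<K, V> {
    private int next = 0;

    @Override
    public int getPartition(K key, V value, int numPartitions) {
        next = (next + 1) % numPartitions;
        return next;
    }
}
```

You would have to opt into something like this explicitly, e.g. with job.setPartitionerClass(RoundRobinPartitioner.class); the default partitioner never behaves this way.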

Bkkbrad

Yes, Hadoop does guarantee that all records with the same key will go to the same reducer. This is achieved by the Partitioner, which buckets keys using a hash function.

For more information on the partitioning process, take a look at this: Partitioning Data

It specifically talks about how, even when different mappers process records with the same key, all values for a given key end up in the same partition and are therefore processed by the same reducer. The short check below illustrates why.
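The partition index is computed from nothing but the key, so any mapper evaluating it for the same key gets the same answer (the key and reducer count here are hypothetical):

```java
// Standalone check: the same key always yields the same partition index,
// regardless of which mapper (or JVM) computes it.
public class PartitionDemo {
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 8;
        // Simulate two different mappers seeing the same key.
        System.out.println(partitionFor("user-42", reducers));
        System.out.println(partitionFor("user-42", reducers)); // prints the same index
    }
}
```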

Binary Nerd