
I am trying to understand partitioning in MapReduce. I have learned that Hadoop has a default partitioner known as HashPartitioner, and that the partitioner helps decide which reducer a given key goes to.

Conceptually, it works like this:

hashcode(key) % NumberOfReducers, where `key` is the key in the <key, value> pair.
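
For example (just to illustrate my understanding; this is not the actual Hadoop code), with 3 reducers I would expect something like:

// My mental model of the default partitioning, not the Hadoop source
int numberOfReducers = 3;
String key = "IT";                                          // key from a <key, value> pair
int reducer = Math.abs(key.hashCode()) % numberOfReducers;  // index of the reducer this key goes to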

My question is:

How does HashPartitioner calculate the hash-code for the key? Does it simply call the hashCode() of the key, or does HashPartitioner use some other logic to calculate the hash-code of the key?

Can anyone help me understand this?

CuriousMind
  • Possible duplicate of [Hadoop partitioner](https://stackoverflow.com/questions/27595195/hadoop-partitioner) – Ben Watson Mar 09 '18 at 09:56

1 Answer


The default partitioner simply uses the hashCode() method of the key to calculate the partition. This gives you an opportunity to implement your own hashCode() to tweak the way the keys are partitioned.

From the javadoc:

public int getPartition(K key, V value, int numReduceTasks)
Use Object.hashCode() to partition.

For the actual code, it simply returns (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks:

public int getPartition(K key, V value, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}
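
Since the default partitioner only delegates to hashCode(), a custom key can steer the default partitioning simply by overriding hashCode(). Below is a minimal sketch (not from the original post; the EmployeeKey class and its fields are illustrative) of a WritableComparable key whose hash depends only on the department, so all records for a department land on the same reducer:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical custom key: the default HashPartitioner follows hashCode(),
// which here depends only on the department, so every record of a department
// is sent to the same reducer regardless of the employee id.
public class EmployeeKey implements WritableComparable<EmployeeKey> {
    private String department = "";
    private int employeeId;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(department);
        out.writeInt(employeeId);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        department = in.readUTF();
        employeeId = in.readInt();
    }

    @Override
    public int compareTo(EmployeeKey other) {
        int cmp = department.compareTo(other.department);
        return cmp != 0 ? cmp : Integer.compare(employeeId, other.employeeId);
    }

    @Override
    public int hashCode() {
        // Only the department contributes, so HashPartitioner groups by department
        return department.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof EmployeeKey
                && ((EmployeeKey) o).department.equals(department)
                && ((EmployeeKey) o).employeeId == employeeId;
    }
}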

EDIT: Adding detail on custom partitioner

You can add different logic in a custom partitioner, which may not use hashCode() at all.

A custom partitioner can be written by extending Partitioner:

public class CustomPartitioner extends Partitioner<Text, Text>

One such example, which partitions on the department name carried in the key:

public static class CustomPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        // Guard against a map-only job with no reducers
        if (numReduceTasks == 0) {
            return 0;
        }

        // Route by the department name carried in the key
        if (key.equals(new Text("IT"))) {
            return 0;
        } else if (key.equals(new Text("Admin"))) {
            return 1 % numReduceTasks;
        } else {
            return 2 % numReduceTasks;
        }
    }
}
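
To actually use such a partitioner, it has to be registered on the job in the driver. A minimal sketch (class name, job name and reducer count are illustrative; mapper, reducer and I/O paths omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class PartitionerDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "department-partitioning");
        // Mapper, reducer and input/output paths omitted for brevity
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setPartitionerClass(CustomPartitioner.class); // override the default HashPartitioner
        job.setNumReduceTasks(3);                         // one reducer each for IT, Admin and the rest
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}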
Gyanendra Dwivedi
  • So, if we use a custom key, then the default partitioner would work on that as well; it will simply calculate the hash-code using the default hashCode() of the object (assuming we don't override hashCode()). – CuriousMind Mar 08 '18 at 19:09
  • Yes, please make sure that the custom key implements the `Writable` interface. – Gyanendra Dwivedi Mar 08 '18 at 19:10
  • Ok, so the default partitioner can work on normal / primitive keys as well as on custom keys; this I am now clear on. The doubt I have is: we say we can write a custom partitioner, but what do we do in a custom partitioner? Do we still use hashCode(), or does it just depend on the needs? Can you explain a bit on this? – CuriousMind Mar 08 '18 at 19:13
  • I don't think we have `primitive` keys. Mapper and reducer in the Java API always need a `Writable` (an interface) type; so the `int` type is basically `IntWritable` and so on (but they are all objects, not primitives). For the custom partitioner, I have updated the post. – Gyanendra Dwivedi Mar 08 '18 at 19:25
  • When I referred to primitive, I meant Hadoop's "primitives" like IntWritable, Text etc. However, thanks for adding the details in your answer. I want to check: for keys we should implement the WritableComparable interface, and not Writable; I hope this is correct. – CuriousMind Mar 08 '18 at 19:28
  • The `WritableComparable` interface is just a subinterface of the `Writable` and `java.lang.Comparable` interfaces. So, it's again a `Writable` :). Now, depending on what exactly you want, you may implement just `Writable` or `WritableComparable`. You may want to check out the scenarios for either. – Gyanendra Dwivedi Mar 08 '18 at 19:31
  • Thanks for your help in clarifying the doubts, catch you soon in another question :) – CuriousMind Mar 08 '18 at 19:46
  • Happy to help :) – Gyanendra Dwivedi Mar 08 '18 at 20:08