1

I am struggling to write proper hashCode function for the Student class given below.

1) What I believe that hashCode should be good enough so that two different object's hashCode should not collide with each other.

Observation : For this implementation, when I debugged and checked for 'internal table object of HashMap' class I found each entry in HashMap is assigned different bucket location.

Question : What is the purpose of having a bucket(list/tree) at each index.

Implementation :

@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + id;
    return result;
}

2) If I allow collision of hashCode:

Observation : For this implementation, when I debugged and checked, found that 'size of internal table of hashMap' keeps increasing and only buckets within the range of hashCode gets used. Rest all bucket indexes shows null.

Question : What is the purpose of increasing internal table size, if the buckets out of hashCode range are always null.

Implementation :

@Override
public int hashCode() {
    return id%20;
}

Need Help for proper hashCode Implementation, so that Above issues can be fixed. Thanks for the help in advance.

============================ Code ===========================

public class HashMapTest {

public static void main(String a[]) {
    HashMap<Student, Integer> set = new HashMap<Student, Integer>();

    for (int i = 0; i < 5000; i++) {
        set.put(new Student(i), i);
    }

    set.put(new Student(5001), 5001);
    System.out.println(set.size());
}
}

class Student {
private int id;

public Student(int id) {
    this.id = id;
}

// Add here hashCode() provided in comments.


@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Student other = (Student) obj;
    if (id != other.id)
        return false;
    return true;
}

}
  • 3
    Assuming students are uniquely identified by `id`, just use the `id` directly as your hash code. No need to mangle it. If you want to learn more about `HashMap` internals, there are many articles on the web describing how hash maps work, e.g. see Wikipedia article about [Hash table](https://en.wikipedia.org/wiki/Hash_table) *(same thing, different name)*. – Andreas Aug 06 '16 at 18:09
  • > "If I allow collision of hashCode". In general, hash code collisions are inevitable regardless of the hash function. The hash code in Java is an int so allows about 2^32 distinct values. It is trivial to generate 2^32 + 1 different Java object (Longs for example) so at least one collision is guaranteed. Having said that, you do want to minimise the probability of a collision as much as possible and the suggestion by @Andreas makes perfect sense. – David Soroko Aug 06 '16 at 20:24

1 Answers1

2

What is the purpose of having a bucket(list/tree) at each index.

HashMap does not require the hashCode to be unique, because this can generally not be accomplished (for instance, there are 2^32 hashcodes, but infinitely many Strings, so it is impossible to use a different hashCode for every String). Instead, it only requires that collisions be rare.

Therefore, the HashMap is implemented such that it still works correctly even if there is a collision (though it may work more slowly in that case). That's why the HashMap uses buckets that can store several elements if necessary.

What is the purpose of increasing internal table size, if the buckets out of hashCode range are always null.

The HashMap resizes the table because doing so splits buckets. Usually, splitting a bucket will cause some elements to go in one, and some elements in the other bucket, improving performance. It does not realize that your hashCode is so bad that all elements will remain in the same bucket, so keeps trying :-)

Need Help for proper hashCode Implementation, so that Above issues can be fixed.

I'd use

@Override
public int hashCode() {
    return id;
}

if id is unique (which its name seems to imply), this is a perfect hash function, and it's even quick to compute :-)

(note that the hashCode may be greater than the table size; the HashMap will take care of that by truncating it as necessary)

meriton
  • 68,356
  • 14
  • 108
  • 175
  • 1
    It should also be mentioned that `.hashCode()` and `.equals(Object)` should always be implemented in pairs. Because after identifying the proper bucket the `.equals(Object)` method is used to find the matching object. -- See also this answer: http://stackoverflow.com/a/27609/4791599 – Markus Mitterauer Aug 06 '16 at 19:08
  • Since OP implemented `equals` correctly, I didn't feel it necessary to mention this, but you are certainly correct that `equal` objects must have equal `hashCodes`. – meriton Aug 06 '16 at 19:41
  • Thanks for the answer ... perfectly cleared all the doubts. – Anant Bhargatiya Aug 07 '16 at 06:47