15

When we have equals(), compareTo() methods why there is a hashcode() method in Java?

And in case if we use HashTable we have to override hashcode() method, Is there any special reason except fast accessing of random keys? If we override the hashcode() method what would be the probable implementation ?

How Java ensures object uniqueness in memory?


Hashcodes are typically used to enhance the performance of large collections of data.

In hashing we calculate hash code. it's an additional task. When we do additional operation for each object that is added to a collection. How the performance gets improved?

nbro
  • 15,395
  • 32
  • 113
  • 196
MaheshVarma
  • 2,081
  • 7
  • 35
  • 58

2 Answers2

14

You must always override equals and hashCode in tandem, to satisfy their interdependent contracts. A class which implements them contradictorily is simply broken and unacceptable under even the minimum software engineering standards.

As to why one would ever use the hashtable data structure: because it is the fastest option around for random-access key-value storage.

Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
  • What code we write inside hashcode() method? – MaheshVarma Aug 06 '13 at 12:01
  • 2
    it's the fastest option only if hashcodes of objects are sensible. Having hashcode return 0 for all objects fullfills the contract, but would make hash-based collections perform horribly. Just overriding hashcode with something that blindly obeys the contract will not help. – eis Aug 06 '13 at 12:14
  • 1
    @MaheshVarma There is a standard idiom to implement a good `hashCode`, and it is widely awailable on the web. It is also a feature of most modern IDE's to generate the whole `equals` and `hashCode` methods for you. – Marko Topolnik Aug 06 '13 at 12:20
0

Using the compareTo method you establish a "total order" for your objects. A total order is a fairly weak property: it can only tell you if one object is "less than" another, but it gives you no idea of "how far apart" two objects are.

For example, if you have a N objects in a key-value data structure and you want to find the value for a given key. Having only a total order you need at least O(log N) comparisons to find a matching key.

A hash code is a stronger property because it can tell you if two objects are somewhat similar or completely different. Thanks to this a hash table can find a value for a key with O(1) operations.

Joni
  • 108,737
  • 14
  • 143
  • 193
  • Can you elaborate on how hashcode can be used for distance? AFAIK you could legally implement hashCode as something akin to MD5 and it would contain little information on how "similar" two objects are... – c-x-berger Jul 21 '20 at 20:23
  • @c-x-berger using just the hash, you can define a [metric in the mathematical sense](https://en.wikipedia.org/wiki/Metric_(mathematics)) ("distance" colloquially) by taking the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) of the hash. It does not matter if the hashed objects are actually very different in some other sense. – Joni Jul 21 '20 at 23:01
  • I still don't follow - you can define a perfectly "legal" `hashCode` that varies _wildly_ with even a single byte of difference between two objects. Using it as a distance/similarity metric seems like (at best) not what the method is designed for. (The default impl is memory location - which has _zero_ information about object "similarity"!) – c-x-berger Jul 22 '20 at 14:50
  • The Hamming distance on the hash code gives you a valid measure of "distance" which follows all the laws required for a metric (see wikipedia links above). It's not what reasonable people would call a distance, but it's still a valid measure of distance. It does not matter if a difference of a single bit in the source objects sends them half a universe apart. The only requirement for hash code is that "equal" objects should have the same hash code, so that the "distance" between "equal" objects is zero. – Joni Jul 22 '20 at 15:13
  • The OP's question was "why does hashCode exist when we already have equals and compareTo". My answer is "because hash code gives you a lot more information about approximate object equality than compareTo" (objects with the same hash code are nearby each other in the sense of the hash code distance). You can post a new question on this if you want a longer answer. – Joni Jul 22 '20 at 15:37