10

I am going through the Introduction of Algorithms by Cormen et al video and it discusses several hashing functions . I want to know what hashing function does Java use by default?Does the hashing function actually differ for different types of objects that are used as keys? Is there an api in the Collections framework which let us write our own hashing algorithm ?

Geek
  • 26,489
  • 43
  • 149
  • 227
  • 2
    Others have given perfectly good answers to this question. I'll just add that by making the hash function an attribute of the object type rather than the collection, it allows you to re-use those same types across many collections, and for collections to handle any arbitrary types without having to know how to hash every possible type they might encounter. – Alex Aug 16 '12 at 16:35
  • Also: always override the default hashcode when you make a new object type if it's at all possible it might be used in a collection. The default hashcode is based on the identity of the object, so different instances will give different hashcodes even if all their fields are identical. – Alex Aug 16 '12 at 16:42
  • Possible duplicate of [Why does Java's hashCode() in String use 31 as a multiplier?](https://stackoverflow.com/questions/299304/why-does-javas-hashcode-in-string-use-31-as-a-multiplier) I recognize that this is not strictly speaking a duplicate; however I believe that a *good* answer to this question would include the answer to the earlier question. – M.J. Rayburn Sep 02 '17 at 00:06

5 Answers5

11

Each object in java has a public int hashCode() method that returns a hash. Each object is free to implement it in its own way by overriding that method. If the method is not overriden, the default Object#hashCode method is used.

You can have look at the source code of various objects to see how it is implemented in the JDK. This is String's hashCode for example (line 1494).

Some collections can add an additional layer of hashing on top of the objects' hashCode methods. For example, HashMap does that to improve performance when an object's hashCode is not well distributed.

Community
  • 1
  • 1
assylias
  • 321,522
  • 82
  • 660
  • 783
  • Interesting, didn't know that about HashMap. But I'm guessing that behavior is different from the user-supplied hashing that OP is asking for? – Alex Aug 16 '12 at 16:30
  • 1
    @Alex It's related. HashMap basically hashes the hash. See [HashMap.hash](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java#HashMap.hash%28int%29). – yshavit Aug 16 '12 at 16:31
2

You can always override it in any of your classes... Like

 @override
 public int hashCode()
 { 
 //new implementation 
 }

http://mindprod.com/jgloss/hashcode.html

The default hashCode() method uses the 32-bit internal JVM (Java Virtual Machine) address of the Object as its hashCode.

However, if the Object is moved in memory during garbage collection, the hashCode stays constant. This default hashCode is not very useful, since to look up an Object in a HashMap, you need the exact same key Object by which the key/value pair was originally filed.

Normally, when you go to look up, you don’t have the original key Object itself, just some data for a key. So, unless your key is a String, nearly always you will need to implement a hashCode and equals method on your key class.

Object.hashCode() is a native method.

Shark
  • 6,513
  • 3
  • 28
  • 50
1

It depends on the kind of object that you use. For any object that you implement in your own classes, you can always override the default hashCode() method.

Note, you should always obey the contract between hashCode() and equals() as mentioned in the hashCode() javadoc:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

For more information read this entry.

Community
  • 1
  • 1
matsev
  • 32,104
  • 16
  • 121
  • 156
  • In fact if you intend to store your objects in HashMap or similar you **should** (nearly must) override **both** `hashCode()` and `equals()` – Germann Arlington Aug 16 '12 at 16:31
0

Every type in Java has hashCode() method defined, as it's in the Object. hashCode() returns a int. And in HashMap implementation, it hashes the result again and take only the lower bits to make it in the range of 0 to size-1. Note in Sun JDK, size is always 2x, x being some integer.

Java library is open source and you probably have a copy on your dev machine.

In Sun JDK 6, the second hash I mentioned above is

/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions.  This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

You can find the first hash by looking at the hashCode() function on the class you are interested in.

Haozhun
  • 6,331
  • 3
  • 29
  • 50
0

All classes in Java inherit from java.lang.Object, and by doing so, they inherit the method hashCode() that returns an int. The default method returns some (more or less) unique value created by the VM (think of it as the memory address of the object, even though that's not entirely correct). When you implement your own classes you can override that method to do whatever you want. You should however pay attention that your hashCode and equals methods are consistent, and you should be aware that in general hash codes are not unique, so whatever you do, expect collisions between hash codes of different objects.

The Collections framework usually uses the hashhCode() method to hash things for Hashtables etc. It is conceivable that other datastructures in other libraries use explicit hash functions, but that does not happen in the Collections framework.

Jochen
  • 2,277
  • 15
  • 22
  • *"The default method returns some unique value"*: default hashcodes are not necessarily unique. – assylias Aug 16 '12 at 16:34
  • True, even though `Object.hashCode` on any implementation I know does. I hope my edit clarifies. – Jochen Aug 16 '12 at 16:37
  • hashcode is an int, so on a 64bit machine, there are (potentially) more memory addresses than an int can contain for example. – assylias Aug 16 '12 at 16:38