How is hashCode() generated for a String object in Java?

Question

We know that the hashCode() method of an object returns a hash code based on the memory address of the instance. So two objects of the same class with the same data will still have different hash codes, because they are stored in different memory locations.

Now, when we create two String objects using new String("Some_Name"), we have two objects stored at different addresses. Their hash codes should therefore differ, since they live in different memory locations. But we end up getting the same hash code for both.

Employee empObject = new Employee("Some_Name");
Employee empObject1 = new Employee("Some_Name");
String stringObject = new String("Some_Name");
String stringObject1 = new String("Some_Name");
                                                  // Output
System.out.println(empObject.hashCode());     // 1252169911
System.out.println(empObject1.hashCode());    // 2101973421
System.out.println(stringObject.hashCode());  // 1418906358
System.out.println(stringObject1.hashCode()); // 1418906358

Does this mean that String has overridden the hashCode() method from Object? If so, would the overridden method have to search the heap for other String objects with the same data and assign one constant hash code to all of them? Please correct me if my basic understanding is wrong.

Note: This is not about String literals but about String objects. Literals are stored in the String constant pool, whereas a String created with new is a separate object on the heap, outside the pool.

The hashcode for a string is based on its content, so that two identical strings have the same hashcode. Otherwise it would not be useful. — khelwood, Aug 27 '19 at 07:56
Simply look at the code `public int hashCode() { int h = hash; if (h == 0 && value.length > 0) { char val[] = value; for (int i = 0; i < value.length; i++) { h = 31 * h + val[i]; } hash = h; } return h; }` — Scary Wombat, Aug 27 '19 at 07:56
you can override `public int hashCode(){}` and set it as you desire — Phill Alexakis, Aug 27 '19 at 07:57
Actually in most cases when hash code would matter the `hashCode()` method should be overridden and it should be based on the classes content. The `Object`'s `hashCode()` has very little use tbh. — Amongalen, Aug 27 '19 at 08:02
String has concept of string pool. Where string with same value gets same place in string pool. — Saima Haji, Aug 27 '19 at 08:03
The default place to look for information on Java API methods is the official Javadoc. You can find the Javadoc for the `Object.hashCode` method [here](https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--). Please read it first and then revisit your question. — Erwin Bolwidt, Aug 27 '19 at 08:09
@SaimaHaji That only applies to string-literals, so `"hello"`. Not if you force a new instance which is not cached by using the constructor, like `new String("hello")`, what OP did. — Zabuzard, Aug 27 '19 at 08:17
Any answer would only be valid until the algorithm is changed. You can look at the `source code` and see how it is done now. But since it is implementation specific that does not guarantee how it will be done in the future. Note that if you want to see the hashcode of a string prior to its calculation, use `System.identityHashCode().` — WJS, Aug 27 '19 at 14:50

score 3 · Answer 1 · answered Aug 27 '19 at 08:06

This is what I found with a simple Google search:

An object’s hash code allows algorithms and data structures to put objects into compartments, just like letter types in a printer’s type case. The printer puts all A types into the compartment for A, and he looks for an A only in this one compartment. This simple system lets him find types much faster than searching in an unsorted drawer. That’s also the idea of hash-based collections, such as HashMap and HashSet.

The contract is explained in the hashCode method’s JavaDoc. It can be roughly summarized with this statement:

Objects that are equal must have the same hash code within a running process

Unequal objects must have different hash codes – WRONG!
Objects with the same hash code must be equal – WRONG!

The contract allows for unequal objects to share the same hash code, such as the A and µ objects in the sketch above. In math terms, the mapping from objects to hash codes doesn’t have to be injective or even bijective. This is obvious because the number of possible distinct objects is usually bigger than the number of possible hash codes (2^32).
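To make the point about shared hash codes concrete, here is a minimal sketch using two short strings. The class name HashCollisionDemo and the helper sameHash are made up for illustration; only String#equals and String#hashCode come from the JDK.

```java
// Illustrative sketch: two unequal strings that happen to share a hash code.
// HashCollisionDemo and sameHash are hypothetical names, not part of any API.
class HashCollisionDemo {

    // True if both objects report the same hash code.
    static boolean sameHash(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";

        System.out.println(a.equals(b));    // false: the contents differ
        System.out.println(a.hashCode());   // 2112  (65 * 31 + 97)
        System.out.println(b.hashCode());   // 2112  (66 * 31 + 66)
        System.out.println(sameHash(a, b)); // true: a collision, allowed by the contract
    }
}
```

A hash-based collection still tells the two apart, because it calls equals after matching the hash code.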

Hope it helps

score 3 · Answer 2 · edited Jun 20 '20 at 09:12

Explanation

Your first paragraph describes the default behavior of hashCode. But usually classes override it and create a content-based solution (same as for equals). This especially applies to the String class.

Default `hashCode`

The default implementation is not written in Java but implemented directly in the JVM; the method is declared with the native keyword. You can always get the original, identity-based hash code by using System#identityHashCode, see its documentation:

Returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode(). The hash code for the null reference is zero.
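As a rough illustration of the difference, the sketch below compares the overridden String#hashCode with System.identityHashCode for the two strings from the question. The class name IdentityHashDemo is made up; the identity hash values vary between runs and are not guaranteed to differ, so none are shown.

```java
// Illustrative sketch: content-based hashCode vs. identity-based hash code.
// IdentityHashDemo is a hypothetical name used only for this example.
class IdentityHashDemo {

    public static void main(String[] args) {
        String s1 = new String("Some_Name");
        String s2 = new String("Some_Name");

        // Two distinct instances on the heap ...
        System.out.println(s1 == s2);                       // false
        // ... but String overrides hashCode based on content.
        System.out.println(s1.hashCode() == s2.hashCode()); // true

        // The default (identity) hash codes; typically different,
        // but the values are JVM- and run-dependent.
        System.out.println(System.identityHashCode(s1));
        System.out.println(System.identityHashCode(s2));

        // For a class that does not override hashCode, both agree.
        Object plain = new Object();
        System.out.println(plain.hashCode() == System.identityHashCode(plain)); // true

        // Documented special case: the null reference hashes to zero.
        System.out.println(System.identityHashCode(null));  // 0
    }
}
```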

Note that the default implementation of hashCode is not necessarily based on the memory location. It often is related, but you can by no means rely on that (see How is hashCode() calculated in Java). Here is the documentation of Object#hashCode:

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.

The general contract of hashCode is:

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

The relevant parts are the second and third requirements: hashCode must be consistent with equals (equal objects must produce equal hash codes), and hash collisions between unequal objects are okay (but not optimal).

And Object#equals is typically overridden to create custom content-based comparisons (see documentation).
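For the Employee class from the question, a content-based override could look roughly like the sketch below. The question only shows a constructor taking a String, so the single name field is an assumption made for illustration.

```java
import java.util.Objects;

// Illustrative sketch: a content-based equals/hashCode pair.
// Assumption: Employee holds just one field, name (the question does not show the class body).
final class Employee {

    private final String name;

    Employee(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Employee)) {
            return false;
        }
        return Objects.equals(name, ((Employee) o).name);
    }

    // Must agree with equals: equal names give equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }

    public static void main(String[] args) {
        Employee e1 = new Employee("Some_Name");
        Employee e2 = new Employee("Some_Name");

        System.out.println(e1.equals(e2));                  // true
        System.out.println(e1.hashCode() == e2.hashCode()); // true
    }
}
```

With such an override, the two Employee instances from the question would print the same hash code, just like the two String instances do.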

String `hashCode`

Now let us take a look at the implementation of String#hashCode. As said, the class overrides the method and implements a content-based solution. So the hash for "hello" will always be the same as for "hello". Even if you force new instances using the constructor:

// Will have the same hash
new String("hello").hashCode()
new String("hello").hashCode()

It works exactly as equals, which would output true here as well:

new String("hello").equals(new String("hello")) // true

as required by the contract of the hashCode method (see documentation).

Here is the implementation of the method (JDK 10):

/**
 * Returns a hash code for this string. The hash code for a
 * {@code String} object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using {@code int} arithmetic, where {@code s[i]} is the
 * <i>i</i>th character of the string, {@code n} is the length of
 * the string, and {@code ^} indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return  a hash code value for this object.
 */
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
            : StringUTF16.hashCode(value);
    }
    return h;
}

This just forwards to either StringLatin1 or StringUTF16, so let us see what they do:

// StringLatin1
public static int hashCode(byte[] value) {
    int h = 0;
    for (byte v : value) {
        h = 31 * h + (v & 0xff);
    }
    return h;
}

// StringUTF16
public static int hashCode(byte[] value) {
    int h = 0;
    int length = value.length >> 1;
    for (int i = 0; i < length; i++) {
        h = 31 * h + getChar(value, i);
    }
    return h;
}

As you can see, both of them just do some simple math based on the individual characters in the string. So the hash is completely content-based and will always be the same for the same sequence of characters.
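To check the formula from the Javadoc by hand, here is a small sketch that recomputes the hash character by character and compares it with String#hashCode. The names ManualStringHash and manualHash are made up for this example; this is not the JDK's internal code.

```java
// Illustrative sketch: recomputing s[0]*31^(n-1) + ... + s[n-1] with int arithmetic.
// ManualStringHash and manualHash are hypothetical names for this example.
class ManualStringHash {

    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Horner's scheme: multiply the running value by 31, add the next char.
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String literal = "hello";
        String fresh = new String("hello");

        System.out.println(manualHash(literal)); // 99162322
        System.out.println(literal.hashCode());  // 99162322
        System.out.println(fresh.hashCode());    // 99162322, no heap search needed
        System.out.println(manualHash(""));      // 0
    }
}
```

No lookup of other String objects is involved: each instance computes the value from its own characters, which is why identical contents always agree.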

well explained, nice one — Phill Alexakis, Aug 27 '19 at 08:17
