Using a cached hashCode() value in equals()

Question

For immutable class String; String :: hashCode computation will happen only once in life time of that object. So calling hashCode() after the first time is always just returning private int hash. No CPU will be wasted on computation.

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {    // **h != 0 in second time**
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

As we know the contract between hashcode and equals as

equal objects must produce the same hash code

So i believe below piece of code will improve the performance on string equals(). This may be redundant in hashmap, since hashmap already got the bucket based on hashcode, but this has good performance improvement on BIG List search.

//Does the below will improve equals performance on BIG LIST?
if (this.hashCode() != anObject.hashCode()) {
        return false;
}

Please comment your thoughts the below api.

public boolean equals(Object anObject) {
if (this == anObject) {
    return true;
}

if (this.hashCode() != anObject.hashCode()) {
    return false;
}

if (anObject instanceof String) {
    String anotherString = (String)anObject;
    int n = count;
    if (n == anotherString.count) {
    char v1[] = value;
    char v2[] = anotherString.value;
    int i = offset;
    int j = anotherString.offset;
    while (n-- != 0) {
        if (v1[i++] != v2[j++])
        return false;
    }
    return true;
    }
}
return false;
}

UPDATE

As mentioned by 'dasblinkenlight' there is some cpu cycles required only at the first time of hashcode() API is called.

Since Java is maintaining String Pool; and if application requires large String multiple time comparison other than at hashing; then we can go for Utility method like below.

public boolean StringCompare (String one, String two) {

     if (one == two) {
         return true;
     }

     if (one.hashCode() != two.hashCode()) {
        return false;
     }


    int n = one.count;
    if (n == two.count) {
    char v1[] = one.value;
    char v2[] = two.value;
    int i = one.offset;
    int j = two.offset;
    while (n-- != 0) {
        if (v1[i++] != v2[j++])
        return false;
    }
    return true;

}

Profiling should be your first, second and third stop to see if one piece of code is faster than annother — Richard Tingle, Sep 02 '13 at 18:27
possible duplicate http://stackoverflow.com/questions/5443136/are-two-java-objects-with-same-hashcodes-not-necessarily-equal — Sahil Kapoor, Sep 02 '13 at 18:29
See also http://stackoverflow.com/questions/14262431/why-does-the-equals-method-in-string-not-use-hash?s=53|0.7987 — Raedwald, Apr 15 '15 at 20:34

score 8 · Accepted Answer · answered Sep 02 '13 at 18:32

There's nothing wrong in putting a check like that in your own code to save on an equality check when you know that the comparison is going to fail most of the time, but putting this into the general code has a chance to degrade the overall performance of your system for two reasons:

Computing hash code for the first time takes some CPU cycles; when equals method is called outside of the context of a hash container, the CPU cycles needed to compute the hash code would be wasted
When Strings are used as keys in a hash container, the container establishes the equality of hash codes before proceeding with the equality check, so the comparison of hash codes inside the equals() method becomes redundant.

score 5 · Answer 2 · answered Sep 02 '13 at 18:28

Java is very good at optimization, you do not need to micro-optimize for it. You should write code that is maintainable and readable, and look into performance after you've looked into sources that may be causing issues (and after many performance tests). Writing obscure or hard-to-read code will most likely result in bugs in the future when you or others are unable to discern why the method is written the way it is.

Have you found that your .equals() is the bottleneck in your performance? If not, I would say stick with the code that is more readable and less likely to introduce bugs in the future. The performance difference in your case is most likely negligible, but you can run tests to compare the two implementations.

Dev · Answer 3 · 2013-09-02T18:33:07.547

3

Your optimization would require more work on String objects that are actually equal to each other, as you'd still have to iterate them to ensure they were equal.

The contract requires that equals objects produce the same hashcode. The reverse is not true, in that, objects which produce the same hashcode are not necessarily equal (a hash collision).

edited Sep 02 '13 at 18:33

answered Sep 02 '13 at 18:27

Dev

11,919
3
40
53

It would generate slightly more work with string objects that are equal, but in some cases may save a lot of work on long string objects that differ only near the end. It's too bad that the only way for Java to achieve proper memory-barrier semantics with `string` was to declare the backing-array field `final`, since otherwise it would be possible to optimize repeated comparisons among long `equal` strings by having strings which are found to be equal consolidate their backing stores [note that if old and new backing store contain the same text, it wouldn't matter if some threads... – supercat Apr 15 '15 at 22:15
...saw the swap and others didn't. – supercat Apr 15 '15 at 22:15

Using a cached hashCode() value in equals()

UPDATE

3 Answers3