
I have a hashCode implementation for a class. It is consistent with what Eclipse generates and with the most commonly accepted practice as discussed here.

Here is my hashCode implementation (all the IDs used in this method constitute the key for the object):

public int hashCode() {
    final int prime = 31;
    int hashCode = 1;
    if(uId != null){
        hashCode = prime * hashCode + uId.hashCode();
    }
    if(rId != null){
        hashCode = prime * hashCode + rId.hashCode();
    }
    if(bId != null){
        hashCode = prime * hashCode + bId.hashCode();
    }
    if(reId != null){
        hashCode = prime * hashCode + reId.hashCode();
    }
    if(cId != null){
        hashCode = prime * hashCode + cId.hashCode();
    }
    return hashCode;
}

I ran into a scenario where I was testing with a very large data set and my collection did not have the expected number of objects of this class. On looking closely, the two records below produced the same hashCode, 50268236873, and hence one record was being replaced by the last one added to the collection because their hashCodes were the same.

  Existing record :
  Record@2c0781cd[uId=54046,rId=10967,bId=177,reId=1728,cId=50194] 

  Record being inserted into the collection :
  Record@20dad050[uId=53806,rId=18389,bId=177,reId=19026,cId=50194]

Both of these had the hashCode value = 50268236873 

So, the questions :

1] This is a clear case where the hash codes of two different objects have the same value. How do I ensure this does not happen with any data set? Should the prime be larger?

2] If we look closely, the hashCode variable in the implementation is of the int data type, whose largest value is 2^31 - 1 = 2147483647. That is less than the hashCode computed for the above data set, 50268236873, so there is an overflow. Is there any consequence to using long as the type of the hashCode value?

thanks
Nohsib

Edit :

I am using a HashSet. After reading through the answers posted, I looked up my equals implementation, shown below, and I think the problem is that in equals I check whether the hashCodes of the two objects are the same and use that to decide that they are the same object.

Can any of you guys confirm this ?

@Override
public boolean equals(Object paramObject) {
    boolean equals = false;
    if (paramObject != null) {
        ACRecord other = (ACRecord) paramObject;
        if ((this.hashCode() == other.hashCode()) // I think this is where I am going wrong
                || (this.uId.equals(other.getUId())
                        && this.rId.equals(other.getRId())
                        && this.reId.equals(other.getReId())
                        && this.bId.equals(other.getBId())
                        && this.cId.equals(other.getCId()))) {
            equals = true;
        }
    }
    return equals;
}

Solution: My equals method implementation was wrong, since I used hashCode to determine whether two objects were equal. Correcting the equals method implementation resolved my issue where the HashSet was replacing an existing record.
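A corrected equals along those lines might look like the following sketch (it reuses the getters from the code above and, like the original, assumes the ID fields are non-null):

@Override
public boolean equals(Object paramObject) {
    if (this == paramObject) {
        return true;
    }
    // instanceof is false for null, so no separate null check is needed
    if (!(paramObject instanceof ACRecord)) {
        return false;
    }
    ACRecord other = (ACRecord) paramObject;
    // compare the key fields only; never consult hashCode here
    return this.uId.equals(other.getUId())
            && this.rId.equals(other.getRId())
            && this.reId.equals(other.getReId())
            && this.bId.equals(other.getBId())
            && this.cId.equals(other.getCId());
}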

  • What was the collection being used? Only the equals method is used in collections to detect duplicates; hashes are only used to speed up the process. – Zielu Mar 20 '15 at 00:42
  • Aside: there's an (arguable) logic flaw in your hash code. You might need to consider the case where each ID is null, if you want to preserve the relative position of each ID in the hash. So each if-clause might be better done as `hashCode = prime * hashCode + (id == null ? 0 : id.hashCode());`. As a bonus, this makes the method easier to read. – Paul Hicks Mar 20 '15 at 00:45
  • @Zielu: I am using HashSet. – Nohsib Mar 20 '15 at 01:30
  • @Paul Hicks: thanks, I shall take your advice. – Nohsib Mar 20 '15 at 01:30
  • Then the problem is in your equals. You cannot use the hashCode to check for equality. – Zielu Mar 20 '15 at 01:36
  • You can do the opposite `if (this.hashCode() != other.hashCode()) { return false; }` but you can't stop there. You must check for equality without using hashCode. – Paul Hicks Mar 20 '15 at 02:57
  • Yes, you've correctly identified that that line in your `equals` method is wrong and you should just get rid of it. – Louis Wasserman Mar 20 '15 at 05:48
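To put Paul Hicks' suggestion from the comments into context, the hashCode method could be rewritten along these lines (a sketch only; same fields, same prime, but a null ID now contributes a 0 in its position instead of being skipped):

public int hashCode() {
    final int prime = 31;
    int hashCode = 1;
    // a null ID contributes 0 but still keeps its position in the hash
    hashCode = prime * hashCode + (uId == null ? 0 : uId.hashCode());
    hashCode = prime * hashCode + (rId == null ? 0 : rId.hashCode());
    hashCode = prime * hashCode + (bId == null ? 0 : bId.hashCode());
    hashCode = prime * hashCode + (reId == null ? 0 : reId.hashCode());
    hashCode = prime * hashCode + (cId == null ? 0 : cId.hashCode());
    return hashCode;
}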

5 Answers


Hash codes don't guarantee uniqueness. HashMap implementations typically deal with collisions by storing a list behind the scenes, but they include a check that ensures that you don't get everything in the list as a match, just the ones that really match.

In other words, if you do map.get("foo") and there are collisions, the hash map will check each result (unhashed) to see if it really matches "foo". Then it returns only exact matches.

Note also that while the contract for hash codes states that any two objects that are equal according to equals() must have the same hash code, the converse is not necessarily true.
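As a quick illustration with a standard JDK type (nothing to do with the Record class above): the Long values 0 and -1 happen to share the hash code 0, yet a HashSet keeps both, because within a bucket it falls back on equals():

import java.util.HashSet;
import java.util.Set;

public class CollisionDemo {
    public static void main(String[] args) {
        Long a = 0L;
        Long b = -1L;

        System.out.println(a.hashCode()); // 0
        System.out.println(b.hashCode()); // 0 -- same hash, different values

        Set<Long> set = new HashSet<>();
        set.add(a);
        set.add(b);

        // equals(), not hashCode(), decides identity inside the bucket
        System.out.println(set.size()); // 2
    }
}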

L. Blanc
  • I do not agree with your answer because, with a language like Java that has been around for so long and is used so extensively, if collections didn't work as they should because the hashCodes of 2 objects are the same due to a specific implementation, then this language would not be used and accepted as it is in the IT industry. There should be a better rationale than your answer. – Nohsib Mar 20 '15 at 00:37
  • @Nohsib - you're misunderstanding the contract of hashCode - it is there to provide a hash, not a unique ID. If you're using the Java collections which use hashCode, then so long as you've correctly implemented equals too, they can work out how to store things without duplicates and collisions. The real issue with your code is not the implementation of hashCode, it's the way you're building your collection - would you care to share it? – Ashley Frieze Mar 20 '15 at 00:41
  • @Nohsib, I should have been more clear. A HashMap will store colliding entries as a list, but it includes a check that ensures that you don't get everything in the list as a match, just the ones that really match. In other words, if you do map.get("foo") and there are collisions, the hash map will check each result (unhashed) to see if it really matches "foo". Then it returns only exact matches. – L. Blanc Mar 20 '15 at 00:45
  • @Nohsib I do not agree with your comment because... normally you'd override `hashCode()` and `equals()` together to ensure no collisions. – Neilos Mar 20 '15 at 01:09
  • @Neilos As the answer from "that other guy" states, it is guaranteed that two objects that are "equal" will have the same hash. The opposite is not true. – L. Blanc Mar 20 '15 at 01:17
  • Thank you user58273 and Ashley Frieze for those clarifications. I edited the question with my equals implementation (I think that's where I am going wrong). I am using HashSet and simply calling the add API on it after creating the object that I pass to it. – Nohsib Mar 20 '15 at 01:32

Here is the contract for hashCode from the Java 8 docs (summarized):

  1. Calling the method twice on the same object must result in the same value (per JVM instance).

  2. If two objects a and b are equal according to a.equals(b), the hashCodes must be the same.

Here is the minimal definition that satisfies the above:

public int hashCode() {
  return 0;
}

All java.util.* collections like Hashtable and HashMap conform to this contract, and will never drop items due to duplicate hashCodes, even when the duplication is as extreme as in the example above. It will be slow, but it will be correct.
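As a quick demonstration of that point (a throwaway sketch with a hypothetical ZeroHash class, not something to use in production), a constant hashCode still gives correct HashSet behaviour as long as equals is right; everything just lands in one bucket:

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class: a constant hashCode is legal, just slow for large sets
class ZeroHash {
    private final String id;

    ZeroHash(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ZeroHash && Objects.equals(id, ((ZeroHash) o).id);
    }

    @Override
    public int hashCode() {
        return 0; // satisfies the contract: equal objects get equal hashes
    }

    public static void main(String[] args) {
        Set<ZeroHash> set = new HashSet<>();
        set.add(new ZeroHash("a"));
        set.add(new ZeroHash("b"));
        set.add(new ZeroHash("a")); // duplicate, rejected via equals()
        System.out.println(set.size()); // 2 -- nothing dropped, nothing duplicated
    }
}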

Instead, typical reasons for unexpected results when adding or retrieving from a hash based collection include:

  • Reusing/modifying objects in such a way that their hash codes change at runtime (violation of #1)
  • Not overriding .equals(Object)
  • Using a buggy collection (outside of java.*) that assumes more about hashCode than what the contract specifies.
that other guy

There is no requirement for hashCode to be unique, only that if two objects are equal their hashes have to be equal as well.

Hash collisions are to be expected and are inevitable, because, as you noticed, there are only about 2 * Integer.MAX_VALUE possible values; if the space of possible objects exceeds that number there must be collisions.

You cannot change hashCode to return long, as it is already defined in Object to return int, and that is what will be used.

Collections like HashMap or HashSet are aware of possible collisions and are not affected by them. Your custom code must be collision-proof as well.
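If one of the IDs were a primitive long, the usual way to work within the int return type is to fold both halves of the value into the hash, which is what Long.hashCode() does internally; a minimal sketch with a hypothetical field:

// Hypothetical helper: fold a primitive long into an int hash contribution,
// the same way Long.hashCode() combines the upper and lower 32 bits
static int hashOfLong(long id) {
    return (int) (id ^ (id >>> 32));
}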

Zielu

Hash codes typically map a large range of values to a smaller range of values. This means that even the most perfect hash algorithm for your data will create collisions once you reach n + 1 values, where n is the number of possible hash values (which is 2^32 when using int as the hash code).

Your implementation needs to handle such collisions by doing a complete check of all the members of your objects to verify that they are actually equal.

Hashing usually reduces the number of full checks drastically, because you only need to check against values that have the same hash code until you find the one that fully matches your data; if none matches, your data is not present in the map.

See this answer for a brief description of a hash map implementation.
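A heavily simplified sketch of that lookup (hypothetical code, not the real java.util.HashMap implementation) looks roughly like this:

import java.util.List;
import java.util.Map;

// Hypothetical helper showing the idea; the real HashMap is far more involved
class BucketLookup {
    static <V> V get(Object key, List<Map.Entry<Object, V>>[] buckets) {
        int index = Math.floorMod(key.hashCode(), buckets.length); // pick a bucket
        List<Map.Entry<Object, V>> bucket = buckets[index];
        if (bucket == null) {
            return null;
        }
        for (Map.Entry<Object, V> entry : bucket) {
            // the hash only narrowed the search; equals() decides the real match
            if (entry.getKey().equals(key)) {
                return entry.getValue();
            }
        }
        return null;
    }
}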

MarvinPohl

Hashes are never meant to be completely unique. However, some hashing algorithms are better at avoiding collisions than others. As you already do in your code, it is typically best to use prime numbers to help reduce collisions.
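For what it's worth, java.util.Objects.hash (available since Java 7) applies exactly this 31-based prime combination and handles nulls, so the hashCode from the question could be written as a sketch like:

@Override
public int hashCode() {
    // Objects.hash is null-safe and uses the same 31-based combination internally
    return java.util.Objects.hash(uId, rId, bId, reId, cId);
}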

Derek_M