1

I have this class:

public class Person {
    private String id;

    @Override
    public boolean equals(Object obj) {
        return obj instanceof Person &&
            this.id.equals(((Person) obj).id);
    }

    @Override
    public int hashCode() {
        ...
    }
}

The field id consists solely of digits. My question is, should the hashCode method look like this:

@Override
public int hashCode() {
    return id.hashCode();
}

or this:

@Override
public int hashCode() {
    return Integer.parseInt(id);
}

Provided that id <= Integer.MAX_VALUE.

Martin M J
  • 830
  • 1
  • 5
  • 12
  • 2
    First, why do you have a String if it only consists of digits? Just make it an `int` and return this value in the hashcode method. You can't have collisions as you have a single `int` value. – Alexis C. May 16 '15 at 14:35
  • @AlexisC. Do you mean that there will be no collisions because `hashCode` is only used when you store an object in a `hashed` collection and all `hashed` collections operate on uniqueness, thus ensuring that two equal integers will never be stored in the collection? – Chetan Kinger May 16 '15 at 14:41
  • Don't think about such things and read about: http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/builder/HashCodeBuilder.html – Tomek May 16 '15 at 14:43
  • 1
    @AlexisC. ZIP codes are also only digits, but are still often represented as a String. I'm not performing any mathematical operations on `id`. – Martin M J May 16 '15 at 14:45
  • @SeanMcCauliff I have. Do you think it should be different? – Martin M J May 16 '15 at 14:59
  • @ChetanKinger A collision appears when two objects are not equals but have the same hashcode. Here the equality is defined only with `id` and, provided that this value is `<= Integer.MAX_VALUE.`, you should not have collisions if you return the int value in the hashCode method. – Alexis C. May 16 '15 at 15:13
  • @MartinMJ Depends on the country. In my country (The Netherlands) postal codes are definitely not only digits. – Mark Rotteveel May 16 '15 at 15:14
  • Personally I can't remember thinking about this too much in the past, I tend just let my IDE generate your `equals` & `hashCode` and then get on with other problems (that always exist) – Gareth Davis May 16 '15 at 15:27
  • *if you have to ask this question you should just probably leave it alone* –  May 16 '15 at 15:36

2 Answers2

2

You should leave it as the hashCode of the string.

This is for a couple of reasons:

  1. String.hashCode() will distribute the hash over more of the integer space better than just returning Integer.parseInt
  2. String will cache the hash code and prevent therefore you won't pay the cost of Integer.parseInt on each call to hashCode()

The key assumption I'm making here is that range of values for your id's is something like '1' - '9999999' i.e. doesn't include negative integers and probably doesn't include all integers otherwise you will not be able to add a new id.

Gareth Davis
  • 27,701
  • 12
  • 73
  • 106
  • Your point 2 can be easily solved by a private `int` field that stores the hash value of the id, if the OP really needs a String. Also since the id is <= as `Integer.MAX_VALUE`, your point 1 is not true in this case and the OP can simply return this cached int value in the hashCode method. A simple test, and you see that "1012" and "1466960" have the same hashCode (probably due to overflow). Returning the int value representing `id` is the best choice here, IMO. – Alexis C. May 16 '15 at 15:07
  • 1
    Why make it harder?... String has a pretty reasonable hash code (not great), the caching is free. But I think my main point is that 'id' is likely to be a small int (looks like it could be the primary key of a db table) hence using a hash to doesn't just return it blindly might improve it's distribution over the full range of int. – Gareth Davis May 16 '15 at 15:24
  • In fact the only thing that makes this harder is that the OP is using the wrong datatype. With an `int`, you get an easy free bijective hash function, and the caching is also free... – Alexis C. May 16 '15 at 15:34
  • there may have his reasons.. such as the value is actually 0001 etc or cos the original author was alien. Although if it's a datatype issue the value should maybe something like `PersonId` or some other strong type rather than an int – Gareth Davis May 16 '15 at 15:39
1

Delegate the hash code for all objects, including in this case. It's not worth making your code complicated to optimize for some hypothetical performance problem, and in this particular case, it would be counterproductive, since the standard implementation caches the hash code.

chrylis -cautiouslyoptimistic-
  • 75,269
  • 21
  • 115
  • 152