0

I am building a composite key for a hash map in java and want to determine my own hash code for each of these objects. My question is what is the best methodology of the two below. My composite key has three String attributes and one int attribute.

public int hashCode(){
    return (className + methodName + uniqueNumber).hashCode();
}

public int hashCode(){
    return (className + methodName + desc + uniqueNumber).hashCode();
}

I must have className, methodName, and unique number to guarantee that each key has a unique hash code. I want to go with the method that gives the least chance of a collision. My intuition is that the more attributes that I "add" to my hash map function that less likely a collision will occur. However, I am not entirely certain this is correct.

HXSP1947
  • 1,311
  • 1
  • 16
  • 39
  • I believe there are many similar questions on SO.... – Mitch Wheat Nov 17 '13 at 00:44
  • 2
    If you want to guarantee there are no collisions, `return uniqueNumber;` – Adam Liss Nov 17 '13 at 00:46
  • 1
    uniqueNumber is a consistently incrementing number whose values I don't have direct control over. Using only uniqueNumber will generate a unique hash value, but I will lose my ability to reference specific values in my hashmap – HXSP1947 Nov 17 '13 at 00:50
  • 1
    What is this hashcode for? Usually it is generated from some fields of the object, and should be consistent with the implementation of `equals()` – Mike Tunnicliffe Nov 17 '13 at 00:51
  • I need a composite key for a hashmap since there is no one unique attribute that will distinguish keys. It will be consistent with equals since only className, methodName, and unique number are needed to distinguish keys. What I am worried about is collisions because I have so many keys in my hash map. I want to be sure that how I am generating my hash code is the best possible way to prevent collisions. – HXSP1947 Nov 17 '13 at 00:56
  • possible duplicate of [What is a best practice of writing hash function in java?](http://stackoverflow.com/questions/2738886/what-is-a-best-practice-of-writing-hash-function-in-java) – chrylis -cautiouslyoptimistic- Nov 17 '13 at 00:59
  • 1
    @user2736423: What fields are you using in `equals`? If `uniqueNumber` is among them, use it in `hashCode` and you're done. Otherwise you *must not* use it in `hashCode` or nothing will work anymore. – maaartinus Nov 17 '13 at 01:10

2 Answers2

2

Your question is a bit unclear, as to what fields you need/ are sufficient to uniquely distinguish the key.

Generally, you should combine individual hashes (within a composite key) by multiplying by prime factors.

Assuming the first example:

public int hashCode() {
    int h = className.hashCode() * 23;
    h += methodName.hashCode() * 17;
    h += uniqueNumber;
    return h;
}

OTOH if uniqueNumber is actually unique, you can simplify:

public int hashCode() {return uniqueNumber;}

In your comment you mentioned one thing: "Using only uniqueNumber will generate a unique hash value, but I will lose my ability to reference specific values in my hashmap".

Now this is very important: "Instance Identity" is a very different thing to hash on & lookup, from "Value"! You cannot use the same hashcode & maps for both.

For example, if you need a Key( ClassName, MethodName) -> SomeValue lookup that would be a "value" lookup & would need to be hashed by ClassName & MethodName values so that it could be repeated: ie, so you can construct a key for Map.get() to perform a lookup.

"Instance Identity" actually has builtin support for hashing & maps in Java -- it's called IdentityHashMap.

But for most cases, including & especially Composite Keys which are presumably to be used for a map, the key needs to be able to be re-constructed to later perform a lookup. So the key should have value semantics, and it is dubious whether your uniqueNumber should actually be part of the key.

When you go to do a lookup later, how do you get the correct uniqueNumber to retrieve the data? My feeling is that:

  1. Either there should be a first-class entity there instead, which you could use as the key directly (so no CompositeKey class required any more), or that

  2. You can't repeatably get uniqueNumber, in which case it doesn't work/ isn't required anyway.

To summarize: if uniqueNumber is really required or applicable at all, I would expect it to already be encapsulated in a first-class entity. That's not the case. It looks like you should most probably be using a value-based key, and dropping the uniqueNumber bit (from here at least).

So my recommendation:

public int hashCode() {
    int h = className.hashCode() * 23;
    h += methodName.hashCode() * 17;
    h += desc.hashCode();
    return h;
}

Let me know if this helps.

Thomas W
  • 13,940
  • 4
  • 58
  • 76
  • correct. I didn't give the best example. To simplify my question, does using more attributes of an object to calculate hash code lower the chance of collisions – HXSP1947 Nov 17 '13 at 01:11
  • No. Once you have enough fields to form a "primary key" or "unique key", multiplied those up with different prime factors, you'll have a reasonable low-collision hashcode. For a value-based composite key, just `className` and `methodName` would be sufficient & correct. – Thomas W Nov 17 '13 at 01:21
  • This does help. uniqueNumber is used to calculate some other value and I thought that including it in my function will decrease the likelihood of a collision, but I have no way of recovering this number later. However, className and methodName are not by themselves unique to a key. Part of the problem here is that I have completely butchered the explanation of my problem. I am not using uniqueNumber in my equals method since I can't recover it, but I am using desc since desc with className and methodName produces a unique key. I mentioned className, methodName, and uniqueNumber as a – HXSP1947 Nov 17 '13 at 01:29
  • key since those attributes together are unique (also my code has been changing over the past 20min which is why my formulation of the problem hasn't been clear). With this in mind I think my question has become, does including uniqueNumber in my calculation of the hash code decrease the chance of collisions. – HXSP1947 Nov 17 '13 at 01:29
  • 1
    okay thank you that makes sense. My only comment is to point out your recommendation will not be unique (desc needs to be added as well). This may not make sense, but it is true in the case of multiple constructors in a class. In this case there will be two methods with the same name so desc is required create uniqueness. Thank you for your help :). – HXSP1947 Nov 17 '13 at 01:40
  • That's exactly right, if you're going to include constructors.. I'll edit my answer. Glad I could help! – Thomas W Nov 17 '13 at 01:49
0

A few comments;

(1) It is not necessary for hash codes to be unique. In fact, they usually are NOT guaranteed to be unique. In most circumstances, it would be too computationally expensive to guarantee uniqueness, nor would it be desirable. Collisions are not catastrophic.

(2) Hash codes should reflect the state of the object instance, not the object class. Things like the class name will not enter into it. Unless, of course, that IS the instance data of a class, such as in a class that represents one frame of a stack trace, perhaps.

(3) A good hash code will have a large number of possible values, and these values will be distributed probabilistically such that collisions are UNLIKELY.

(4) In Java, a hash code must be consistent with Object.equals(). See the Javadoc for java.lang.Object for reference.

Arthur Dent
  • 785
  • 3
  • 7
  • className is an attribute of my key (I'm using ASM) so the className is important in differentiating keys. My question is whether or not using more values to calculate hash code lowers the chance of collision or does it not make a significant difference. – HXSP1947 Nov 17 '13 at 01:10
  • Once you've got the major primary-key fields in the hashcode, you'll get good distinctness of hashcodes. For the bytecode domain, I'd definitely use both `className` and `methodName` -- otherwise every method in the same class will be a hash collision. – Thomas W Nov 17 '13 at 01:18
  • Using more values will lower the chance of collision if they're not redundant. I also agree with Thomas, that you probably want to use both className and methodName. Anything else is probably not significantly better. – Arthur Dent Nov 17 '13 at 03:43
  • One more thing, just to be clear: Yes, using primary key fields is a good place to start, but only natural keys are good. SURROGATE KEYS ARE OUT. If it came from a database sequence or an identity column, it's problematic. The uniqueNumber you mentioned above is most likely trouble, for the same reason. The hash should depend on natural data only to avoid problems. – Arthur Dent Nov 17 '13 at 03:49