2

Let Abstract be an abstract class, and A1,A2,...,An concrete classes that inherit from Abstact. Each one of Ai has a list of Abstract and a pre-defined, known at compile time, set of primitive types, let's assume we have a hush function for them, and there are no 'loops' in the structure of each concrete element.

Two elements e1 and e2 are identical if they have the same values for the predefined primitives, and if for each Abstract in e1, there exists an Abstract in e2 such that e1 and e2 are identical. (in other words, order is not important).

I am looking for a good hash heuristic for this kind of problem. It shouldn't (and as far as I know, can't be) a perfect hash function, but it should be good and easy to compute at run time.

I'll be glad if someone can give me some guidelines how to implement such a function, or direct me to an article that addresses this problem.

PS I am writing in Java, and I assume (correct me if I am wrong) the built in hash() won't be good enough for this problem.
EDIT :
the lists and primitives are fixed after construction, but are unknown at compile time.

amit
  • 175,853
  • 27
  • 231
  • 333
  • A list where order isn't important? That's a set. – Pops Mar 28 '11 at 14:09
  • @Torgamus: It is 'important' for performance issues - I am calling a method for each element in the list, and the order of the calls might improve performance. For correctess - you are right, this is a set. – amit Mar 28 '11 at 14:11
  • related http://stackoverflow.com/questions/3869252/what-is-the-preferred-way-of-implementing-hashcode – andersoj Mar 28 '11 at 14:21

2 Answers2

2

If these lists can change after they are constructed, it would be a bad idea to base the hash function on them. Imagine if you stuck your object into a HashMap, and then changed part of it. You would no longer be able to locate it in the HashMap because its hashCode would be different.

You should only base the result of hashCode on immutable values. If you don't have any immutable values in your object, your best bet would probably be to simply use the basic Object.hashCode(), although you'll lose out on equality testing.

If these objects are immutable, however, then I recommend choosing some kind of sort order for your elements. Then you can compute a hash code across your lists, knowing that it will be the same even if the lists are in different orders, because you are sorting the values before hashing.

Jonathan
  • 13,354
  • 4
  • 36
  • 32
  • The lists are fixed after construction, but are unknown at compile time.about your last statement, If I sort the lists, will Object.hashCode() be enough? will two identical (as described before) elements get the same hash code? – amit Mar 28 '11 at 14:07
  • Agreed. I've seen poor code that (using Apache HashCodeBuilder or something) where the hash is recalculated every time using fields that can change (in a Javabean no less). Bad, bad, bad. – Chris Dennett Mar 28 '11 at 14:08
  • No, `Object.hashCode()` will return different values for objects which would otherwise appear the same. The default implementation returns the address of the object. Also, if you will be looping through lists like this, you may want to cache the hash code so you only compute it once (it sounds like it could be expensive). – Jonathan Mar 28 '11 at 14:26
1

Use Google Guava's utilities... Objects.hashCode() is great. Also, the source is available, and they have solved the problem you state, so you can take a look at their solution.

andersoj
  • 22,406
  • 7
  • 62
  • 73
  • thanks, I'll take a look at it. Did they publish an article I can take a look at about this? – amit Mar 28 '11 at 14:21
  • 1
    Don't know if there's an article specifically, but project site (code, docs) is here http://code.google.com/p/guava-libraries/ and Guava uses StackOverflow for Q+A so you can find quite a bit by searching using the guava tag. – andersoj Mar 28 '11 at 14:22