5

We are told that we should implement hashCode() for our classes but most people like me have no real idea of how to do this or what happens if we get it "wrong". For example I have a need for a hash function for indexing nodes in a tree (Finding the most frequent subtrees in a collection of (parse) trees). In this case I need to recursively generate hashcodes based on ordered child nodes, e.g.

hashCode = function(child1.hashCode, child2.hashCode, ...)

In a recent discussion of hashCodes answers included a hash for strings (based on a long prime and 31) and also bitshifting. The String hash is:

// adapted from String.hashCode()
public static long hash(String string) {
  long h = 1125899906842597L; // prime
  int len = string.length();

  for (int i = 0; i < len; i++) {
    h = 31*h + string.charAt(i);
  }
  return h;
}

I'm not interested in security and don't mind collisions. Is there a "universal function" for combining hashcodes of ordered objects that will do more good than harm (and more good than not calling it at all)?

Also is there a site where we can look up common cases? strings, lists, etc.)

I didn't specify a language as I was hoping there were universal approaches. But if it is seriously language-specific then please indicate the language and why it's not universal.

UPDATE Two suggestions are to use the IDE's hashCode generator. That seems an excellent default; Here's Netbeans:

public int hashCode() {
    int hash = 5;
// objects
    hash = 97 * hash + (this.rootElement != null ? this.rootElement.hashCode() : 0);
    hash = 97 * hash + (this.tableElement != null ? this.tableElement.hashCode() : 0);
// a string
    hash = 97 * hash + (this.tag != null ? this.tag.hashCode() : 0);
    return hash;
}
Community
  • 1
  • 1
peter.murray.rust
  • 37,407
  • 44
  • 153
  • 217

7 Answers7

4

There is an excellent hashCode() in Joshua Bloch's Effective Java. Sample Chapter 3, "Methods Common to All Objects" was free (well, it used to be back when there was a page on Sun's old site for it. If you search you might still find a PDF version of that chapter lying around somewhere).

You could also look at the source for HashCodeBuilder in Apache Commons Lang, but just don't copy it for class without citing it. It is time well spent to actually learn about this -- it will make you a better person.

dustmachine
  • 10,622
  • 5
  • 26
  • 28
  • This looks what I wanted. To quote from Commons: "This class enables a good hashCode method to be built for any class. It follows the rules laid out in the book Effective Java by Joshua Bloch. Writing a good hashCode method is actually quite difficult. This class aims to simplify the process. " I doubt this will make me a better person (although I care about authors' moral rights) but I hope it will make me a better programmer... – peter.murray.rust Nov 06 '09 at 19:59
  • Unfortunately those links are now dead. :( – Skrylar May 06 '13 at 03:06
2

Though missing the tag, I'll assume you're talking about Java.

One "lazy" solution comes packaged with Eclipse 3.5, which will generate hash codes for you at the push of a button. toString() and equals(), too. Very nice! I suspect you can find similar functionality in IDEA and NetBeans.

Other than that, practically any hash function that consistently generates the same value for the same input will do. This will (probably) only impact the efficiency of stuff like HashMaps.

Carl Smotricz
  • 66,391
  • 18
  • 125
  • 167
  • For some of the common cases, you can look at Sun's JDK code. If I remember correctly, for Strings it just keeps multiplying by 37 and adding in the int value of the next character. For Lists, I think it does something similar, multiplying a result by some prime number and adding (or XOR'ing) the hash code of the next object. – Carl Smotricz Nov 06 '09 at 19:47
2

If you're speaking in regards to defining hashcodes for custom classes, the best bet is to define some sort of mathematical concatenation of all of your fields hashcode functions.

In defining the hashcode, your goal is typically to minimize collisions so if you do something like this, you'll typically be in the clear.

hashcode=(field1.hashcode()*7)+(field2.hashcode()*3)+(field3.hashcode()*51)...
tinkertime
  • 2,972
  • 4
  • 30
  • 45
1

If you're on Windows, you can use HashData().

i_am_jorf
  • 53,608
  • 15
  • 131
  • 222
1

This is a hash code combining function that I use (in C#):

public static int CombineHashCodes(params int[] hashCodes)
{
    unchecked
    {
        var result = 0;
        foreach (var hash in hashCodes)
            result = (result * 397) ^ hash;
        return result;
    }
}

The intuitive reasoning is that the combination aspect is the XOR operator. This is how .NET 4 does it for Tuples:

public static int CombineHashCodes(int h1, int h2)
{
    return ((h1 << 5) + h1) ^ h2;
} 
eulerfx
  • 36,769
  • 7
  • 61
  • 83
0

In any managed environment the raw implementation of an objects hash function is the memory address itself. If you do not care about the properties of an hash function any value will do, as long as some kind of equivalence relation exists between to separate instances that represents the same value.

If your familiar with relational database design, think about your object's primary key? What values constitutes the primary key?

Say that it's a, b and c, well, then your implementation of hashCode would look like this

return a.hashCode() ^ b.hashCode() ^ c.hashCode();

^ is the XOR (exclusive OR) bit-wise operation, this way you can chain together any number of values to form a hash value, and still maintain a decent spread.

John Leidegren
  • 59,920
  • 20
  • 131
  • 152
  • "hash function is the memory address" - Are you sure about this? What about when an object is moved around in memory when it is compacted? – mfeingold Nov 06 '09 at 19:56
  • As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.) - http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html#hashCode%28%29 – John Leidegren Nov 06 '09 at 20:03
  • I took that from the java docs, it's the default implementation and it would imply that new Object().hashCode() != new Object().hashCode() – John Leidegren Nov 06 '09 at 20:04
  • No, distinct objects *can* have identical hash codes, it depends completely on what's coded there. The default for Object is something very simple probably related to the address at which the object is created, but it is definitely in your interest to have new String("abc").hashCode() == new String("abc").hachCode()! – Carl Smotricz Nov 06 '09 at 20:13
  • Looking at that again, we're not in disagreement. Just refining the info. – Carl Smotricz Nov 06 '09 at 20:14
0

To answer your question of what can go wrong: the hashcode generated by your function will be used to find the place for instances of your class if you put it in a hashtable (dictionary/map). If your hashfunction generates too many collisions, the performance of you hashtables can be as bad as O(n).

mfeingold
  • 7,094
  • 4
  • 37
  • 43