Un-overriding hashCode

Question

I have the following situation: I have many BSTs, and I want to merge isomorphic subtrees to save space.

I am hashing Binary Search Tree nodes into a "unique table", which is basically a hash table of BST nodes.

Nodes that have the same key and the same left and right children have the same hash code, and I have overridden equals for the node class accordingly.
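A "unique table" like this is usually called hash-consing. Below is a minimal Java sketch that rests on assumptions not taken from the question: nodes are immutable, carry no `name` payload, and are only created through a hypothetical factory method `Node.make`. Because every child passed to `make` is already canonical, `equals` can compare children with `==`, and the hash can be computed once in the constructor from the children's cached hashes, so neither method recurses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable node for this sketch (the question's "name" field is omitted).
final class Node {
    final int key;
    final Node left;    // canonical child, or null
    final Node right;   // canonical child, or null
    private final int hash;

    // The unique table: maps each node to its canonical instance.
    private static final Map<Node, Node> UNIQUE = new HashMap<>();

    private Node(int key, Node left, Node right) {
        this.key = key;
        this.left = left;
        this.right = right;
        // Children already hold their hashes in a field, so this does not recurse.
        this.hash = Objects.hash(key, left, right);
    }

    // Returns the canonical node for (key, left, right), creating it if needed.
    static Node make(int key, Node left, Node right) {
        Node candidate = new Node(key, left, right);
        Node existing = UNIQUE.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Node)) return false;
        Node n = (Node) o;
        // Children are canonical, so reference comparison suffices.
        return key == n.key && left == n.left && right == n.right;
    }

    @Override
    public int hashCode() {
        return hash;
    }
}

class UniqueTableDemo {
    public static void main(String[] args) {
        Node b = Node.make(2, null, null);
        Node c = Node.make(3, null, null);
        Node a = Node.make(1, b, c);
        // Built from fresh calls, but every level resolves to an existing node.
        Node d = Node.make(1, Node.make(2, null, null), Node.make(3, null, null));
        System.out.println(a == d); // true
    }
}
```

Here `a == d` holds because isomorphic subtrees resolve to one shared instance. The static `HashMap` is not thread-safe and never releases entries; both are simplifications of this sketch.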

Everything works, except that computing the hash is expensive - it involves computing the hash for the child nodes.

I would like to cache the hashed value for a node. The problem is that the natural way of doing this, a HashMap from nodes to integers, will itself call the hash function on the nodes.

I've gotten around this by declaring a new field in the nodes, which I use to store the hash code. However, I feel this is not the right solution.

What I really want is to map nodes to their hash codes using a hash that uses the node's address. I thought I could do this by making a HashMap keyed on Object and casting the nodes to Object, which would then invoke Object's hashCode method, but this didn't work (inserts into the hash still call the node's hash and equality functions).

I would appreciate insight into the best way of implementing the node-to-hash-code cache. The code below illustrates what's going on.

```java
import java.util.Set;
import java.util.HashSet;
import java.util.Map;
import java.util.HashMap;

class Bst {

  int key;
  String name;
  Bst left;
  Bst right;

  public Bst( int k, String name, Bst l, Bst r ) {
    this.key = k;
    this.name = name;
    this.left = l;
    this.right = r;
  }

  @Override
  public String toString() {
    String l = "";
    String r = "";
    if ( left != null ) {
      l = left.toString();
    }
    if ( right != null ) {
      r = right.toString();
    }
    return key + ":" + name + ":" + l + ":" + r;
  }

  // Two nodes are equal if they have the same key and structurally equal
  // children; name is deliberately not compared.
  @Override
  public boolean equals( Object o ) {
    System.out.println("calling Bst's equals");
    // instanceof is false for null, so this also covers o == null
    if ( !(o instanceof Bst) ) {
      return false;
    }
    Bst n = (Bst) o;

    if ( n.key != key ) {
      return false;
    } else if ( n.left != null && left == null || n.right != null && right == null ||
                n.left == null && left != null || n.right == null && right != null ) {
      // one side has a child where the other does not
      return false;
    } else if ( n.left != null && n.right == null ) {
      return n.left.equals( left );
    } else if ( n.left != null && n.right != null ) {
      return n.left.equals( left ) && n.right.equals( right );
    } else if ( n.left == null && n.right != null ) {
      return n.right.equals( right );
    } else {
      return true;
    }
  }

  @Override
  public int hashCode() {
    // the real hash function is more complex, entails
    // calling hashCode on children if they are not null
    System.out.println("calling Bst's hashCode");
    return key;
  }
}

public class Hashing {

  static void p(String s) { System.out.println(s); }

  public static void main( String [] args ) {
    Set<Bst> aSet = new HashSet<Bst>();
    Bst a = new Bst(1, "a", null, null );
    Bst b = new Bst(2, "b", null, null );
    Bst c = new Bst(3, "c", null, null );
    Bst d = new Bst(1, "d", null, null );

    a.left = b;
    a.right = c;
    d.left = b;
    d.right = c;

    aSet.add( a );
    if ( aSet.contains( d ) ) {
      p("d is a member of aSet");
    } else {
      p("d is a not member of aSet");
    }

    if ( a.equals( d ) ) {
      p("a and d are equal");
    } else {
      p("a and d are not equal");
    }

    // now try upcasting to Object to avoid calling Bst's hashCode and equals;
    // this does not help, because the overrides are still dispatched at runtime
    Set<Object> bSet = new HashSet<Object>();
    Object foo = new Bst( a.key, a.name, a.left, a.right );
    Object bar = new Bst( a.key, a.name, a.left, a.right );
    bSet.add( foo );
    p("added foo");
    if ( bSet.contains( bar ) ) {
      p("bar is a member of bSet");
    } else {
      p("bar is a not member of bSet");
    }
  }
}
```

score 2 · Answer 1 · answered Aug 24 '11 at 14:48

Storing the hash in a field in the node feels like exactly the right solution to me. It's also what java.lang.String uses for its own hash code. Aside from anything else, it means that you can't possibly end up with cache entries for objects that would otherwise be eligible for garbage collection, etc.
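A minimal sketch of the field-caching approach, loosely modeled on how `String` has historically cached its hash: compute lazily on the first call and keep the result in a field, using `0` as a "not yet computed" sentinel. The class name `CachedBst` is hypothetical, the fields are made `final` as an assumption of this sketch, and, like the question's `equals`, it ignores `name`.

```java
import java.util.Objects;

// Hypothetical node that computes its structural hash once and caches it.
class CachedBst {
    final int key;
    final String name;      // payload; deliberately not part of equality
    final CachedBst left;
    final CachedBst right;
    private int hash;       // 0 means "not computed yet"

    CachedBst(int key, String name, CachedBst left, CachedBst right) {
        this.key = key;
        this.name = name;
        this.left = left;
        this.right = right;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // Recurses into the children only the first time; each child
            // caches its own value. (A true hash of 0 is simply recomputed.)
            h = Objects.hash(key, left, right);
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedBst)) return false;
        CachedBst n = (CachedBst) o;
        return key == n.key && Objects.equals(left, n.left) && Objects.equals(right, n.right);
    }
}

class CachedBstDemo {
    public static void main(String[] args) {
        CachedBst b = new CachedBst(2, "b", null, null);
        CachedBst c = new CachedBst(3, "c", null, null);
        CachedBst a = new CachedBst(1, "a", b, c);
        CachedBst d = new CachedBst(1, "d", b, c);
        System.out.println(a.equals(d) && a.hashCode() == d.hashCode()); // true
    }
}
```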

If you really want the value of hashCode that would be returned by the implementation in Object, you can use System.identityHashCode. You shouldn't rely on this (or any other hash code) being unique, though.

One other point: your tree is mutable at the moment, because its fields are non-final and have package access. If you cache the hash code the first time it is computed, you won't "notice" if it would have changed due to fields changing. Basically, you shouldn't change a node after you've used its hash code.
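To see that `System.identityHashCode` bypasses an override, here is a small sketch with a hypothetical class `Noisy` whose `hashCode` counts its own calls. The identity hash is the same across calls on the same object, and the override is never invoked by it; identity hashes are still not guaranteed to be unique across objects.

```java
// Hypothetical class whose overridden hashCode() counts its invocations.
class Noisy {
    static int hashCalls = 0;

    @Override
    public int hashCode() {
        hashCalls++;
        return 42;
    }
}

class IdentityHashDemo {
    public static void main(String[] args) {
        Noisy n = new Noisy();
        int id1 = System.identityHashCode(n);
        int id2 = System.identityHashCode(n);
        // Stable for this object, and Noisy.hashCode() was never called.
        System.out.println(id1 == id2);       // true
        System.out.println(Noisy.hashCalls);  // 0
        // A normal call still goes to the override.
        System.out.println(n.hashCode());     // 42
    }
}
```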
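To make the warning concrete, here is a sketch with a hypothetical `MutableNode` whose fields stay non-final (as in the question) and whose structural hash is cached lazily. Once the hash has been read, replacing a child leaves the cached value stale, so two nodes can be `equals` while reporting different hash codes, which breaks the `hashCode` contract.

```java
import java.util.Objects;

// Hypothetical mutable node with a lazily cached hash (single child for brevity).
class MutableNode {
    int key;
    MutableNode left;   // non-final, package access, as in the question
    private int hash;   // 0 means "not computed yet"

    MutableNode(int key, MutableNode left) {
        this.key = key;
        this.left = left;
    }

    @Override
    public int hashCode() {
        if (hash == 0) {
            hash = Objects.hash(key, left);
        }
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MutableNode)) return false;
        MutableNode n = (MutableNode) o;
        return key == n.key && Objects.equals(left, n.left);
    }
}

class StaleHashDemo {
    public static void main(String[] args) {
        MutableNode p = new MutableNode(1, new MutableNode(2, null));
        int cached = p.hashCode();          // computed and cached here
        p.left = new MutableNode(3, null);  // mutate after the hash was used
        MutableNode q = new MutableNode(1, new MutableNode(3, null));
        System.out.println(p.equals(q));                   // true
        System.out.println(p.hashCode() == cached);        // true: the value is stale
        System.out.println(p.hashCode() == q.hashCode());  // false
    }
}
```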

Jon Skeet

Thanks for the clarifications. I am less concerned with entries that may be garbage collected, since I can write my own memory manager and sweep appropriately. The code I wrote is just for illustration, and I appreciate your points about immutability. – adnan Aug 24 '11 at 15:01
Jon presents a valid warning not to change a node after you've used its hash code. This is true even if you are not caching the hash code: in that case you will be changing the hash code of an existing object, making it unreachable to any container that uses the hash code to look up the object. To quote [this C# article](http://blogs.msdn.com/b/ericlippert/archive/2011/02/28/guidelines-and-rules-for-gethashcode.aspx) by Eric Lippert: "The [`hashCode`] must never change while the object is contained in a data structure that depends on the hash code remaining stable." – Brian Aug 24 '11 at 15:57
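A small sketch of Brian's point, with no caching involved: a hypothetical `Key` class derives its hash from a mutable field. After the field changes while the object is in a `HashSet`, `contains` looks in the wrong place and returns `false`, although the element is still stored in the set.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key whose hash depends on a mutable field; nothing is cached.
class Key {
    int value;

    Key(int value) { this.value = value; }

    @Override
    public int hashCode() { return value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Key && ((Key) o).value == value;
    }
}

class MutatedKeyDemo {
    public static void main(String[] args) {
        Set<Key> set = new HashSet<>();
        Key k = new Key(1);
        set.add(k);
        k.value = 2;                          // hash code changes while k is in the set
        System.out.println(set.contains(k));  // false: lookup uses the new hash
        System.out.println(set.size());       // 1: the entry is still there
    }
}
```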
Yes, that article is not about Java. But the rules it lists apply equally well to Java's `hashCode`. – Brian Aug 24 '11 at 15:59

Simone Gianni · Answer 2 · 2011-08-24T15:13:19.627

Storing the hash in a field amounts to "caching" the value so that it does not have to be recomputed too frequently.

It's not necessarily a bad practice, but you have to make sure that you are clearing/recomputing it correctly whenever there is a change, which can be daunting if you have to notify of a change up or down a complex graph or tree.
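One way to handle the "notify up the tree" case is to clear the cached hash on the changed node and every ancestor. The sketch below assumes each node has a hypothetical `parent` pointer and setter methods, neither of which the question's `Bst` has. It only works when each node has a single parent, which conflicts with sharing merged subtrees, and it does not repair any hash-based collection that already contains an affected node.

```java
import java.util.Objects;

// Hypothetical node: cached structural hash with upward invalidation.
class TreeNode {
    final int key;
    private TreeNode left, right, parent;
    private int hash;   // 0 means "not computed yet"

    TreeNode(int key) { this.key = key; }

    void setLeft(TreeNode child) {
        if (left != null) left.parent = null;   // detach the old child
        left = child;
        if (child != null) child.parent = this;
        invalidate();
    }

    void setRight(TreeNode child) {
        if (right != null) right.parent = null;
        right = child;
        if (child != null) child.parent = this;
        invalidate();
    }

    // Each ancestor's hash depends on this subtree, so clear them all.
    private void invalidate() {
        for (TreeNode n = this; n != null; n = n.parent) {
            n.hash = 0;
        }
    }

    @Override
    public int hashCode() {
        if (hash == 0) {
            hash = Objects.hash(key, left, right);
        }
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TreeNode)) return false;
        TreeNode n = (TreeNode) o;
        return key == n.key && Objects.equals(left, n.left) && Objects.equals(right, n.right);
    }
}

class InvalidationDemo {
    public static void main(String[] args) {
        TreeNode root = new TreeNode(1);
        TreeNode child = new TreeNode(2);
        root.setLeft(child);
        int before = root.hashCode();
        child.setLeft(new TreeNode(3));                 // change two levels down
        System.out.println(root.hashCode() != before);  // true: root was invalidated
    }
}
```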

If you want to use a hash code computed by the JVM (roughly based on the "RAM address" of the object, although its value is implementation-specific), you can use System.identityHashCode(x), which returns exactly what Object.hashCode would return for x.

edited Aug 24 '11 at 15:13

answered Aug 24 '11 at 14:48

Thanks. As the previous poster mentioned, there are issues around mutability, but in my application I only use the trees for lookup after they are built, so there are no updates. The next poster pointed out that objects can move around, so wouldn't System.identityHashCode(x) lead to a broken cache? – adnan Aug 24 '11 at 15:03
`System.identityHashCode()` may or may not return the object's actual memory address (its computation is implementation specific). Just think about what happens when the GC internally relocates an object... – Péter Török Aug 24 '11 at 15:07
Hi Péter, you're absolutely right; however, since he used the word "address", it was easier to put it this way than to introduce him to the full design of garbage collection. I'll clarify the answer. – Simone Gianni Aug 24 '11 at 15:12

score 2 · Answer 3 · answered Aug 24 '11 at 14:52

Java's built-in IdentityHashMap does what you're describing.
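A minimal sketch of such a cache: `java.util.IdentityHashMap` compares keys with `==` and hashes them with `System.identityHashCode`, so looking a node up never calls the node's own `hashCode` or `equals`. `SlowNode` and `HashCache` are hypothetical names standing in for the question's `Bst`; as with any external cache, entries keep their nodes reachable until removed.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the question's Bst, with a recursive (expensive) hash.
class SlowNode {
    static int hashCalls = 0;   // counts calls to the expensive hashCode
    final int key;
    final SlowNode left, right;

    SlowNode(int key, SlowNode left, SlowNode right) {
        this.key = key; this.left = left; this.right = right;
    }

    @Override
    public int hashCode() {
        hashCalls++;
        int h = key;
        if (left != null) h = 31 * h + left.hashCode();
        if (right != null) h = 31 * h + right.hashCode();
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SlowNode)) return false;
        SlowNode n = (SlowNode) o;
        return key == n.key && Objects.equals(left, n.left) && Objects.equals(right, n.right);
    }
}

// Caches each node's hash code, keyed by object identity.
class HashCache {
    private final Map<SlowNode, Integer> cache = new IdentityHashMap<>();

    int hashOf(SlowNode node) {
        Integer h = cache.get(node);   // identity lookup: SlowNode.hashCode() is not called
        if (h == null) {
            h = node.hashCode();       // expensive, done once per node
            cache.put(node, h);
        }
        return h;
    }
}

class IdentityCacheDemo {
    public static void main(String[] args) {
        SlowNode n = new SlowNode(1, new SlowNode(2, null, null), null);
        HashCache cache = new HashCache();
        cache.hashOf(n);
        cache.hashOf(n);
        System.out.println(SlowNode.hashCalls); // 2: root and child, computed once
    }
}
```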

That said, Jon Skeet's answer sounds more like the right way to go.

Michael Brewer-Davis

Just tried IdentityHashMap out, and it works perfectly! I agree, though, that keeping the hash code in a node field is a lot simpler and faster to boot, so I will stick with that. – adnan Aug 24 '11 at 15:10

Péter Török · Answer 4 · 2011-08-24T14:51:06.957

> What I really want is to map nodes to their hash codes using a hash which uses the node's address.

What do you mean by the node's address? There is no such concept in Java, and as far as I know there is no unique identifier for objects comparable to the physical address in non-VM-based languages such as C++. References in Java are not memory addresses, and the GC may relocate objects in memory at any time.

> I thought I could do this by making a HashMap, and casting the nodes to Object, which would then invoke the hashCode method on objects, but this didn't work

Indeed: hashCode is a virtual method, and since it is overridden in your node class, the subclass implementation is always called, regardless of the static type of the reference you hold.

I am afraid any attempt to use a map to cache hash values runs into the same chicken-and-egg problem: as you mention, the map needs the hash value itself first.
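A small sketch of that behavior, using a hypothetical class `Counted` that counts calls to its overridden `hashCode`: holding the instance through an `Object` reference, or storing it in a `Set<Object>`, still runs the override, because Java selects the method by the runtime class rather than the declared type.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class whose overridden hashCode() counts its invocations.
class Counted {
    static int hashCalls = 0;

    @Override
    public int hashCode() {
        hashCalls++;
        return 1;
    }
}

class DispatchDemo {
    public static void main(String[] args) {
        Object o = new Counted();   // static type Object, runtime type Counted
        o.hashCode();               // still calls Counted.hashCode()
        Set<Object> set = new HashSet<>();
        set.add(o);                 // HashSet calls Counted.hashCode() as well
        System.out.println(Counted.hashCalls); // 2
    }
}
```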

I don't see any better way than caching the hash values within the nodes as you did. ~~You need to ensure though that the cached values are invalidated whenever the child nodes change.~~ Wrong - as Jon's answer points out, changing the hashcode of an object after it is stored in a map breaks the map's internal integrity, so it must not happen.

As you may have guessed, I come from a C/C++ background, hence the talk of addresses. I was worried about the objects moving around; however, Object's own hashCode (http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Object.html#hashCode()) is based on object addresses (going by this discussion, http://stackoverflow.com/questions/2237720/what-is-an-objects-hashcode), so wouldn't this be a problem for any hash or set of Objects? — adnan, Aug 24 '11 at 14:55
@user453026, see [How does the JVM ensure that System.identityHashCode() will never change?](http://stackoverflow.com/questions/1063068/how-does-the-jvm-ensure-that-system-identityhashcode-will-never-change) (this was new to me too, I am learning it right now :-) — Péter Török, Aug 24 '11 at 15:10
