Efficiently hashing a tree

Question

I am frequently searching a large forest for many different tree fragments , and would like to have the "equality check" part of the search occur faster than O(N) time. The idea is to "hash" tree fragments and treat collisions as equality.

My tree structure is a standard Node object with an int value field and a children field. I would like the hash value of a tree rooted at a certain node to depend only on it's children, and that given a child's hashcode has changed, a relatively cheap update can be applied to the hashcode of the child's parent. If we can achieve an O(1) update operation, then a value change at a leaf node will result in a O(log n) chain reaction up the tree.

I have read the following question: Hashing a Tree Structure

which gives a great answer. However, I am having implementation troubles with the algorithm. As strings are immutable in Java, an update in the child-node would translate to an O(N) hashcode update operation in the parent. Please correct me here if it seems I've misunderstood the answer above.

What is a possible hash function for what I'm looking for?

`given a child's hashcode has changed, a relatively cheap update can be applied to the hashcode of the child's parent` Would recommend not having objects ever change their hash codes. — Kon, Jan 30 '17 at 20:55

score 0 · Answer 1 · answered Jan 30 '17 at 23:18

If I understood you correctly, you do not need hash if you just want to check equality of two trees. Yes! you would need a good hashing function if you are putting your tree in some collection data structure and trying to access them with some keys. "The idea is to "hash" tree fragments and treat collisions as equality" This is not correct statement and you should not do that. Same hash code does not imply equality in general. The concept of equality has nothing to do with hashing of function. Hashing is done for faster retrieval/access of elements in some collection like hashmap or hashtable.

score 0 · Answer 2 · answered Jan 31 '17 at 20:35

I find question very interesting. We can try to create an unique integer for each node which is calculated by the sub-tree below that node. It won't be helpful to you as you have forest not binary tree. But in case of binary trees, I think weighing left and right child differently would be helpful avoiding same hash for mirror trees.

class Node{ 
    int value;
    List<Node> children;

    @Override
    public int hashCode(){
         int childCount = children.size();
         int sum = 0;
         for Node child : children:
             sum += child.hashCode();
         int hashCode = childCount * 2^31 + value * 2^23 + sum * 2^13;

         return hashCode;
    }
}

I feel this will give you the required hash function. This might not be collision free hash. It will rule out most of the non-equal sub-trees. Once you get same hash for sub-trees you can perform equality check to confirm. This will definitely reduce most of the computations.

is there a way to take into account the order of the children in the hash? such that if children appear in ABC, the node hashes differently than if the children appeared in BCA — user3605508, Jan 31 '17 at 21:52
Yes absolutely. Adding an additional weight for each children while calculating sum will produce different hash for children 'ABC' and 'BCA'. — Anwar Shaikh, Jan 31 '17 at 21:56

Efficiently hashing a tree

2 Answers2