This is a slight variation on how to combine two hashes: I would like the resulting hash to be influenced more by one of the inputs than by the other.
For the roughly symmetric case, we have algorithms such as boost::hash_combine:
template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
I am looking for a weighted version, perhaps the interface would resemble:
uint64_t weighted_hash_combine(uint64_t hashA, uint16_t weightA, uint64_t hashB, uint16_t weightB);
The premise is that the probability of a bit in the output hash being affected by a change in one of the input hashes is a function of the ratio of weightA to weightB.
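One way to check this premise for any candidate combiner is an avalanche-style measurement: flip each bit of one input in turn and count how many output bits change. The sketch below is only an illustration; `mean_flipped_bits` and its parameters are hypothetical names, and the combiner is passed in as a two-argument callable with any weights already bound.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Average number of output bits that change when a single bit of one input
// is flipped, taken over all 64 bit positions and all sample input pairs.
// flip_in_a selects whether the first or the second input is perturbed.
template <class Combine>
double mean_flipped_bits(Combine combine,
                         const std::vector<std::pair<std::uint64_t, std::uint64_t>>& samples,
                         bool flip_in_a)
{
    assert(!samples.empty());
    std::size_t total = 0;
    for (const auto& s : samples) {
        const std::uint64_t base = combine(s.first, s.second);
        for (int bit = 0; bit < 64; ++bit) {
            const std::uint64_t mask = std::uint64_t{1} << bit;
            const std::uint64_t out = flip_in_a ? combine(s.first ^ mask, s.second)
                                                : combine(s.first, s.second ^ mask);
            // Count the output bits that differ from the unperturbed result.
            total += std::bitset<64>(base ^ out).count();
        }
    }
    return static_cast<double>(total) / (64.0 * static_cast<double>(samples.size()));
}
```

For a well-mixed symmetric combiner, both means should sit near 32. Under the premise above, a weighted combiner would be expected to show a higher mean for the more heavily weighted input.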
This would allow me to improve on a tree hashing algorithm for unbalanced trees. A simpler way to hash a tree is covered here: essentially, a breadth-first traversal pushes each hash(node) into an accumulated value. The problem with this is that the last node to be mixed into the combined hash will have a greater influence over the result than the first.
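As a rough sketch of that simpler accumulation, assuming a hypothetical `Node` type holding an `int` payload (the real node type is not specified here), the breadth-first version might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <queue>
#include <vector>

// Hypothetical n-ary tree node, used for illustration only.
struct Node {
    int value = 0;
    std::vector<std::unique_ptr<Node>> children;
};

// Same mixing step as the boost-style hash_combine quoted above.
template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Breadth-first traversal that folds every node's value into one running seed.
// Nodes visited later are mixed in after all earlier ones.
std::size_t hash_tree_bfs(const Node* root)
{
    std::size_t seed = 0;
    std::queue<const Node*> pending;
    if (root) pending.push(root);
    while (!pending.empty()) {
        const Node* n = pending.front();
        pending.pop();
        hash_combine(seed, n->value);
        for (const auto& child : n->children) {
            assert(child != nullptr);  // assumption: no null children
            pending.push(child.get());
        }
    }
    return seed;
}
```

Note that this sketch hashes only the node values in visit order, so two trees with different shapes but the same breadth-first value sequence would collide.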
If a reasonable weighted hash combination is available, then I can bias the combination based on the number of nodes that contributed to each hash and, hopefully, improve the fairness of the hash function.
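To make the intended use concrete, here is a minimal sketch of a bottom-up tree hash that carries a node count alongside each subtree hash and passes both to the combiner. The `Node`, `SubtreeHash`, `Combiner` and `hash_subtree` names are hypothetical, and the combiner is a parameter so that any function with the `weighted_hash_combine` signature can be plugged in.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical n-ary tree node, used for illustration only.
struct Node {
    int value = 0;
    std::vector<std::unique_ptr<Node>> children;
};

// A subtree hash together with the number of nodes that contributed to it.
struct SubtreeHash {
    std::uint64_t hash;
    std::uint16_t weight;
};

// Any callable with the weighted_hash_combine signature proposed above.
using Combiner = std::function<std::uint64_t(std::uint64_t, std::uint16_t,
                                             std::uint64_t, std::uint16_t)>;

SubtreeHash hash_subtree(const Node& n, const Combiner& combine)
{
    // A single node starts with its own hash and a weight of 1.
    SubtreeHash acc{static_cast<std::uint64_t>(std::hash<int>{}(n.value)), 1};
    for (const auto& child : n.children) {
        assert(child != nullptr);  // assumption: no null children
        const SubtreeHash sub = hash_subtree(*child, combine);
        acc.hash = combine(acc.hash, acc.weight, sub.hash, sub.weight);
        // Saturate instead of wrapping, since the weights are only 16 bits wide.
        const std::uint32_t total = std::uint32_t{acc.weight} + sub.weight;
        acc.weight = total > UINT16_MAX ? UINT16_MAX : static_cast<std::uint16_t>(total);
    }
    return acc;
}
```

With this shape, a large subtree enters each combination with a weight proportional to its node count, which is the bias described above.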
So far I've come up with:
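uint64_t weighted_hash_combine(uint64_t hashA, uint16_t weightA, uint64_t hashB, uint16_t weightB)
{
    // Normalize so that B is always the more heavily weighted input.
    if (weightA > weightB)
    {
        return weighted_hash_combine(hashB, weightB, hashA, weightA);
    }
    // Integer ratio of the heavier weight to the lighter one.
    // A zero weight is treated as 1 to avoid dividing by zero.
    uint64_t ratio = weightB / (weightA ? weightA : 1);
    uint64_t combined = hashA;
    // Mix the heavier hash in once per unit of the ratio.
    for (uint64_t i = 0; i < ratio; i++)
    {
        hash_combine(combined, hashB);
    }
    return combined;
}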
This is rather lacking in numerical sophistication though, so I'm hoping the community can recall or invent a better solution.
The high-level goal is to short-circuit an equality test between trees when their sizes or hash values differ, given that the trees will often differ in only one or two leaves and there's no good way to estimate which.
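For illustration, the short-circuit could be sketched as follows, assuming a hypothetical `Node` that caches its subtree size and subtree hash (keeping those caches up to date is left out):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical node that caches the size and hash of its own subtree.
struct Node {
    int value = 0;
    std::size_t size = 1;    // number of nodes in this subtree
    std::uint64_t hash = 0;  // cached subtree hash, maintained elsewhere
    std::vector<std::unique_ptr<Node>> children;
};

bool trees_equal(const Node& a, const Node& b)
{
    // Cheap rejection: a different size or hash proves the trees differ.
    if (a.size != b.size || a.hash != b.hash) return false;
    // Equal hashes do not prove equality, so fall back to a full comparison.
    if (a.value != b.value || a.children.size() != b.children.size()) return false;
    for (std::size_t i = 0; i < a.children.size(); ++i) {
        assert(a.children[i] && b.children[i]);  // assumption: no null children
        if (!trees_equal(*a.children[i], *b.children[i])) return false;
    }
    return true;
}
```

Because the same check runs at every level of the recursion, a mismatch confined to one or two leaves is rejected at the root as long as it changes the root hash, which is why the fairness of the combination matters.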