3

I would like to hash a large number of int[] in Java. My arrays contain paths from the root to nodes in a tree (nodes are numbered), thus they are very similar on the first values (as all paths come from the root).

I am using Google Guava Hasher, I add each value of my array to the hasher to get the hashCode :

HashFunction hashFunction = Hashing.murmur3_128();
Hasher hasher = hashFunction.newHasher();
for (int i: myArray) {
    hasher.putInt(i);
}
int hashCode = inthasher.hash().asInt();

I would like to avoid hashing the whole array again for every path and only hash the last values by adding the last values to a copy of my hasher. Like this :

anotherHasher = hasher.clone();
anotherHasher.putInt(someInt);
int hashCode = hasher.hash().asInt();
int anotherHashCode = anotherHasher.hash().asInt();

But the clone method doesn't exist for Hasher.

Does this kind of copy would save computation time? Is it possible to implement Cloneable with a wrapper containing a Hasher even if the later is not cloneable? If yes how? Is there another way?

EDIT: For the records, what takes times in hashing with Guava is not adding elements with hasher.putInt(someInt) but the hashing itself called at the end with hasher.hash(). So even deep copying was fast (which is not, see the answer by Tomasz Linkowski), my approach is not relevant and does not deserve to be studied further.

Baptiste Merliot
  • 841
  • 11
  • 24
  • I think clone method will not be supported, because it is very difficult to decide whether an Array was modified or not.. or in fact, its state is not constant. – Anand Vaidya Oct 08 '18 at 13:39
  • 2
    Why is it important to know if the Array was modified? If I cloned the Hasher I want it to be independant from the previous one, so why should I care if it is modified (or the Array it stems from)? – Baptiste Merliot Oct 08 '18 at 13:43
  • Or do you mean some kind of lock over the Hasher during copy is hard to implement? – Baptiste Merliot Oct 08 '18 at 13:46
  • best way to do it is override the .deepClone() and set your own parameteres on the cloning, as the classes go deeper. – Sir. Hedgehog Oct 08 '18 at 13:53

1 Answers1

2

This can be done with help of a library that performs a deep copy of an object (here are some deep copy library recommendations).

One such library is Kryo, with which this can be achieved by:

  1. Instantiating Kryo (note that creation of this object is quite expensive + it's not thread-safe):

    Kryo kryo = new Kryo();
    kryo.setInstantiatorStrategy(new StdInstantiatorStrategy());
    
  2. Calling Kryo.copy(T):

    Hasher anotherHasher = kryo.copy(hasher)
    

However, note that this might not be faster than hashing twice because:

  • Kryo's StdInstantiatorStrategy uses Objenesis to instantiate the Hasher implementations (and I don't know how fast it is);
  • and the recommended DefaultInstantiatorStrategy cannot be used because Hasher implementations do not have a no-arg constructor (see Kryo's readme about InstantiatorStrategy for more details).

I'd recommend benchmarking this using Java Microbenchmark Harness (note various JMH plugins at the end of the linked page).

Tomasz Linkowski
  • 4,386
  • 23
  • 38
  • 1
    Thanks for the answer, I found a way around without deep copying, as you said this would not be the fastest solution. – Baptiste Merliot Oct 09 '18 at 15:18
  • 1
    @BaptisteMerliot Thanks for accepting but if you found a better solution, you might want to post it as an answer and accept it instead so that others can find the best solution to this question. – Tomasz Linkowski Oct 09 '18 at 19:30
  • 1
    The solution was to change my global algorithm and not just this part, so it becomes off-topic for this question. – Baptiste Merliot Oct 10 '18 at 07:07