I have asked a similar question about the string.GetHashCode() method in .NET. From that experience, I learned that we cannot rely on the built-in types' default hash code implementation if we are to use it across different machines. Therefore, I am assuming that the Java implementation of String.hashCode() is likewise unstable across different hardware configurations and may behave differently across VMs (don't forget that there are different VM implementations).
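
For reference, my understanding is that the Javadoc documents String.hashCode() as the polynomial s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] over the string's UTF-16 code units; here is a minimal sketch of that computation (the method name is mine):

    // Sketch of the scheme the String.hashCode() Javadoc describes,
    // using Java's wrapping int arithmetic.
    static int javadocStyleHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // charAt yields UTF-16 code units, not bytes
        }
        return h;
    }

Note that this operates on char values rather than bytes, so my worry is less about the arithmetic itself and more about whether every VM we run actually honors this contract.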
Currently we are discussing a way to safely transform a string into a number in Java by hashing. The hash algorithm must be stable across the different nodes of a cluster and fast to evaluate, since it will be used at high frequency. My teammates are insisting on the native hashCode method, and I'll need some reasonable arguments to make them consider another approach. Currently, I can think only of the differences between machine configurations (x86 vs. x64), possibly different JVM vendors on some of the machines (hardly applicable in our case), and byte-order differences depending on the machine the algorithm runs on. Character encoding probably also needs to be considered.
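
To make the alternative concrete, here is a minimal sketch of the kind of custom algorithm I have in mind: 32-bit FNV-1a over the string's bytes in an explicitly fixed encoding (UTF-8 here, as an assumption), so the result depends neither on the platform's default charset nor on its byte order:

    import java.nio.charset.StandardCharsets;

    public final class StableHash {
        // Sketch: 32-bit FNV-1a over the string's UTF-8 bytes.
        // The charset is fixed explicitly so every node hashes the same byte sequence.
        public static int fnv1a32(String s) {
            final int FNV_OFFSET_BASIS = 0x811C9DC5; // standard FNV-1a constants
            final int FNV_PRIME = 0x01000193;
            int hash = FNV_OFFSET_BASIS;
            for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
                hash ^= (b & 0xFF); // treat the byte as unsigned
                hash *= FNV_PRIME;
            }
            return hash;
        }
    }

FNV-1a is only an illustration; a cryptographic digest via java.security.MessageDigest would also be stable across nodes, at the cost of speed.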
While all these things come to mind, I am not 100% sure that any of them is a strong enough reason, and I'd appreciate your expertise and experience in this area. This will help me build stronger arguments in favor of writing a custom hashing algorithm. I'd also appreciate advice on what not to do when implementing it.