A hash function is deterministic, by definition, cf. https://en.wikipedia.org/wiki/Hash_function#Determinism
So if the implementation of `hash()` were not deterministic, it would be a bug, and someone would have noticed!
Caveat: the implementation is subject to change (and to bug fixes), so determinism holds only within a given version of Hive.
Hive is open source. The documentation is not bad by Apache standards, but it is still incomplete. Just inspect the source code => https://github.com/apache/hive
For Hive 2.1, for example:
- the `hash()` function (a UDF in Hive jargon) is defined here
- it just calls `ObjectInspectorUtils.getBucketHashCode()`, which calls `ObjectInspectorUtils.hashCode()` on each argument, then merges its hash into a global "bucket" hash, as defined here
- a comment shows that the (crude) hashing method implemented by Hive is derived from `String.hashCode()`
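To make the steps above concrete, here is a minimal Java sketch of that kind of scheme. It is not Hive's actual code: the class and method names (`HashSketch`, `hashString`, `bucketHash`) are hypothetical, and both the per-field hash over UTF-8 bytes and the merge step `31 * h + fieldHash` are assumptions modeled on `String.hashCode()`. Check the Hive source for the version you run before relying on any specific value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not copied from the Hive code base.
final class HashSketch {

    // String.hashCode()-style hash computed over the UTF-8 bytes of the input.
    // For pure-ASCII input this matches String.hashCode(); for other input it
    // may differ, because bytes (signed in Java) are hashed instead of chars.
    static int hashString(String s) {
        int h = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + b;
        }
        return h;
    }

    // Merge the per-field hashes into one "bucket" hash.
    // The multiplier 31 is an assumption borrowed from String.hashCode().
    static int bucketHash(String... fields) {
        int h = 0;
        for (String field : fields) {
            h = 31 * h + hashString(field);
        }
        return h;
    }

    public static void main(String[] args) {
        // ASCII input: same value as String.hashCode()
        System.out.println(hashString("abc") == "abc".hashCode()); // prints true
        // Two fields: 31 * 97 + 98
        System.out.println(bucketHash("a", "b")); // prints 3105
        // No randomness or hidden state: same input, same output
        System.out.println(bucketHash("a", "b") == bucketHash("a", "b")); // prints true
    }
}
```

Nothing in such a scheme depends on time, random seeds, or memory addresses, which is why the result can only change if the code itself changes between versions.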
For alternative hashing functions in Hive, see "Calculate hash without using exisiting hash fuction in Hive", but the answer basically points to the same documentation page that you already found.