3

So I was curious when I got to know that dictionaries or associative arrays are usually implemented by hash tables. Upon reading about hash tables, I stumbled upon hash functions, I learned there are various hash functions such as md5, md6, sha-1 etc. What I was unable to find was which hash function is used by programming languages such as python, C++, java?

Harris
  • 98
  • 2
  • 11
  • Those are.. not the same kind of 'hash function' D: – user2864740 Aug 26 '18 at 22:46
  • 1
    On the the other hand, a [*cryptographic hash*](https://en.wikipedia.org/wiki/Cryptographic_hash_function) is designed to provide much stronger cryptographic requirements and output space. – user2864740 Aug 26 '18 at 22:49
  • They are cryptographic hash functions I know, but I want to know which ones programming languages use? Do they have any name or any identity like cryptographic HF? – Harris Aug 26 '18 at 22:49
  • 1
    A programming language doesn't "use a cryptographic function", a program does. Languages *support* different functions (often implemented via libraries). – user2864740 Aug 26 '18 at 22:50
  • The programming language (e.g. [Go](http://golang.org/)...) specification won't mandate a *particular* hash function. Some particular *implementation* might have its own one. Many programming languages implementations are free software, so you could dive into their source code and study it – Basile Starynkevitch Aug 26 '18 at 22:51
  • @user2864740 I was referring to the associative arrays that are an important aspect of most programming languages. – Harris Aug 26 '18 at 23:10

1 Answers1

2

Those are.. not the same kind of 'hash function' D:

For hashtable hash functions, code must compute an appropriate hash based on object-data such that it conforms to equality requirements. It should also be "well distributed" and "fast". Most hashtable hashes are thus often 32-bit values using some form of a rolling/shifting computation. At the end of the day this hash is used to select from a much smaller pool of buckets.

Hashtable hashes are usually computed directly by (or with knowledge of) the objects be added to the hashtable - that is, generally, cryptographic hash functions are not involved in hashtables. A typical Java hashCode() function, defined on the object being added to the hashtable, for example might look like:

int hash = 7;
hash = 31 * hash + (int) int_field;
hash = 31 * hash + (str_field == null ? 0 : str_field.hashCode());
// etc.
return hash;

There are discussions on the choice of seed and multiplication values elsewhere.. but the take-way should be that most hashtable hash functions 1) directly derive from object state, applying 'tweaks' as prudent, and 2) are not designed to be "secure".

(Modern hashtable implementations often apply a "mixing function" to the generated hash value to mitigate degenerate hash function results and/or data poisoning attacks.)

On the the other hand, a cryptographic hash is designed to provide much stronger cryptographic requirements and have a much larger output space. While such a strong hash can be used for hashtables (after being derived from an object and then distilled down to a hash bucket), they are also slower to generate and usually unnecessary in context of a hash/dictionary.

Cryptographic hashes generally work on an arbitrary chunk of data or byte stream.

Hashtable hash desirable characteristics:

  • Deterministic
  • Uniform distribution / avoidance of clustering
  • Speed, speed, speed

Cryptographic hashes have additional characteristics, beyond that of a hashtable hash:

  • Infeasible to generate a message from its hash value
  • Infeasible to find two different messages with the same hash value
  • (While cryptographic hashes should also be fast, speed is largely secondary to the additional requirements.)

Programming languages support a wide range of different cryptographic hash functions through their standard libraries and/or 3rd party libraries. A more well-known hash (eg. MD5/SHA-x) will generally have universal support while something more specialized (eg. MD6) may require additional effort to locate an implementation for.

On the other hand, as shown above, many hash table 'functions' are implemented directly on the object(s) involved in a hashtable, following a standard pattern, with some languages (and IDEs) providing assistance to reduce manual coding. As an example, C# provides a default reflection-based GetHashCode implementation for struct types.

user2864740
  • 60,010
  • 15
  • 145
  • 220
  • 1
    Thanks, that clears a lot. I also wanted to figure out that there core functionality is essentially the same right? only the complexity that makes the difference. Am I correct? – Harris Aug 26 '18 at 23:14
  • Both kinds of hashes share similar base properties - ie. they must be deterministic, ought to be uniformly distributed, and should be fast (fsvo); however, cryptographic hashes have *additional* requirements that can be skipped/ignored in most basic hashtable hash usages. ("Security vulnerabilities" arise when a cryptographic hash is no longer able to meet those requirements - ie [MD5 is "cryptographically broken and unsuitable for further use"](https://en.wikipedia.org/wiki/MD5); much larger impact..) – user2864740 Aug 26 '18 at 23:16
  • yes, and additional complexity (or larger pool) for cryptographic hash also insures that the probability of collisions become negligible which isn't that much of an issue with dictionary hash. – Harris Aug 26 '18 at 23:23
  • For a hashtable, there are a "relatively small set of buckets", so it's the effective *distribution* that's more relevant than 'less collisions in the originally huge but folded/truncated output space'. Cryptographic hash functions are expected to be well-distributed per other goals, but such does not imply that an appropriate non-cryptographic hash is not *sufficiently* well-distributed.. some hashtable implementations will also perform *hash mixing* to improve distribution. – user2864740 Aug 26 '18 at 23:26