16

With a Regular Type, I mean the definition of Stepanov in Elements of Programming, basically, that there's the concept of equality and that objects which are copies of each other compare equal.

So when you have a Regular Type T, and the equality relation is transitive (a == b && b == c => a == c), you can define a (non-trivial) hash function which is consistent with the definition of equality (a == b => h(a) == h(b)). Always.

But the standard doesn't include many std::hash specialisations. E.g. std::complex doesn't have one, and neither have the containers, with the notable exceptions of vector<bool> and bitset.

So I'm wondering what the design principle is here.

Or, asked differently: Are there reasons not to provide std::hash specialisations for your own types, provided they are regular and equality is transitive?

mike
  • 4,929
  • 4
  • 40
  • 80
Marc Mutz - mmutz
  • 24,485
  • 12
  • 80
  • 90
  • 1
    Sure, you can always define a hash function which is consistent: `size_t operator()(T& const) const { return 0; }` The question is, can you always define one that is good for an arbitrary type? – Barry Apr 30 '15 at 13:45
  • `vector` is not implemented as `vector` of `bool`s. It's a template specialization that relies on `int` to save several `bool`s (I assume 32). It's invariant to any template parameters (and the underlying types' `std::hash`), I think that's why a specialization is provided. – mike Apr 30 '15 at 13:48
  • There are a number of types in the standard library that have [standing requests open](https://cplusplus.github.io/LWG/lwg-active.html#1025) to specialize `std::hash` – Cory Kramer Apr 30 '15 at 14:03
  • *"So when you have a Regular Type T, and the equality relation is transitive [...]"* This sounds like a lemma. Care to elaborate? I've read EoP, but I can't remember anything in it that proves it. – dyp Apr 30 '15 at 14:54
  • @dyp: No hashing in EoP, but if equality is an equivalence relation (which includes transitivity), then a hash function can be constructed that maps each equivalence class into a natural number. QED. But you don't actually need that equality is an equivalence relation. E.g. equality for floating-point types is not reflexive (*nan != nan*), yet a consistent hash function exists (`auto h(auto x) { return x == 0 ? 0 : hash_bits(&x, sizeof x); }`). But when you drop transitivity, then you cannot define a consistent hash function (at least in the cases I saw, e.g. `a == b <=> abs(a-b) < 1e-6`). – Marc Mutz - mmutz Apr 30 '15 at 19:07
  • I don't quite see why the set of equivalence classes should be countable. For a finite set of values, this is obvious, but I don't think that's required for regular types. Maybe the countability of the set of values is a property of the representation? Additionally, even if it is possible to enumerate the equivalence classes, it might be impossible to find them. For example, if you write a wrapper type that wraps some opaque API which only exposes certain functions (the regular computational basis) for opaque handle types. – dyp Apr 30 '15 at 20:15
  • The set of equivalence classes is finite, thus countable, because your RAM is finite. Eo*P*, not Eo*infiniteVectorSpaces*. As for proxies: sure, if you want to hash proxies, you need to put a hash function into its computational basis. But the same is true for equality, and I assume equality exists (*regular* type). As a constructive approach: if you hash the subobjects which you compare for equality in the equality operation, possibly recursively, and in the same order (to handle `unordered_set`), then you get a consistent hash function. – Marc Mutz - mmutz Apr 30 '15 at 22:36

2 Answers2

2

Yes.

When a type has the following two properties I do not think you should define std::hash:

  • There is no efficient way to consistently create a quality hash that covers all the data used to describe equality.

  • There is no efficient and/or intuitive way to select a consistent subset of data to hash.

Community
  • 1
  • 1
orlp
  • 112,504
  • 36
  • 218
  • 315
  • Actually, `std::hash` can be implemented in *linear time*, whereas `operator==` has worst-cast *quadratic time*. Just hash individual items and use a commutative hash combiner (unsigned addition, *not* xor). – Marc Mutz - mmutz Apr 30 '15 at 14:18
  • @MarcMutz-mmutz Never mind my first comment - if the hash function creates uniform values addition is fine. I'll see if I can find a better example type - I think my properties are fine though. – orlp Apr 30 '15 at 14:22
0

Providing specialization for own types makes no sense, if they're template classes, since the hash function (with high a very high probability) also depends on the template parameters' types, which are unknown.

It there are no template parameters, or the template parameters are restricted to certain types

// provide no implementation to restrict all types
template<typename T> container;

// int is allowed
template<> container<int> {
    ...
}

// double is allowed
template<> container<double> {
    ...
}

providing a specialization of std::hash is possible, since the implementations of the classes (or the instanced template classes) are known, as it is for vector<bool> contrary to complex<T>.

mike
  • 4,929
  • 4
  • 40
  • 80
  • What's wrong with `namespace std { template struct hash> { using result_type = size_t; using argument_type = complex; result_type operator()(const argument_type &a) const { std::hash hash; size_t result = hash(a.real()) + 0x9e3779b9; result ^= hash(a.imag()) + 0x9e3779b9 + (result << 6) + (result >> 2); return result; } }; }`? – Marc Mutz - mmutz Apr 30 '15 at 14:14
  • the line `std::hash hash;` requires that there is a `std::hash` specialization for `T`, it uses this specialization to create a specialization for `complex`, but how can one know that this would yield an effective implementation? – mike Apr 30 '15 at 14:24
  • `complex` is only allowed to be instantiated with `float`, `double` and `long double`. Even if it wasn't, failure to provide a `hash` when requesting instantiation of `hash>` would just yield the expected error. No problem there. And the combiner (with the magic number) is a pretty good one, from Boost, regardless of the types involved. – Marc Mutz - mmutz Apr 30 '15 at 14:27
  • Sry, I rather meant efficient. Of course you're proposal is effective, at least as effective as `hash`. – mike Apr 30 '15 at 14:33
  • 1
    Why would a `hash>` be _at least as efficient_ as `hash`? That makes no sense. It should be roughly 2x slower, a bit more to allow for the combining operation, a bit less to allow for instruction-level parallelism. Likewise, whether I define my `hash>` generically or whether I define three `hash>`, `FP` in {`float`, `double`, `long double`}, there should be no difference in efficiency, should there? – Marc Mutz - mmutz Apr 30 '15 at 19:17