2

Coming from Python here.

I'm wondering why a BTreeMap is hashable. I'm not surprised a Hashmap isn't, but I don't understand why the BTreeMap is.

For example, I can do that:

let mut seen_comb: HashSet<BTreeMap<u8, u8>> = HashSet::new();
seen_comb.insert(BTreeMap::new());

But I can't do that:

let mut seen: HashSet<HashMap<u8, u8>> = HashSet::new();
seen.insert(HashMap::new());

Because I'm getting:

error[E0599]: the method `insert` exists for struct `HashSet<HashMap<u8, u8>>`, but its trait bounds were not satisfied
   --> src/main.rs:14:10
    |
14  |     seen.insert(HashMap::new());
    |          ^^^^^^ method cannot be called on `HashSet<HashMap<u8, u8>>` due to unsatisfied trait bounds
    |
   ::: /home/djipey/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/collections/hash/map.rs:209:1
    |
209 | pub struct HashMap<K, V, S = RandomState> {
    | ----------------------------------------- doesn't satisfy `HashMap<u8, u8>: Hash`
    |
    = note: the following trait bounds were not satisfied:
            `HashMap<u8, u8>: Hash`

In Python, I can't put a dict inside a set, so the BTreeMap behaviour is surprising to me.

Could someone provide an explanation here?

Herohtar
  • 5,347
  • 4
  • 31
  • 41
JPFrancoia
  • 4,866
  • 10
  • 43
  • 73
  • This is just a guess, but the order in which elements are hashed affects the results and `HashMap` doesn't have a deterministic order. Even if two `HashMap`s have the same elements, the order can be different. – kmdreko Mar 18 '22 at 20:03

2 Answers2

6

The reason is that BTreeMap has a deterministic iteration order and HashMap does not. To quote the docs from the Hash trait,

When implementing both Hash and Eq, it is important that the following property holds:

k1 == k2 -> hash(k1) == hash(k2)

In other words, if two keys are equal, their hashes must also be equal. HashMap and HashSet both rely on this behavior.

There is no way to guarantee this behavior since the iteration order of HashMap is non-deterministic, so data would be fed to the Hasher in a different order whenever a different HashMap is inputted, even if they contain the same elements, breaking the contract of Hash and causing bad things to happen when used in a HashSet or HashMap.

However, the entire point of the BTreeMap is that it is an ordered map, meaning iteration over it occurs in sorted order, which is fully deterministic, meaning the contract of Hash can be satisfied, so an implementation is provided.

Note that both of these behaviors are different than Python's Dict, which iterates over things in order of insertion.

Aiden4
  • 2,504
  • 1
  • 7
  • 24
  • Ok, makes sense, thanks. I almost asked why Python dicts aren't hashable then (a hashable object in python must meet the same criteria), but your comment on the insertion order cleared that up. In Python since the dicts are ordered by insertion order, I could have 2 dicts with the same content but with a different iteration order – JPFrancoia Mar 19 '22 at 11:57
0

Actually, the answer for implementing Hash for HashMap is there: how-to-implement-a-hash-function-for-a-hashset-hashmap

Here is an example of code (playground):

use std::collections::{ HashSet, HashMap, hash_map::DefaultHasher, };
use core::hash::{ Hasher, Hash, };

#[derive(Debug, PartialEq,Eq)]
struct MyMap<K,V>(HashMap<K,V>) where K: Eq + Hash;

impl<K,V> Hash for MyMap<K,V> where K: Eq + Hash, V: Hash, {
    fn hash<H>(&self, h: &mut H) where H: Hasher { 
        let hasher = DefaultHasher::new();
        let commut_mix = self.0.iter().map(|(k,v)| {
            let mut in_h = hasher.clone();
            k.hash(&mut in_h);
            v.hash(&mut in_h);
            in_h.finish() as u128
        }).sum();
        h.write_u128(commut_mix);
    }
}

fn main() {
    let map1 = MyMap([(0,"zero"), (2,"deux"),].into_iter().collect());
    let map2 = MyMap([(1,"un"), (3,"trois"), ].into_iter().collect());
    let map2bis = MyMap([(1,"un"), (3,"trois"), ].into_iter().collect());
    let map3 = MyMap([(0,"zero"), (1,"un"), (2,"deux"), ].into_iter().collect());
    let map4 = MyMap([(0,"zero"), (2,"deux"), (3,"trois"), ].into_iter().collect());
    let set: HashSet<_> = [map1, map2, map3, map4, map2bis].into_iter().collect();
    println!("set -> {:?}", set);
}

with result:

Standard Error

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 5.32s
     Running `target/debug/playground`

Standard Output

set -> {MyMap({1: "un", 0: "zero", 2: "deux"}), MyMap({0: "zero", 2: "deux"}), MyMap({3: "trois", 0: "zero", 2: "deux"}), MyMap({1: "un", 3: "trois"})}
FreD
  • 393
  • 2
  • 12
  • Note that with enough elements (or specific big hashes), `.sum()` will panic in debug mode (or release with checks enabled), due to overflow. I don't think there's a `wrapping_sum`, so your best bet would be to sum manually, using `.wrapping_add`. – Filipe Rodrigues Jan 14 '23 at 11:47
  • Also I see no reason to clone `hasher` to create `in_h`, as opposed to just initializing `in_h` with a `DefaultHasher::new` – Filipe Rodrigues Jan 14 '23 at 11:47
  • finish() produces u64, but sum is applied to u128. So it should not overflow. – FreD Jan 15 '23 at 07:53
  • I agree with you on clone(). I hesitated between the two implementations. But I wanted to make it clear that the hasher should be in the same state. – FreD Jan 15 '23 at 08:02