4

How does one create a set of sets in Rust? Is it necessary to write an impl block for every concrete type satisfying HashSet<HashSet<_>>?

Minimal failing example:

fn main () {
    let a: HashSet<u32> = HashSet::new();
    let c: HashSet<HashSet<u32>> = HashSet::new();
    c.insert(a);
}

Error:

"insert" method cannot be called on `std::collections::HashSet<std::collections::HashSet<u32>>` due to unsatisfied trait bounds
HashSet doesn't satisfy `std::collections::HashSet<u32>: Hash

Is it possible to override the fact that HashSet is unhashable? I'd like to use a HashSet and need my contents to be unique by actual (memory) equality; I don't need to unique by contents.

Herohtar
  • 5,347
  • 4
  • 31
  • 41
Test
  • 962
  • 9
  • 26
  • 4
    `HashSet` is not hashable since its iteration order is unpredictable. You can use [`BTreeSet`](https://doc.rust-lang.org/std/collections/struct.BTreeSet.html) instead. – apilat Mar 21 '22 at 18:14
  • Does this answer your question? [Why is BTreeMap hashable, and not HashMap?](https://stackoverflow.com/questions/71532357/why-is-btreemap-hashable-and-not-hashmap) – Aiden4 Mar 21 '22 at 18:26
  • Yes. Makes perfect sense. It is possible to "override" the behavior: make two different sets unequal even when they have the same contents? – Test Mar 21 '22 at 18:35
  • I'd like to have a set of sets and want them to be unique by "actual" (memory) equality, not by contents. – Test Mar 21 '22 at 18:41
  • 2
    @Test Rust values are not automatically heap-allocated, so they don't have a stable address you could use as "identity", as is the case in, say, Python. – user4815162342 Mar 21 '22 at 18:42
  • Interesting, where can I learn more? Where do hashsets live if not on the heap—the stack? – Test Mar 21 '22 at 18:46
  • @Test Hash sets allocate their _data_ on the heap, but the object that holds the pointer to the data (and other info, such as capacity etc) is not on the heap, it's allocated inline wherever the HashSet is. Hashing by internal pointer to the data wouldn't work because that pointer will change as the hashset grows. – user4815162342 Mar 21 '22 at 18:53
  • @Test You can create a Python list using `l = [1, 2, 3]` and `id(l)` will keep returning the same number no matter how many elements you append to the list. That's because not only is the vector containing `[1, 2, 3]` heap-allocated, but so is the fixed-size object that holds that pointer, and that keeps track of the length and the capacity of the list (and its reference count and a couple other CPython implementation details). The address of _that_ is what `id(l)` returns, not the address of the inner `[1, 2, 3]` vector. In Rust HashSet contents are on the stack, but the HashSet itself isn't. – user4815162342 Mar 21 '22 at 19:08
  • How can the contents of the hashset be on the stack when there are a variable number of contents? – Test Mar 21 '22 at 19:10
  • @Test Again, the contents are on the heap, but the "root" value that holds that pointer (and the capacity, length, and other metadata) is on the stack. – user4815162342 Mar 21 '22 at 20:08

2 Answers2

3

I'd like to have a set of sets and want them to be unique by "actual" (memory) equality, not by contents.

To do so you first need to box the hashset so that it has a stable memory address. For example:

struct Set<T>(Box<HashSet<T>>);

To make your Set hashable, you'll need to implement Hash and Eq:

impl<T> Set<T> {
    fn as_addr(&self) -> usize {
        // as_ref() gives the reference to the heap-allocated contents
        // inside the Box, which is stable; convert that reference to a
        // pointer and then to usize, and use it for hashing and equality.
        self.0.as_ref() as *const _ as usize
    }
}

impl<T> Hash for Set<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.as_addr().hash(state);
    }
}

impl<T> Eq for Set<T> {}

impl<T> PartialEq for Set<T> {
    fn eq(&self, other: &Self) -> bool {
        self.as_addr() == other.as_addr()
    }
}

Finally, you'll need to add some set-like methods and a constructor to make it usable:

impl<T: Hash + Eq> Set<T> {
    pub fn new() -> Self {
        Set(Box::new(HashSet::new()))
    }

    pub fn insert(&mut self, value: T) {
        self.0.insert(value);
    }

    pub fn contains(&mut self, value: &T) -> bool {
        self.0.contains(value)
    }
}

Now your code will work, with the additional use of Rc so that you have the original Set available for lookup after you insert it:

fn main() {
    let mut a: Set<u32> = Set::new();
    a.insert(1);
    let a = Rc::new(a);
    let mut c: HashSet<_> = HashSet::new();
    c.insert(Rc::clone(&a));
    assert!(c.contains(&a));
}

Playground

user4815162342
  • 141,790
  • 18
  • 296
  • 355
  • This is very nice, thank you. Does it not require a pin? I broke out the specifically question about Hash/Eq, but now I'm a bit confused. – Test Mar 21 '22 at 23:31
  • Could one create a struct with the hashset as one property and the Box as a separate property? The hashset could be used without a lookup, and the box serve only for equality. – Test Mar 21 '22 at 23:49
  • @Test Re 1st comment: It doesn't require a pin, it's actually the box that makes an object stable in memory. A pin is only needed when you must expose a mutable reference to the box and need to prevent someone from replacing it with another box using `std::mem::replace()` or equivalent. That cannot happen here because the box is private to the object, and it's ok if replacing the whole object changes its identity. – user4815162342 Mar 22 '22 at 07:22
  • @Test Re 2nd comment: yes, but in that case you don't need to box, an ordinary counter will suffice. That approach is covered in [my answer to your other question.](https://stackoverflow.com/a/71564648/1600898) – user4815162342 Mar 22 '22 at 07:23
-1

As pointed out helpfully in the comments, it's not possible to hash sets because they have no fixed address. An effective, if inelegant, solution, is to wrap them in a specialized struct:

struct HashableHashSet<T> {
    hash: ...
    hashset: HashSet<T>
}

And then hash the struct by memory equality.

Test
  • 962
  • 9
  • 26
  • 1
    As also pointed out in the comments (and in the other answer), the `hashset` field needs to be something like `Box>` to make its address stable as `HashableHashSet` gets moved. – user4815162342 Mar 21 '22 at 19:05