1

What is the simplest way to implement Hash/Eq on a struct so that two different instances with the same properties will be unequal?

Consider the following struct:

struct Person {
    name: String,
}

What is the simplest way to implement Hash/Eq so that two different people are NOT equal, even if they have the same name? Is the only way to Box<> something?

Test
  • Pretty sure eq would just be `impl PartialEq for Person { fn eq(&self, other: &Self) -> bool { return std::ptr::eq(self, other); } }` – Jared Smith Mar 21 '22 at 19:27
  • @Jared You could, but what are you going to do with that? You can't, for instance, `insert` into a `HashMap`, because the hash will become wrong simply by moving the `Person` into the map, and lookups will always fail. – trent Mar 21 '22 at 19:29
  • @trent ...or return it from a fn, or almost anything else useful. I see your point, I should have just warned OP off instead. My bad. – Jared Smith Mar 21 '22 at 19:34
  • The only safe way would be to `Pin` the values so they don't move, and that's way more trouble than it's worth. The headline here is "don't do it" no matter how you look at it. – Silvio Mayolo Mar 21 '22 at 20:39
  • Is it necessary to `pin` the `Box`? I'm a bit confused. – Test Mar 21 '22 at 23:34
  • @Test It depends on whether the non-moving guarantee is exposed to the "outside world" or not. What counts as "outside" is anything not under your direct control that could violate the invariant that a `Person` is never moved out of the `Box` it's in. `Pin` is mostly used with `async` code and `Future`s because the asynchronous code and the executor need to cooperate to ensure that a future is not moved while it is being executed. This is way overcomplicated when all you really want is a notion of "object identity", which Silvio Mayolo's answer provides without getting into any of these weeds. – trent Mar 22 '22 at 14:28
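The problem raised in the comments above can be reproduced with a small sketch. The helper names `same_address` and `address_before_and_after_move` are made up for illustration; the only assumption is that a `Person` moved into a `Vec` ends up in the vector's heap buffer, away from the local it came from.

```rust
use std::ptr;

struct Person {
    name: String,
}

/// Address-based equality, as suggested in the comments (hypothetical helper).
fn same_address(a: &Person, b: &Person) -> bool {
    ptr::eq(a, b)
}

/// Returns the address of `p` before and after it is moved into a `Vec`.
fn address_before_and_after_move(p: Person) -> (usize, usize) {
    let before = &p as *const Person as usize; // address of the local
    let people = vec![p]; // moves `p` into the Vec's heap buffer
    let after = &people[0] as *const Person as usize;
    (before, after)
}

fn main() {
    let joe = Person { name: String::from("Joe") };
    let other_joe = Person { name: String::from("Joe") };
    println!("comparing {} and {}", joe.name, other_joe.name);

    // While nothing moves, address comparison behaves like identity.
    assert!(same_address(&joe, &joe));
    assert!(!same_address(&joe, &other_joe));

    // After a move the "identity" is gone: the address is different.
    let (before, after) = address_before_and_after_move(joe);
    assert_ne!(before, after);
    println!("before: {before:#x}, after: {after:#x}");
}
```

A hash derived from the address would change in the same way, which is why lookups in a `HashMap` would fail after insertion.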

3 Answers

9

You don't. Rust isn't Java. Two identical Person instances are represented in memory by the exact same sequence of bits and are, thus, indistinguishable. And Box won't even save you here: The Eq instance for Box delegates to the contained value.
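As a minimal sketch of the point about `Box`, assuming `Person` derives `PartialEq` (the question does not say how equality is implemented), two boxed people with the same name compare equal even though they live in separate allocations. The `boxed` helper is hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
struct Person {
    name: String,
}

/// Hypothetical helper: heap-allocate a `Person` with the given name.
fn boxed(name: &str) -> Box<Person> {
    Box::new(Person { name: name.to_string() })
}

fn main() {
    let a = boxed("Joe");
    let b = boxed("Joe");

    // `Box<T>: PartialEq` compares the pointed-to values, not the pointers.
    assert_eq!(a, b);

    // The two heap allocations are nonetheless distinct.
    assert!(!std::ptr::eq(&*a, &*b));
}
```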

The only way to compare for pointer equality in the way you're describing is with std::ptr::eq, using std::pin to ensure that the pointers don't change. But, and I cannot emphasize this enough, this is the wrong approach. You don't want this kind of equality. It doesn't make sense in Rust. If Person { name: "Joe" } and Person { name: "Joe" } are meant to be distinct objects, then your data structure is poorly designed. It's your job to add a distinguishing field. If these are backed by a database, you might use

struct Person {
  primary_key: u64,
  name: String,
}

Or maybe everybody has a hexadecimal employee ID.

struct Person {
  employee_id: String,
  name: String,
}

The point is that the data structure itself (Person in our example) encodes everything about it. Rust eschews the Java-esque notion that every object intrinsically has an identity distinct from all others, in favor of your data explicitly describing itself to the world.
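Once a distinguishing field exists, the derived implementations may already be enough. The following is a sketch using the `primary_key` variant from above; the `distinct_count` helper and the key values are invented for the example.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Person {
    primary_key: u64,
    name: String,
}

/// Hypothetical helper: number of distinct people in a list.
fn distinct_count(people: Vec<Person>) -> usize {
    people.into_iter().collect::<HashSet<Person>>().len()
}

fn main() {
    let joe = Person { primary_key: 1, name: "Joe".to_string() };
    let other_joe = Person { primary_key: 2, name: "Joe".to_string() };

    // Same name, different key: the derived `Eq` tells them apart.
    assert_ne!(joe, other_joe);

    // A copy with the same key and name is the same person.
    assert_eq!(joe, joe.clone());

    assert_eq!(distinct_count(vec![joe.clone(), joe, other_joe]), 2);
}
```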

Silvio Mayolo
  • *And Box won't even save you here...* - To clarify, the boxing technique the OP refers to is to use a `Box` to force a stable address of an object, as described in [this answer](https://stackoverflow.com/a/71562768/1600898) to the OP's recent question. – user4815162342 Mar 21 '22 at 21:09
  • @user4815162342 Based on my understanding, `Box` makes no guarantees about a stable address when the box is moved (i.e. that's what [`std::pin`](https://doc.rust-lang.org/std/pin/index.html) is for). – Silvio Mayolo Mar 21 '22 at 21:11
  • The address of the inside of the box is stable as long as you don't replace the box with another box. Pin serves to prevent you from using `std::mem::replace()` or equivalent to move data out of the box. If the box is in a private member, as is the case in the `Set` type in the other answer, then it is stable. – user4815162342 Mar 21 '22 at 21:14
6

As pointed out by @SilvioMayolo, value "identity" is not a thing in Rust because Rust's values are not heap-allocated by default. While you can take the address of any value, you can't use it to represent identity, because the address changes every time the value is moved to a different variable, passed to a function, or inserted in a container. You can make the address stable by heap-allocating the value, but that requires an allocation when the value is created, and an extra dereference on every access.

For values that heap-allocate their content, such as Strings, you could use the address of the contents to represent identity, as shown in @Smitop's answer. But that is also not a good representation of identity because it changes any time the string re-allocates, e.g. if you append some data to it. If you never plan to grow your strings, then that option will work well. Otherwise, you must use something else.
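A small sketch of both halves of that statement, with made-up helper names: the content pointer of a `String` survives a move, but growing the string may reallocate. Whether the pointer actually changes on growth depends on the allocator, so the example only prints that result instead of asserting it.

```rust
/// True if the heap pointer of `s` is unchanged after moving `s` into a `Vec`.
fn pointer_survives_move(s: String) -> bool {
    let before = s.as_ptr();
    let strings = vec![s]; // moves the String header, not its heap buffer
    before == strings[0].as_ptr()
}

/// True if appending data made the string reallocate to a new address.
/// This is allowed to return either value; the allocator decides.
fn pointer_changed_on_growth(mut s: String) -> bool {
    let before = s.as_ptr();
    s.push_str(&"x".repeat(4096));
    before != s.as_ptr()
}

fn main() {
    assert!(pointer_survives_move(String::from("Joe")));
    println!(
        "pointer changed after growing: {}",
        pointer_changed_on_growth(String::from("Joe"))
    );
}
```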

In general, instead of using an address to represent identity, you can explicitly track the identity as part of the object. Nothing stops you from adding a field representing identity, and assigning it in the constructor:

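use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter; each new `Person` takes the next value.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);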

pub struct Person {
    id: u64,
    name: String,
}

impl Person {
    /// Create a new unique person.
    pub fn new(name: String) -> Self {
        Person {
            id: NEXT_ID.fetch_add(1, Ordering::Relaxed),
            name,
        }
    }

    /// Identity of a `Person`.
    ///
    /// Two persons with the same name will still have different identities.
    pub fn id(&self) -> u64 {
        self.id
    }
}

Then you can implement Hash and Eq to use this identity:

use std::hash::Hash;

impl Hash for Person {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        self.id.hash(state);
    }
}

impl Eq for Person {}

impl PartialEq for Person {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}

// We can't #[derive(Clone)] because that would clone the id, and we
// want to generate a new one instead.
impl Clone for Person {
    fn clone(&self) -> Person {
        Person::new(self.name.clone())
    }
}
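Putting the pieces together, here is a self-contained sketch of how this type might behave in a `HashSet`. It repeats the definitions above in condensed form; the `name()` accessor, the `&str` constructor argument, and the `main` function are additions for the example only.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};

static NEXT_ID: AtomicU64 = AtomicU64::new(0);

pub struct Person {
    id: u64,
    name: String,
}

impl Person {
    pub fn new(name: &str) -> Self {
        Person {
            id: NEXT_ID.fetch_add(1, Ordering::Relaxed),
            name: name.to_string(),
        }
    }

    /// Accessor added for this example only.
    pub fn name(&self) -> &str {
        &self.name
    }
}

impl PartialEq for Person {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}

impl Eq for Person {}

impl Hash for Person {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.id.hash(state);
    }
}

impl Clone for Person {
    fn clone(&self) -> Person {
        Person::new(&self.name) // a clone is a new person with a fresh id
    }
}

fn main() {
    let joe = Person::new("Joe");
    let other_joe = Person::new("Joe");
    let joe_clone = joe.clone();

    assert_eq!(joe.name(), other_joe.name());
    assert!(joe != other_joe); // same name, different identity
    assert!(joe != joe_clone); // cloning also creates a new identity

    // Moving into the set does not change the id, so hashing stays consistent.
    let mut people = HashSet::new();
    people.insert(joe);
    people.insert(other_joe);
    people.insert(joe_clone);
    assert_eq!(people.len(), 3);
}
```

One consequence worth keeping in mind is that `x.clone() == x` is false with this `Clone` implementation, which may surprise code that expects clones to compare equal.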


user4815162342
2

You can compare the String pointers instead of the actual String data. Of course, this method has the issue that the Hash of a person will change on every execution, but you can't really get around that since there's no extra data in a Person:

use std::{cmp, hash};

impl cmp::PartialEq for Person {
    fn eq(&self, other: &Self) -> bool {
        self.name.as_ptr() == other.name.as_ptr()
    }
}
impl cmp::Eq for Person {}

impl hash::Hash for Person {
    fn hash<H: hash::Hasher>(&self, state: &mut H) {
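        // Call through the trait path: only the `hash` module is imported,
        // so the `Hash` trait is not in scope for method-call syntax.
        hash::Hash::hash(&self.name.as_ptr(), state);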
    }
}
smitop
  • What do you mean by "every execution"? Every time the binary is run? That seems fine. – Test Mar 21 '22 at 23:26
  • @Test Yep, every time the binary is run, since the pointer will be different each time. Also keep in mind that operations that mutate the string to make it larger might cause the string to reallocate and change the pointer. – smitop Mar 21 '22 at 23:37
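For completeness, a sketch of how the pointer-based approach behaves in a `HashSet`, assuming names are non-empty and never grown after creation. The `distinct_people` helper is hypothetical. An empty `String` does not allocate, so two people with empty names would most likely compare equal under this scheme.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

struct Person {
    name: String,
}

impl PartialEq for Person {
    fn eq(&self, other: &Self) -> bool {
        // Compare the heap pointers of the names, not their contents.
        self.name.as_ptr() == other.name.as_ptr()
    }
}

impl Eq for Person {}

impl Hash for Person {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.as_ptr().hash(state);
    }
}

/// Hypothetical helper: build one `Person` per name and count distinct ones.
fn distinct_people(names: &[&str]) -> usize {
    let people: HashSet<Person> = names
        .iter()
        .map(|n| Person { name: n.to_string() })
        .collect();
    people.len()
}

fn main() {
    // Each non-empty name gets its own allocation, so both Joes are kept.
    assert_eq!(distinct_people(&["Joe", "Joe"]), 2);
    assert_eq!(distinct_people(&["Joe", "Ann", "Joe"]), 3);
}
```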