2

I have a small number of instances of a custom class stored in a set. I need to check if a certain element is contained in that set. The criteria for a match must be the object's ID, not its content.

For simplification assume a class with an integer var as the only property, and two different instances of that class, both holding the number 1.

Directly comparing those instances should return true, but when a reference to the first is stored in the set, a query if the set contains a reference to the second one should return false.

Therefore I use the ObjectIdentifier of the object to generate the hash function required by the hashable protocol.

It is my understanding that the .contains method of a Swift Set uses the hash value first, and in case of hash collisions the equatable method is used as a fallback.

But in the following code, which can run in a playground, I get randum results:

class MyClass: Hashable {
    var number: Int
    init(_ number: Int) {
        self.number = number
    }
    static func == (lhs: MyClass, rhs: MyClass) -> Bool {
        return lhs.number == rhs.number
    }
    func hash(into hasher: inout Hasher) {
        hasher.combine(ObjectIdentifier(self))
    }
}

var mySet: Set<MyClass> = []

let number1 = MyClass(1)
let secondNumber1 = MyClass(1)

number1 == secondNumber1        // true: integer values are equal, so are the wrapping classes
number1 === secondNumber1       // false: two different instances

mySet.insert(number1)

mySet.contains(number1)         // true
mySet.contains(secondNumber1)   // should be false but randomly changes between runs

If you run the above code in an XCode Playground and manually restart playground execution this gives different results for the last line on each run. The desired behaviour is to get "false" every time.

So what would be the correct way to achieve the described bahaviour?

MassMover
  • 529
  • 2
  • 17
  • Without spending a lot of time looking at it. I'm wondering if, instead of `func hash(into hasher: inout Hasher) {` you should override `hash` directly and return `ObjectIdentifier(self).hash` instead. Having said that, why not use `hasher.combine(number)` instead? – MadProgrammer Sep 02 '19 at 23:20
  • From the documentation for `Hashable`: *"Two instances that are equal must feed the same values to Hasher in hash(into:), in the same order."*. You need a different solution than `Hashable` or `Set`. – rmaddy Sep 02 '19 at 23:38
  • When I use `code`hasher.combine(number)`code` I would get the same hash value for identical numbers, but as I wrote in the first paragraph I want to check the set.contains method for identity, not equality. – MassMover Sep 02 '19 at 23:39
  • @ rmaddy OK, but have you got any hint to what that solution could be? – MassMover Sep 02 '19 at 23:43
  • Look into the `NSHashTable` class. You can create one based on object identity. – rmaddy Sep 02 '19 at 23:48
  • Ok, thank you, I will look into that. First try in the playground code seems to work by using `var mySet = NSHashTable()` instead of Set. But, of course, my actual code is more complex, so I'll have to dig deeper to find out if there are any caveats … – MassMover Sep 03 '19 at 00:09
  • After further examination it turns out, that NSHashTable does not conform to the Sequence protocol, so it is not possible to iterate over its items. – MassMover Sep 05 '19 at 22:16

1 Answers1

2

Simply put, Set relies on func hash(into hasher: inout Hasher) and ==. It is invalid to have an unmatched pair of these. In your case, your equality is value-based (dependant upon self.number), whereas your hash is identity based. This isn't legal.

Your mySet.contains(secondNumber1) line is failing because secondNumber2 might have a hash collision with number1. Whether a collision occurs or not is undefined, because Swift uses a random seed to defend against hash-flood DDoS attacks. If a hash collision does occur, then your equality operator (==) falsely identifies as number1 as a match for secondNumber1

Instead, what you could do is implement a wrapper struct that implements equality and hashing based on an object's identity. The object itself could have its own value-based equality and hash, for other purposes.

struct IdentityWrapper<T: AnyObject> {
    let object: T

    init(_ object: T) { self.object = object }
}

extension IdentityWrapper: Equatable {
    static func == (lhs: IdentityWrapper, rhs: IdentityWrapper) -> Bool {
        return lhs.object === rhs.object
    }
}

extension IdentityWrapper: Hashable {
    func hash(into hasher: inout Hasher) {
        hasher.combine(ObjectIdentifier(self.object))
    }
}

Using the IdentityWrapper in a set requires you to manually wrap objects before interacting with the set. It's performant (since struct don't need any array allocation), and most likely the struct is entirely inlined anyway, but it can be a little annoying. Optionally, you could implement a struct IdentitySet<T> which just wraps a Set<IdentityWrapper<T>>, which tucks away the wrapping code.

class MyClass: Hashable {
    var number: Int

    init(_ number: Int) {
        self.number = number
    }

    // Value-based equality
    static func == (lhs: MyClass, rhs: MyClass) -> Bool {
        return lhs.number == rhs.number
    }

    // Value-based hashing
    func hash(into hasher: inout Hasher) {
        hasher.combine(self.number)
    }
}

var mySet: Set<IdentityWrapper<MyClass>> = []

let number1 = MyClass(1)
let secondNumber1 = MyClass(1)

number1 == secondNumber1        // true: integer values are equal, so are the wrapping classes
number1 === secondNumber1       // false: two different instances

mySet.insert(IdentityWrapper(number1))

print(mySet.contains(IdentityWrapper(number1))) // true
print(mySet.contains(IdentityWrapper(secondNumber1))) // false
Alexander
  • 59,041
  • 12
  • 98
  • 151
  • Wouldn't the problem with hash collisions be the same? Obviously the hashValue that is generated from the ObjectID is prone to collisions, surprisingly, though. – MassMover Sep 03 '19 at 01:26
  • "Wouldn't the problem with hash collisions be the same?" The hash collisions aren't the problem. It's the false positive in equality matching that follows that causes your bug. "the hashValue that is generated from the ObjectID is prone to collisions, surprisingly," Why is that surprising? – Alexander Sep 03 '19 at 01:39
  • Imagine a hypothetical product. A product's hash is its "category". The equality of a product is value based. I.e. two boxes of pasta, in the same size, brand, etc. are equal, even if they're two seperate objects (instances), with differen identites. When you go to a store and you look for an item, you don't do a blind linear search, checking every product in the store, one by one, until you find a match. Instead, you use a heuristic, the product category, to derive the correct isle. You then do a one-by-one match within the limited set of objects in that isle/area. – Alexander Sep 03 '19 at 01:44
  • Set and dictionary does the same thing. It narrows down the set of objects that need 1-by-1 searching, from all objects, down to only those objects with the same hash. If the hash is different, then there's no point checking them. For example, this is like looking for your desired pasta in the dairy fridge area. No bueno. Once the search pool is narrowed down, the 1-by-1 matching begins. If your equality matching doesn't match your hash function, you can end up with a false positive. – Alexander Sep 03 '19 at 01:45
  • You hash function is identity based. Suppose that all boxes have an globally unique identity (akin to their `ObjectIdentifier`), and that you were looking for a specific box of spaghetti by its ID. Not just any spaghetti, but a single particular box. (suppose also that you had a way to map from object ID to isle number). By using your isle heuristic, you end up in the pasta section. While looking among the boxes of pasta, you use your equality comparison, which is erroneously value based. It says any 2 spaghetti boxes compare as equal. – Alexander Sep 03 '19 at 01:49
  • When you start doing your 1-by-1 comparisons, you're using this equality comparison, and can mistakenly conclude that a box of spaghetti (which isn't your exact box you were looking for) is what you're looking for, because it too is spaghetti, and satisfies your value-based `==`. That's the bug you're experiencing. To fix this situation, your 1-by-1 process needs to be more specific, and check specific object ids, until those match. It's not sufficient for the object to merely also be spaghetti. Its object id needs to match the one you're looking for. – Alexander Sep 03 '19 at 01:51
  • @MassMover Did I answer your question? – Alexander Sep 05 '19 at 15:42
  • Thank you for your thorough explanation, with your help, and some further web research, I now clearly understand how the concept of hash values in a set works - I did not fully understand before the concept of 'buckets', now I understand, that more often than not a query will be checked twice. While rmaddys hint towards NSHashTable seemed promising I did not have the time to check your wrapper solution, will do so. So, while I am still looking for a solution for my problem, as my question was actually: "What is wrong with my code?", you indeed answered it. – MassMover Sep 05 '19 at 22:26
  • Glad to hear it. Let me know if you have any further questions! – Alexander Sep 05 '19 at 22:44
  • Actually, I re-phrased my question in a new thread to better reflect what I am truly looking for: https://stackoverflow.com/questions/57814064/swift-how-can-i-handle-objects-selected-by-the-user As I wrote, I did not look deeper into your wrapper suggestion, but maybe there is a completely different approach to my actual intention. – MassMover Sep 05 '19 at 23:37
  • 1
    And this is the cost of [XY problems](https://en.wikipedia.org/wiki/XY_problem). I put a lot of time into answering a question that wasn't your question. Next time, ask a question about your root problem, not a question about your attempt at a problem. – Alexander Sep 06 '19 at 00:02
  • Actually, this is fine, your root problem is similar enough that this is still applicable. So why didn't you look into my wrapper suggestion? – Alexander Sep 06 '19 at 00:04
  • Because coding is not my profession, I just do that in my spare time and didn't have the time to try it yet. Furthermore, I have no doubt that your solution works on my strapped down code, but in my original project it is probably a bit more complicated to adjust each part where the set is accessed. I understand it that you propose a convenient solution by additionally wrap the wrapper into the IdentitySet struct, but again, I am not a professional, so what to you is a simple exercise for me it takes time to understand and adapt it to my actual code. So please, be patient with me ;) – MassMover Sep 06 '19 at 13:27
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/199085/discussion-between-alexander-and-massmover). – Alexander Sep 06 '19 at 14:34