1

I need to make a custom struct conform to Hashable so that I can use it as a Dictionary key type. The challenge though, is that two of the struct's properties are interchangeable for the purpose of identifying a unique instance.

Here's a simplified example to illustrate the problem:

struct MultiplicationQuestion {
    let leftOperand: Int
    let rightOperand: Int
    var answer: Int { return leftOperand * rightOperand }
}

The two important properties for identifying a unique MultiplicationQuestion are leftOperand and rightOperand, but it doesn't matter what order they are in, because '1 x 2' is essentially the same question as '2 x 1'. (For reasons I won't go into here, they need to be kept as separate properties.)

I tried defining Hashable conformance as follows, knowing that there is a conflict between equality as I've defined it for == and what the built-in Hasher is going to do:

extension MultiplicationQuestion: Hashable {
    static func == (lhs: MultiplicationQuestion, rhs: MultiplicationQuestion) -> Bool {
        return (lhs.leftOperand == rhs.leftOperand && lhs.rightOperand == rhs.rightOperand) || (lhs.leftOperand == rhs.rightOperand && lhs.rightOperand == rhs.leftOperand)
    }
    func hash(into hasher: inout Hasher) {
        hasher.combine(leftOperand)
        hasher.combine(rightOperand)
    }
}

I tested this by creating two Sets of questions and performing various operations on them:

var oneTimesTables = Set<MultiplicationQuestion>()
var twoTimesTables = Set<MultiplicationQuestion>()
for i in 1...5 {
    oneTimesTables.insert( MultiplicationQuestion(leftOperand: 1, rightOperand: i) )
    twoTimesTables.insert( MultiplicationQuestion(leftOperand: 2, rightOperand: i) )
}

let commonQuestions = oneTimesTables.intersection(twoTimesTables)
let allQuestions = oneTimesTables.union(twoTimesTables)

The hoped-for result (wishful thinking) is that commonQuestions contains one question (1 x 2), while allQuestions contains nine questions, having removed the duplicate.

The actual result however is unpredictable. If I run the playground multiple times, I get different results. Most of the time, commonQuestions.count is 0, but sometimes it is 1. And most of the time, allQuestions.count is 10, but sometimes it is 9. (I'm not sure what I was expecting, but this inconsistency was certainly a surprise!)

How can I make the hash(into:) method generate the same hash for two instances where the properties are the same but reversed?

Kal
  • 2,098
  • 24
  • 23

2 Answers2

2

This is how Hasher works

https://developer.apple.com/documentation/swift/hasher

However, the underlying hash algorithm is designed to exhibit avalanche effects: slight changes to the seed or the input byte sequence will typically produce drastic changes in the generated hash value.

So the problem here in hash(into:) func

Since the sequence matters combine operation is not commutative. You should find some other function to be a hash for this struct. In your case the best option is

    func hash(into hasher: inout Hasher) {
        hasher.combine(leftOperand & rightOperand)
    }

As @Martin R pointed out to have less collisions it's better to use ^

    func hash(into hasher: inout Hasher) {
        hasher.combine(leftOperand ^ rightOperand)
    }
Tiran Ut
  • 942
  • 6
  • 9
  • Thanks @Tiran. Unfortunately `answer` won't consistently identify a unique question… 2 x 5 and 1 x 10 will both give the same answer, but they are different questions. `leftOperand & rightOperand` might be more promising though! Binary operations do my head in, but I'll have a look. (While waiting for an answer, I also wondered why I didn't try combining the operands in a Set or a sorted Array. Was about to test that idea as well.) – Kal Jul 09 '19 at 05:16
  • 2
    Or `hasher.combine(leftOperand ^ rightOperand)` for slightly less collisions. – Martin R Jul 09 '19 at 05:17
  • @Kal: The hash value does not need to be (and in general, cannot be) unique. – Martin R Jul 09 '19 at 05:18
  • Also, thanks for the optimisation idea of avoiding computed answer property. My actual app does exactly that, but when I was writing the example, I just went for brevity. :-) – Kal Jul 09 '19 at 05:19
  • @MartinR thanks for that! Really, the hash value doesn't need to be unique? Okay, I thought I was just starting to get my head around this Hashable business, but evidently not! :-| – Kal Jul 09 '19 at 05:22
  • @Tiran: *“... combine operation is not associative.”* – Do you mean “commutative”? – Martin R Jul 09 '19 at 05:51
  • @MartinR @Tiran: I tested both `leftOperand & rightOperand` and `leftOperand ^ rightOperand`… No collisions after 10,000 repetitions. Slight performance advantage goes to ^. My idea of feeding it `Set([leftOperand, rightOperand])` also worked, and surprisingly, was just as fast as &. Logically it seems to me that this would never create unwanted collisions (aside from any flaws with Hasher). What do you think? – Kal Jul 09 '19 at 06:14
  • 1
    Suppose that you have `f` which is some hash function. that means that `f(a) != f(b) => a != b`. But `f(a) == f(b)` doesn't mean that `a == b`. The moment when `f(a) == f(b)` and `a != b` is called a collision. Number of collisions for a function should not affect the result of comparison between objects, it will only affect the speed of finding difference between them. So collision doesn't mean bad comparison. In your case any of suggested functions will work properly, but `^` will be slightly faster because of less collisions. – Tiran Ut Jul 09 '19 at 06:50
  • @Tiran: Okay, so I assumed that if the Hasher returned the same value for different objects (a collision), that would result in a comparison error. So what is the purpose of the hash value then—is it just a kind of checksum to make comparisons faster? And then, if a match is found, what… it gets double-checked by your `==` implementation? – Kal Jul 09 '19 at 06:56
  • 1
    I believe it happens this way ) In your example you can replace hash with just `hasher.combine(2)` and you'll see that results didn't change, the difference will be in performance – Tiran Ut Jul 09 '19 at 07:09
  • Ah, you're absolutely right… Slow performance but 100% correct when you feed the Hasher rubbish! I think I finally get it. I guess the inconsistent behaviour I saw originally was due to collisions being referred to `==`. It's strange that so many collisions occurred under those very simple conditions. – Kal Jul 09 '19 at 07:29
  • @TiranUt, many thanks for taking the time to help me understand. I can mark your answer as correct if you can edit it to highlight the second part of your answer which is correct. `hasher.combine(answer)` isn't really correct (although I suppose it would work—just a little slower!) ;-) – Kal Jul 09 '19 at 07:34
  • PS: [This question](https://stackoverflow.com/questions/31665802/how-does-dictionary-use-the-equatable-protocol-in-swift) and [answer](https://stackoverflow.com/a/41046479/2652785) shows that the relationship between hash values and `==` is quite a bit more complex than perhaps either of us imagined it to be. I found the answer interesting, albeit over my head. :-) – Kal Jul 10 '19 at 02:32
0

Tiran Ut's answer (and comments) helped me a lot, and I've marked it as correct. Nonetheless, I thought it would be worth adding another answer to share some things I've learnt and to present another way of solving the problem.

Apple's hash(into:) documentation says:

The components used for hashing must be the same as the components compared in your type’s == operator implementation.

That's all well and good if its a simple one-to-one comparison of properties (like all the code examples show!) but what if your == method has conditional logic like mine? How do you translate that into a value (or values) to feed the hasher?

I was getting hung up on this detail, until Tiran suggested that feeding the hasher a constant value (like 2) would still work, since hash collisions are resolved by == anyway. You wouldn't do this in production of course, because you'd lose all the performance benefits of hash lookups, but the take-home for me was that if you can't make your hasher arguments exactly the same as your == operands, make the hasher equality logic more inclusive (not less).

The solutions in Tiran Ut's answer work because the bitwise operations don't care what order the operands are in, just like my == logic. Occasionally, two completely different pairs may generate the same value (resulting in a guaranteed hash collision), but the only real consequence in these cases is a small hit to performance.

In the end though, I realised that I could use the exact same logic in both cases, thereby avoiding hash collisions—well, aside from any caused by an imperfect hasher algorithm anyway. I added a new private constant to MultiplicationQuestion and initialized it as follows:

uniqueOperands = Set([leftOperand, rightOperand])

A sorted array would have worked too, but a Set seemed like the more elegant choice. Since a Set has no ordering, my verbose conditional logic for == (using && and ||) is already neatly encapsulated in the Set type.

Now, I can just use the very same value to test for equality and to feed the hasher:

static func ==(lhs: MultiplicationQuestion, rhs: MultiplicationQuestion) -> Bool {
    return lhs.uniqueOperands == rhs.uniqueOperands
}

func hash(into hasher: inout Hasher) {
    hasher.combine(uniqueOperands)
}

I've tested performance and it's on par with the bitwise operations. Not only that, but my code has become more concise and readable in the process. Seems like a win-win.

Kal
  • 2,098
  • 24
  • 23