4

Given a Person struct that has a name and surname string properties I would like to write an hashing algorithm that is efficient and avoids collisions for persons with interchangeable names and surnames (Lara Ray and Ray Lara for example.).

I already know to stir away from string concatenation in Swift, so Ideally I am thinking at XORing the 2 variables and bit shifting one of them to solve the interchangeable problem. Is there anything wrong with this?

struct Person {
    let name: String
    let surname: String

    var hashValue: Int {
        return surname.hashValue << 1 ^ name.hashValue
    }
}
cescofry
  • 3,706
  • 3
  • 26
  • 22

1 Answers1

1

Martin R graciously provided a Swift translation of Boost's hash_combine function in an old Code Review post of mine here.

We can use this function in your struct:

func hash_combine(seed: inout UInt, value: UInt) {
    let tmp = value &+ 0x9e3779b9 &+ (seed << 6) &+ (seed >> 2)
    seed ^= tmp
}

struct Person {
    let name: String
    let surname: String

    var hashValue: Int {
        var seed = UInt(0)
        hash_combine(seed: &seed, value: UInt(bitPattern: name.hashValue))
        hash_combine(seed: &seed, value: UInt(bitPattern: surname.hashValue))
        return Int(bitPattern: seed)
    }
}

Person(name: "Joe", surname: "Smith").hashValue // -5143836311621647467
Person(name: "Smith", surname: "Joe").hashValue // -5146825509308597586

While not perfect, this should reduce the number of collisions in a huge sample set (see the linked post of an example with CGPoint).

You can read more about the "golden ratio" here: Magic number in boost::hash_combine

I'm still testing this, but I imagine this would provide fewer collisions than just bit shifting by 1.

JAL
  • 41,701
  • 23
  • 172
  • 300