3

I’m trying to compare clients, two by two, whose qualities can be defined by binary choices (for example a client uses a product or not).
After much search online, it looks like I’d need to use the Hamming Distance for that, or its equivalent: find the Hamming Weight to the result of an XOR operation between two words.

For a concrete example, Hamming distance between 1001 & 1011:

Calculate the number 1001 XOR 1011= 0010
Hamming weight of 0010 = 1 (numbers of bit set to 1 in 0010)

I need to do that in for words up to 96 bits.

I found some information on

http://people.revoledu.com/kardi/tutorial/Similarity/HammingDistance.html

http://trustedsignal.blogspot.ca/2015/06/xord-play-normalized-hamming-distance.html

and plenty of pieces of code , for example

Hamming weight written only in binary operations?

but only in C, Java , Perl, O, opencl... anything but Excel VBA.

So far, here’s what I manage to put together.

It works, but unfortunately only for words of 30 bits or less, and uses a somewhat crude method: XOR on the two numbers X and Y, then convert to a string that represent the binary number. Then count the length of the string once the 1’s are taken out. I guess there’s a more elegant and efficient way.

Public Function HamDist(x As Long, y As Long, NbBit As Integer)

Dim i As Long, BinStrg As String, bxor As Long 

bxor = x Xor y 

BinStrg = "" 

For i = NbBit To 0 Step -1 ‘going from left to right 
         If bxor And (2 ^ i) Then
            BinStrg = BinStrg + "1" ‘add a 1 to the string 
         Else
            BinStrg = BinStrg + "0"
         End If
      Next

 HamDist = Len(BinStrg) - Len(Replace(BinStrg, "1", "")) ' replace the 1 by nothing and count  the length of the resulting string 
End Function

Can you help make it works for 96 bit words in VBA for Excel 2010 and below (udf or sub), either by calculating the hamming weight or distance ?

Community
  • 1
  • 1
P. O.
  • 185
  • 1
  • 14
  • 1
    Your `x` and `y` can't be `Long`. For 96 bits, you'll have to work with strings as inputs rather than `Integer`, `Long`, `LongLong`, or `Double`, since these all store 64 or fewer bits. This is unwieldy and requires you to write large number arithmetic functions [like here](http://www.tushar-mehta.com/misc_tutorials/project_euler/LargeNumberArithmetic.htm). – Alexis Olson Aug 05 '16 at 06:08

1 Answers1

2

If you store the chain of qualities in String form (e.g. a String consisting only of the letters 'T' and 'F'), this could quite easily be done using a loop.

Function hammingDistance(qualities1 As String, qualities2 As String) As Integer

    If Len(qualities1) <> Len(qualities2) Then
        hammingDistance = -1
        Exit Function
    End If

    Dim i, result As Integer
    result = 0

    For i = 1 To Len(qualities1)
        If Mid(qualities1, i, 1) <> Mid(qualities2, i, 1) Then result = result + 1
    Next

    hammingDistance = result

End Function
Maciej Lipinski
  • 185
  • 3
  • 10