0

In my table, I have two columns of the Datatype 'Varchar' with the same length. These columns are binary representations of information, and I would like to select the Hamming Distance between them, but I can't find any function to do that. Does anyone have any guidance?

David Hill
  • 13
  • 3
  • Does this answer your question? [Hamming Distance / Similarity searches in a database](https://stackoverflow.com/questions/9606492/hamming-distance-similarity-searches-in-a-database) – SMor Jul 14 '22 at 17:40

1 Answers1

1

Create an auxiliary numbers table with unique sequential integers at least up until the max string length and then you can do

SELECT CASE WHEN LEN(@String1) = LEN(@String2) THEN COUNT(*) END
FROM   dbo.Numbers
WHERE  Number BETWEEN 1 AND LEN(@String1)
       AND SUBSTRING(@String1, Number, 1) <> SUBSTRING(@String2, Number, 1) 

In future product versions the auxiliary numbers table can be swapped out by the GENERATE_SERIES function.

NB: If you are planning on running this calculation over many strings - e.g. to return the row with the closest Hamming Distance you'll likely find performance an issue though...

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Perhaps add ... AND LEN(@String1) = LEN(@String2) Either way +1 – John Cappelletti Jul 14 '22 at 17:49
  • 1
    @JohnCappelletti - the "with the same length" was a given in the question here. I'll add that anyway though to be on the safe side (and replace the `COUNT` with `SUM(1)` so `NULL` is returned for that case rather than zero – Martin Smith Jul 14 '22 at 17:52