-3

`So I had a debate with my teacher on what was the best method to quantify how different two given strings are. She told me to use the Levenshtein distance (https://en.wikipedia.org/wiki/Levenshtein_distance if you don't know what it is).

The thing is, that this algorithm is very complex, wouldn't it be much simpler to do this ?`

def distance(s1,s2): return sum([1 for a,b in zip(s1,s2) if a!=b])

  • 1
    The Levenshtein_distance between "a sentence" and "sentence" is 2. Your distance it's 8, but the difference is only 2 characters. – gre_gor Mar 09 '23 at 10:08
  • 1
    Yours is the Hamming distance and that Wikipedia article already explains the difference. – gre_gor Mar 09 '23 at 10:15
  • Does this answer your question? [Hamming Distance vs. Levenshtein Distance](https://stackoverflow.com/questions/4588541/hamming-distance-vs-levenshtein-distance) – gre_gor Mar 09 '23 at 10:16
  • in your method you are comparing element by element and checking if they are equal. let's suppose you have an offset of 1 character (very common for example in DNA with basepair insertions and deletions). The strings "HUMAN_" and "_HUMAN" in your case will have a distance of 7, and you agree with me that they are actually quite similar. You want to be able to compare similar strings allowing insertions, deletions and mutations, which is achieved using the Levenshtein distance – Sembei Norimaki Mar 09 '23 at 10:38
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Mar 09 '23 at 16:20

1 Answers1

0

your approach is too simple as it can handle only equal length strings. However the Levensthein distance can handle any two strings and allows all changes including insertions which is much more general.

Klops
  • 951
  • 6
  • 18