2

I'm calculating the levenshtein distance for some strings. Only the one with a distance of 1 I want to analyze further. What I'm interested in, above all, is the position of the character that makes the distance. So for example;

('rodange', 'redange', 1)  # position 2

I can think of a couple of ways to get there but they seem not very handy (like looping through all characters and comparing them one by one). Something out there already?

LarsVegas
  • 6,522
  • 10
  • 43
  • 67
  • isn't the cost of replace is 2 for levenshtein distance? – Roman Pekar Aug 01 '13 at 12:43
  • I don't really understand your question. What do you mean with replace? And which cost? – LarsVegas Aug 01 '13 at 12:46
  • as far as I remember, levenshtein distance is a sum of inserts deletes and replaces of characters, so you could change word1 into word2 in N steps. insert and delete costs 1 and update = insert + delete cost 2. So in your case difference betweed rodange and redange is not 1, but 2 – Roman Pekar Aug 01 '13 at 12:51
  • take a look here - http://www.stanford.edu/class/cs124/lec/med.pdf. I could try to find solution for you, but it would be nice to know that I'll calucalate difference you need – Roman Pekar Aug 01 '13 at 12:53
  • No, it's the absolute distance, Have look [here](http://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison) – LarsVegas Aug 01 '13 at 12:57

1 Answers1

1

I don't think there's a better solution than what you already figured out. Either add code that returns the index of the first change to the levenshtein algorithm you are using. That should be a single line at the right place, and a modified return statement.

Or loop through it like you say, not too difficult either:

idx = next(i for (i, (a, b)) in enumerate(zip(w1, w2)) if a != b)

If you prefer it shorter:

from operator import eq
idx = map(eq, w1, w2).index(False)
w-m
  • 10,772
  • 1
  • 42
  • 49