1

I need some help with the below code. I need to find the closest word to the entered word in this case to test I set word_0 as 'pikaru' which should return 'pikachu'. The levenshtein function returns us the distance between the two words that we entered. When I run the below code the answer I get is charmander which is way off, any help would be appreciated.

import backend
name_to_stats, id_to_name, names, 
        pokemon_by_typebackend.get_pokemon_stats()
words = names


word_0 = 'pikaru'
def find_closest_word(word_0, words):
    """Finds the closest word in the list to word_0 as measured by the
    Levenshtein distance

    Args:
        word_0: a str
        words: a list of str

    Returns:
        The closest word in words to word_0 as a str.
    """
    # Hint: use the levenshtein_distance() function to help you out here.
    closest_word = words[0]
    #closest_distance = levenshtein_distance(word_0, words[0])

    for i in words:
        distance = levenshtein_distance(word_0, closest_word)
        new_distance = levenshtein_distance(word_0, i)
        if distance < new_distance:
            return i





def levenshtein_distance(s1, s2):
    """Returns the Levenshtein distance between strs s1 and s2

    Args:
        s1: a str
        s2: a str
    """
    # This function has already been implemented for you.
    # Source of the implementation:
    # https://stackoverflow.com/questions/2460177/edit-distance-in-python
    # If you'd like to know more about this algorithm, you can study it in
    # CSCC73 Algorithms. It applies an advanced technique called dynamic
    # programming.
    # For more information:
    # https://en.wikipedia.org/wiki/Levenshtein_distance
    # https://en.wikipedia.org/wiki/Dynamic_programming
    if len(s1) > len(s2):
        s1, s2 = s2, s1

    distances = range(len(s1) + 1)
    for i2, c2 in enumerate(s2):
        distances_ = [i2+1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                distances_.append(distances[i1])
            else:
                distances_.append(1 + min((distances[i1], distances[i1 + 1],
                                       distances_[-1])))
        distances = distances_
    return distances[-1]

1 Answers1

1

It looks like the error is in the return statement of your find_closest_word function:

if distance < new_distance:
    return i

The function will not find the closest word, it will actually find the first word in the list that's further away than words[0]. Instead, try looping through words and keeping track of which word is the best you've seen so far. Something like:

best_distance = levenshtein_distance(word_0, words[0])
best_word = words[0]
for w in words:
    d = levenshtein_distance(word_0, w)
    if d < best_distance:
        best_distance = d
        best_word = w

return best_word
johnpaton
  • 715
  • 5
  • 12
  • 1
    In fact, python provides the `min` and `max` functions to compute these sort of results for a given iterable. You might collapse the whole thing into a single expression. – aghast Nov 22 '18 at 16:58
  • Ah nice, like `min(words, key = lambda w: levenshtein_distance(word_0, w) `. Good catch! – johnpaton Nov 22 '18 at 22:11