4

Given a finite dictionary of words and a start-end pair (e.g. "hands" and "feet" in the example below), find the shortest sequence of words such that any word in the sequence can be formed from either of its neighbors by either 1) inserting one character, 2) deleting one character, or 3) changing one character.

hands -> hand -> and -> end -> fend -> feed -> feet

For those who may be wondering - this is not a homework problem that was assigned to me or a question I was asked in an interview; it is simply a problem that interests me.

I am looking for a one- or two- sentence "top down view" of what approach you would take -- and for the daring, a working implementation in any language.

gcbenison
  • 11,723
  • 4
  • 44
  • 82
  • 1
    This has been asked many times before - see eg. http://stackoverflow.com/questions/2205540/algorithm-to-transform-one-word-to-another-through-valid-words or http://stackoverflow.com/questions/1521958/shortest-path-to-transform-one-word-into-another or http://stackoverflow.com/questions/7729666/interview-qn-string-transformation – BlueRaja - Danny Pflughoeft Mar 19 '12 at 20:29
  • @blueraja None of those examples include the letter insertion or deletion rules. (Though they are quite similar). – gcbenison Mar 19 '12 at 20:45
  • The inclusion of that rule does not change the solution whatsoever. – BlueRaja - Danny Pflughoeft Mar 19 '12 at 21:06
  • @BlueRaja-DannyPflughoeft: I think you do have to change the solution. Take this part, for example: `if length(w1) != length(w2)` `Not possible to convert`. – Reinstate Monica Mar 19 '12 at 23:31
  • @WolframH: You treat it as a graph, where the words are nodes, and each word is connected by an edge if they are within 1 "edit-distance" from each other. Then in both cases you just use a shortest-path algorithm. The only (extremely minor) difference is in creating the edges. – BlueRaja - Danny Pflughoeft Mar 20 '12 at 04:28
  • @BlueRaja-DannyPflughoeft: OK; I thought the creation of the edges is (arguably) the hardest part, and maybe I reacted a bit strongly to the word "whatsoever"... – Reinstate Monica Mar 20 '12 at 18:56

3 Answers3

4

Instead of turning the dictionary into a full graph, use something with a little less structure:

For each word in the dictionary, you get a shortened_word by deleting character number i for each i in len(word). Map the pair (shortened_word, i) to a list of all the words.

This helps looking up all words with one replaced letter (because they must be in the same (shortened_word, i) bin for some i, and words with one more letter (because they must be in some (word, i) bin for some i.

The Python code:

from collections import defaultdict, deque
from itertools import chain

def shortened_words(word):
    for i in range(len(word)):
        yield word[:i] + word[i + 1:], i


def prepare_graph(d):
    g = defaultdict(list)
    for word in d:
        for short in shortened_words(word):
            g[short].append(word)
    return g


def walk_graph(g, d, start, end):
    todo = deque([start])
    seen = {start: None}
    while todo:
        word = todo.popleft()
        if word == end: # end is reachable
            break

        same_length = chain(*(g[short] for short in shortened_words(word)))
        one_longer = chain(*(g[word, i] for i in range(len(word) + 1)))
        one_shorter = (w for w, i in shortened_words(word) if w in d)
        for next_word in chain(same_length, one_longer, one_shorter):
            if next_word not in seen:
                seen[next_word] = word
                todo.append(next_word)
    else: # no break, i.e. not reachable
        return None # not reachable

    path = [end]
    while path[-1] != start:
        path.append(seen[path[-1]])
    return path[::-1]

And the usage:

dictionary = ispell_dict # list of 47158 words

graph = prepare_graph(dictionary)
print(" -> ".join(walk_graph(graph, dictionary, "hands", "feet")))
print(" -> ".join(walk_graph(graph, dictionary, "brain", "game")))

Output:

hands -> bands -> bends -> bents -> beets -> beet -> feet
brain -> drain -> drawn -> dawn -> damn -> dame -> game

A word about speed: building the 'graph helper' is fast (1 second), but hands -> feet takes 14 seconds, and brain --> game takes 7 seconds.

Edit: If you need more speed, you can try using a graph or network library. Or you actually build the full graph (slow) and then find paths much faster. This mostly consists of moving the look-up of edges from the walking function to the graph-building function:

def prepare_graph(d):
    g = defaultdict(list)
    for word in d:
        for short in shortened_words(word):
            g[short].append(word)

    next_words = {}
    for word in d:
        same_length = chain(*(g[short] for short in shortened_words(word)))
        one_longer = chain(*(g[word, i] for i in range(len(word) + 1)))
        one_shorter = (w for w, i in shortened_words(word) if w in d)
        next_words[word] = set(chain(same_length, one_longer, one_shorter))
        next_words[word].remove(word)

    return next_words


def walk_graph(g, start, end):
    todo = deque([start])
    seen = {start: None}
    while todo:
        word = todo.popleft()
        if word == end: # end is reachable
            break

        for next_word in g[word]:
            if next_word not in seen:
                seen[next_word] = word
                todo.append(next_word)
    else: # no break, i.e. not reachable
        return None # not reachable

    path = [end]
    while path[-1] != start:
        path.append(seen[path[-1]])
    return path[::-1]

Usage: Build the graph first (slow, all timings on some i5 laptop, YMMV).

dictionary = ispell_dict # list of 47158 words
graph = prepare_graph(dictionary)  # more than 6 minutes!

Now find the paths (much faster than before, times without printing):

print(" -> ".join(walk_graph(graph, "hands", "feet")))          # 10 ms
print(" -> ".join(walk_graph(graph, "brain", "game")))          #  6 ms
print(" -> ".join(walk_graph(graph, "tampering", "crunchier"))) # 25 ms

Output:

hands -> lands -> lends -> lens -> lees -> fees -> feet
brain -> drain -> drawn -> dawn -> damn -> dame -> game
tampering -> tapering -> capering -> catering -> watering -> wavering -> havering -> hovering -> lovering -> levering -> leering -> peering -> peeping -> seeping -> seeing -> sewing -> swing -> swings -> sings -> sines -> pines -> panes -> paces -> peaces -> peaches -> beaches -> benches -> bunches -> brunches -> crunches -> cruncher -> crunchier
Reinstate Monica
  • 4,568
  • 1
  • 24
  • 35
  • Very cool to see a working python implementation. Is there a way to improve the performance of the graph-walking step? A standard python graph library that could be leveraged? – gcbenison Mar 20 '12 at 04:00
  • @gcbenison: I don't know the graph libraries well enough for a qualified answer. Maybe my update (really computing the graph, i.e. looong setup) has fast enough lookup for you? – Reinstate Monica Mar 20 '12 at 23:12
  • I don't know about Python, but in C there is the [igraph](http://igraph.sourceforge.net/) library. I used it in [this implementation](https://github.com/gbenison/taubatron) which I described [here](http://gcbenison.wordpress.com/2012/03/28/the-nerdy-stuff-matters/). I'm no Python expert but I think the approach to building the graph is pretty similar to yours. – gcbenison Mar 28 '12 at 20:12
  • @gcbenison: After reading your description, yes, it seems to be basically the same. I'm sure that your implementation is much faster... – Reinstate Monica Mar 28 '12 at 20:56
1

A naive approach could be to turn the dictionary into a graph, with the words as nodes and the edges connecting "neighbors" (i.e. words that can be turned into one another via one operation). Then you could use a shortest-path algorithm to find the distance between word A and word B.

The hard part about this approach would be finding a way to efficiently turn the dictionary into a graph.

VeeArr
  • 6,039
  • 3
  • 24
  • 45
1

Quick answer. You can compute for the Levenshtein distance, the "common" edit distance in most dynamic programming texts, and, from the computation table generated, try to build that path.

From the Wikipedia link:

d[i, j] := minimum
               (
                 d[i-1, j] + 1,  // a deletion
                 d[i, j-1] + 1,  // an insertion
                 d[i-1, j-1] + 1 // a substitution
               )

You can take note of when these happens in your code (maybe, in some auxiliary table) and, surely, it'd be easy reconstructing a solution path from there.

skytreader
  • 11,467
  • 7
  • 43
  • 61