15

This is another question I was asked in the telephonic interview:

Given a dictionary and a crossword(2d matrix of characters) find all the dictionary words which can be found in the crossword.

All I could think of was hash the dictionary, find every possible word in the crossword and search the hash. I could not optimize it all.

Must admit Microsoft interview questions are tough :(

Please give me the lines to think on.

pnuts
  • 58,317
  • 11
  • 87
  • 139
Edward
  • 927
  • 2
  • 9
  • 12

6 Answers6

4

How about:

  • build a search tree for the dictionary (one level per letter)
  • for each position in the grid, start searching through the index, one character at a time, and proceed in each allowed direction until there are no entries left in the index, or you hit a grid boundary

I don't think a hash would be a very useful optimization here.

tdammers
  • 20,353
  • 1
  • 39
  • 56
  • but the question asks you to use a dictionary. +1 for the creativity – mustafabar Nov 30 '10 at 13:06
  • 2
    Yes, and he did use a dictionary. To create a search tree from it :) +1 – Goran Jovic Nov 30 '10 at 13:12
  • Define search tree? What kind? – moinudin Nov 30 '10 at 13:30
  • @marcog: The kind where at each nesting level you have one branch per distinct letter. So if your dictionary contains the entries "foo", "bar", and "baz", you'de have "f" and "b" at the top level. The "f" branch would contain one branch, "o", which in turn would contain one leaf, the second "o". The "b" branch would contain one branch, "a", with two leaves, "r" and "z". – tdammers Nov 30 '10 at 13:58
  • @mustafabar: Since a key-value collection doesn't fit the actual situation, I think it is safe to assume that by "dictionary", they actually mean "unordered collection of strings". – tdammers Nov 30 '10 at 14:01
  • I think you're referring to a trie not tree – Saad Oct 14 '12 at 02:12
  • @Saad: a trie *is* a form of tree. And while I did, in fact, have a trie in mind, other tree-structures might also work. – tdammers Oct 14 '12 at 10:46
4

The most appropriate solution depends a lot on the constraints you expect to be dealing with. How large is your dictionary? How large is your crossword.

I'd suggest taking a look at Suffix trees. You can insert all the dictionary words into one. Then search the suffix tree for the rows, columns and diagonals. For the rows, start a search from the root of the tree for the first letter in each row and iterate through the tree as you pass through the row. Do the same from right to left if necessary. Similar story for columns and diagonals.

Tree construction is O(N) and consumes O(N) space, where N is the size of your dictionary in characters. Searching will then take O(PQ) time, where your crossword is of size PxQ. Giving an overall runtime of O(N+PQ) and space of O(N).

The thing is, though, suffix trees are a pain to implement. They really are. So you might prefer settling for a simple Trie, which will give you a total runtime of O(N+PQ(max(P,Q)).

moinudin
  • 134,091
  • 45
  • 190
  • 216
  • Can you please tell me how you derived the aysmtotic runtimes? – Edward Nov 30 '10 at 13:19
  • Check the wiki articles, they explain the runtimes for the trees. For suffix tree searching, for each row/column/diagonal you perform one query per character since it dynamically wraps around to a new partial word very efficiently - which is 3PQ, hence O(PQ). For tries, you have to start searching the trie at every character for each of the 6 directions. Since there are PQ characters, and a word could be at worst max(P,Q) in length you have 6PQ(max(P,Q)) hence PQ(max(P,Q)). – moinudin Nov 30 '10 at 13:26
4

This question is exactly how one would play Boggle.

This past question in SO more than sufficiently ANSWERS THIS QUESTION.

Have fun...

Community
  • 1
  • 1
Srikar Appalaraju
  • 71,928
  • 54
  • 216
  • 264
2

Suppose the dictionary contains n words of average length k, and the matrix contains m² characters.

  1. Preprocess the dictionary into a prefix tree (aka trie). — O(kn)
  2. For each position in the matrix, look up the strings across and down in the trie. — O(m³)

Total time: O(max(kn, m³))

In realistic word-searches, the average length of words found in the matrix is more like k than like m, so the time taken would be O(k max(n, m²)).

Gareth Rees
  • 64,967
  • 9
  • 133
  • 163
  • Suffix trees yield faster lookups. See my answer. – moinudin Nov 30 '10 at 13:13
  • 1
    @marcog, since in many cases we're looking up a word one letter longer than the one we looked up last time, the lookup time is probably better than 'average' for prefix tries. I'm not sure what its asymptopic behaviour is, though. – The Archetypal Paul Nov 30 '10 at 13:22
  • Can you give a bit more detail of how the search works? I can't quite see how to turn the crossword search into one of the efficient suffix tree operations. – Gareth Rees Nov 30 '10 at 13:25
  • @Paul Interview questions *usually* care about worst case analysis as much as average case. Suffix trees will never be worse, and in many cases better than tries. – moinudin Nov 30 '10 at 13:30
  • If the anonymous downvoter would explain what I got wrong in this answer, I would be happy to attempt to correct it. – Gareth Rees Nov 30 '10 at 13:44
2

What was wrong with your answer?

  • A dictionary is sorted so I think I'd arrange the dictionary words into a prefix trie. This would help since there are probably lots of words where the prefix is also a word. The sorting helps (minimally) with the build time.

  • Then walk the crossword looking at all possible words. As you extract the characters of a potential word, you are walking down the trie - so you'll find the first word starting with a certain set of characters, but also be in the right place to continue to find other words starting with the same characters

The Archetypal Paul
  • 41,321
  • 20
  • 104
  • 134
0

I would compile the dictionary into a DFA that recognizes words in the dictionary, then run it over the rows and columns and diagonals of the letter matrix. Should be O(m+n) where m is the length of the dictionary in characters and n is the area (w*h) of the matrix.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711