3

My grade-5 daughter asked help for her home work. In a 10x10 matrix, find 9 country name. I think computer can do better job on this after trying half hour without all the answers. Here is what I did. But it only find 7 countries. The eighth one is England, which is named different in the country list. Where is the last one? Also this script is very inefficient since it searches the country list multiple times with duplicate elements. I found this boggle solver post, which is handling more sophisticate case. It mentioned "trie" data structure can solve this kind of problem efficiently. Can anyone elaborate the details? Thanks in advance,

#!/usr/bin/env python
import numpy as np
import re
import os

B = [['k','l','m','a','l','t','a','l','b','s'],
     ['i','e','n','y','e','j','i','i','y','r'],
     ['o','r','o','h','w','d','r','z','u','i'],
     ['c','o','r','v','m','z','t','a','i','l'],
     ['i','p','w','b','j','q','s','r','d','a'],
     ['x','a','a','d','n','c','u','b','f','n'],
     ['e','g','y','e','h','i','a','h','w','k'],
     ['m','n','g','a','k','g','f','d','s','a'],
     ['g','i','d','n','a','l','g','n','e','y'],
     ['b','s','t','r','f','g','s','a','i','u']]
Matrix=np.matrix(B)
Shape=Matrix.shape
row = Shape[0]-1
col = Shape[1]-1
wordlist = []
DIRECTIONS = [ (-1,-1), (0,-1), (1,-1),(-1,0),(1,0),(-1,1),(0,1),(1,1)]

def expand(i,j,xd,yd):
#every time expand to one direction based on passed in variable
    if ( i+xd >= 0 and i+xd <= row and j+yd >= 0 and j+yd <= col ):
        wordlist.append(wordlist[-1] + Matrix[i+xd,j+yd])
        expand(i+xd,j+yd,xd,yd)

print 'matrix is {} x{:2}'.format(Shape[0],Shape[1])
for i in range(Shape[0]):
    for j in range(Shape[1]):
        for xd,yd in DIRECTIONS:
#before extend to each direction, should set last element as current position letter
# country name is from http://www.countries-list.info/Download-List
            wordlist.append(Matrix[i,j])
            expand(i,j,xd,yd)

for word in wordlist:
# tried to regex to search file, but it is slow comparing to system grep command
        if len(word) > 1 and  ( not os.system("grep -iw " + word + " /home/cmaaek/Downloads/list.txt > /dev/null")):
            print(word)
roadtest ma
  • 103
  • 7

2 Answers2

1

A trie (pronounced "try") is a data structure that looks like this:

Figure 1. Diagram of a Trie

Each path in a trie corresponds to a distinct word. In the trie above, observe that you can trace along any path to form a valid word. This is constructed by having two nodes share a parent node if they share a letter at a specific position. So, for "NEWS" and "NOT", the characters in the first position of each word are the same - this is why they share the node "N" in the diagram. However, the second characters of each word are "E" and "O" respectively, which are not the same, so they branch off into separate paths. Each node also has an indicator which tells you if it corresponds to the end of a word (this is not shown in the diagram).

The reason this is efficient is because it is a very compact/efficient way to represent a dictionary. It also allows us to quickly determine if a query word is in our dictionary, since we will only need to go down exactly one path every time to determine if a word is in our dictionary.

You can build a trie from the list of country names, then, you can query the board for country names using the trie. For example, if you were looking for "Video" in the trie diagram that I gave above, then you would return False because none of the starting nodes (A, D, N, Z) match the first character V. If the first character matches, then you check if any of the child nodes match the next character, and so on. This allows you to quickly eliminate options when you are searching from some starting character in your crossword board.

As another optimization, you may want to record crossword board positions that led to dead ends (i.e. no way to form a valid country name from that position). This will allow you to avoid looking over board positions that don't help you find solutions, which should also speed up your code. Hope this helps!

hologram
  • 533
  • 2
  • 5
  • 21
1

Here a trie python implementation, which is slower than string brute force on a so little grid :

B = [['k','l','m','a','l','t','a','l','b','s'],
     ['i','e','n','y','e','j','i','i','y','r'],
     ['o','r','o','h','w','d','r','z','u','i'],
     ['c','o','r','v','m','z','t','a','i','l'],
     ['i','p','w','b','j','q','s','r','d','a'],
     ['x','a','a','d','n','c','u','b','f','n'],
     ['e','g','y','e','h','i','a','h','w','k'],
     ['m','n','g','a','k','g','f','d','s','a'],
     ['g','i','d','n','a','l','g','n','e','y'],
     ['b','s','t','r','f','g','s','a','i','u']]

The ways :

way={}
A=way[1,0]=sum(B,[])
way[0,1]=sum((A[i::10] for i in range(10)),[])
way[1,1]=(A*11)[::11]
way[-1,1]=(A*9)[::9]
ways='|'.join([''.join(l) for  l in way.values()])    
ways+= '|'+ ways[::-1]
#countries=['afghanistan',....,'england;), ...

The brute force :

# In [26]: [w for w in countries if w in ways]
# Out[26]: ['austria', 'brazil', 'chad', 'england', 'malta',\
# 'mexico', 'norway', 'singapore']
# 
# In [27]: %timeit [w for w in countries if w in ways]
# 238 µs ± 7.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 

The trie:

from collections import defaultdict
trie = lambda : defaultdict(trie) 

def add(country,root):
    t=root
    for l in country: t=t[l]
    t[''] = country

countrie = trie()
for c in countries : add(c,countrie)  

def find(trie,way):
    for i in range(len(way)):
        t=trie
        for l in way[i:]:
            if l in t:
                t=t[l]
                if '' in t : yield t['']
            else: break


# In [28]: list(find(countrie,ways))
# Out[28]: 
# ['malta', 'norway', 'chad', 'brazil', 'austria',\
#  'singapor, 'mexico', 'england']
# 
# In [29]: %timeit list(find(countrie,ways))
# 457 µs ± 9.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)            

edit

with a 370k english words dictionary from here, brute force takes 412 ms to find 496 words. Trie technic is 500x faster , it takes just 900 µs, even if the trie creation cost is 600ms; but you have to build it just one time.

B. M.
  • 18,243
  • 2
  • 35
  • 54
  • The first few lines of composing "ways" are quite obscure to me. Could you please explain why you consider that is the better way? Thanks! – roadtest ma Nov 14 '17 at 02:26