1

I have the following function that is part of a crossword solver:

def CrosswordPossibleWords(p_words, p_cw_words):
    """For each word found in the crossword, find the possible words and keep track of the one with the minimum possible words.

    Keyword arguments:
    p_words    -- The dictionary words.
    p_cw_words -- The crossword word attributes.
    """
    l_min = 999999999
    l_min_index = -1
    l_index = 0
    l_choices = []
    for l_cw_word in p_cw_words:
        if l_cw_word[2] >= l_min_length and '-' in l_cw_word[4]:
            pattern = re.compile('^' + l_cw_word[4].replace('.', '%').replace('-', '.').upper() + '$', re.UNICODE)
            l_choice = []
            for l_word in [w for w in p_words if len(w) == len(l_cw_word[4])]:
                if re.match(pattern, l_word):
                    l_choice.append(l_word)
            l_choices.append(l_choice)
            if len(l_choice) < l_min:
                l_min_index = l_index
                l_min = len(l_choice)
        else:
            l_choices.append([])
        l_index = l_index + 1
    return (l_choices, l_min_index)

The crossword words are of the form:

[row, col, length, direction, word]

I have a '.' in a word if I can't solve that word and a '-' if I don't know that letter.

How can I make this code faster? It currently takes about 2.5 seconds to run. Was thinking of using numpy strings; since apparently numpy is 10 times faster, but I don't know anything about numpy and don't know whether I would be able to use all the current string functions with it.

Any ideas?

Superdooperhero
  • 7,584
  • 19
  • 83
  • 138
  • 1
    numpy is 10 times faster at doing what, exactly? Find out where/how your program is spending its time, and optimize that. – Scott Hunter Feb 26 '13 at 21:54

2 Answers2

1

You could partition your dictionary by word-length BEFORE calling this function, so it doesn't have to re-do it with every call.

Scott Hunter
  • 48,888
  • 12
  • 60
  • 101
1

Whilst I agree with Scott Hunter, you are probably looking for something like this where the lists are substituted with dicts:

def CrosswordPossibleWords(p_words, p_cw_words):
    """For each word found in the crossword, find the possible words and keep track of the one with the minimum possible words.

    Keyword arguments:
    p_words    -- The dictionary words.
    p_cw_words -- The crossword word attributes.
    """
    l_min = 999999999
    l_min_index = -1
    l_index = 0
    l_choices = {}    # using dict instead of list
    for l_cw_word in p_cw_words:
        if l_cw_word[2] >= l_min_length and '-' in l_cw_word[4]:
            pattern = re.compile('^' + l_cw_word[4].replace('.', '%').replace('-', '.').upper() + '$', re.UNICODE)
                l_choice = {}  # using dict instead of list

            for l_word in [w for w in p_words if len(w) == len(l_cw_word[4])]:
                if re.match(pattern, l_word):

                    l_choice[l_word]=None

            l_choices[l_choice]=None

            if len(l_choice) < l_min:  ##
                l_min_index = l_index  ## Get rid of this.
                l_min = len(l_choice)  ##
        else:
            l_choices.append([])    # why append empty list?
        l_index = l_index + 1
        l_choices=list(l_choices.keys())   # ...you probably need the list again...
    return (l_choices, l_min_index)
root-11
  • 1,727
  • 1
  • 19
  • 33
  • Are you saying that dicts are faster than lists? – Superdooperhero Feb 26 '13 at 22:05
  • quite a bit :-) in the post: http://stackoverflow.com/questions/513882/python-list-vs-dict-for-look-up-table the computational complexity is explained, where searching through lists grow linearly with the lists' length. Dicts, in contrast, have constant lookup time. – root-11 Feb 26 '13 at 22:08
  • PS. You will get some error message for the changes I made in your text, but it should get you going. – root-11 Feb 26 '13 at 22:13
  • You can use a dict as a key to another dict? – Superdooperhero Feb 26 '13 at 22:15
  • Would sets not have the same speed improvement and be easier to use since it does not require the association? – Superdooperhero Feb 26 '13 at 22:27
  • For as much as I know sets and dicts are both implemented using hashfunctions, so you will probably get very similar speed between sets and dict, when using the "in" function. However as you add values over and over again it might add some overhead that when your hashfunction recasts the set, whilst the dict recycles what it has. It is beyond my knowledge. sorry. – root-11 Feb 26 '13 at 22:39
  • http://stackoverflow.com/questions/10529863/should-i-use-dict-or-list?lq=1 – root-11 Feb 26 '13 at 23:11
  • Sadly it turns out that in this case the dict is actually slower than the list. 2.7 seconds as opposed to 2.3 seconds. – Superdooperhero Feb 27 '13 at 20:54
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/25250/discussion-between-bhm-and-superdooperhero) – root-11 Feb 27 '13 at 20:55