Time complexity of python code for finding the longest word that can be made of other words in the list

Question

I am preparing for some coding interviews and came up with a solution to the following problem:

"Find longest word that can be made of other words in a list of words".

I have a hard time figuring out the time complexity of my algorithm. It would be great, if you can help me to figure out the time complexity of the following code.

It is more than n² (n⋅log n for sorting, number_of_words ⋅ word_length + recursion), but not sure how to calculate the exact complexity due to recursion part.

def valid_pair(a,b, words):
    return  a in words and b in words

def valid_word(word, words):
    for i in range(len(word)):
        left, right = word[:i], word[i:]
        valid =  valid_pair(left, right, words)
        if valid:
            return valid
        elif left in words:
            return valid_word(right, words)

    return False


words = ["cde","abcde","bcd","c","d","e","a"]

words = sorted(words, key = len, reverse = True)
for w in words:
    if valid_word(w, words):
        print w
        break

@m170897017 Modified the example, it gives "abcde". The word "abcde" is the longest word that can be made out of other words in the list. "abcde" is made out of "a", "bcd" and "e". — CodeApprentice, May 10 '15 at 15:02
@StefanPochmann made of means to concatenate other strings in the list. For example, if the input list is: "crowdsource", "crowd", "source" , the program should output "crowdsource" as it is the longest word that can be made using other strings in the list: "crowd" and "source" (i.e. by concatenating those strings). Kindly see my previous comment too. — CodeApprentice, May 10 '15 at 15:06

score 1 · Accepted Answer · answered May 10 '15 at 17:33

Let n be the number of words in the list and m the length of the longest word.

The for-loop iterates over the words until valid_word returns true. The worst case would be, that non of the words can be concatenated from other words in the list. So this gives you a factor n.

valid_word iterates over all characters in a word and calls valid_pair which has the complexity O(f), where f = f(n,m) is the complexity of the in operator. (I don't know, how it is implemented). If for every character left is in words, but right is not, valid_word is called recursively m times, resulting in this formula:

T(m) = f + Σ_i=1,...m-1 T(m-i) < f + (m-1) ⋅ T(m-1) < m!⋅f

So valid_word is in O(m!⋅f(n,m)) (this can be improved).

The over all complexity is O(n⋅m!⋅f(n,m) + n⋅log(n)). This is an upper bound, so maybe you can improve this, by showing that it is not possible to have an input that forces the algorithm to do all the steps.

But think of an input like this (withe spaces are only for better readability)

words = ['ab ac ad ae','ab ac ad af', ... , 'ab ac ad az',
         'ab ac ad', 'ab ac', 'ab']

Non of these words can be concatenated from the others, but the algorithm has to try many combinations. This examples can be improved and extended.

Great answer! Thank you very much: According to (http://stackoverflow.com/questions/13884177/complexity-of-in-operator-in-python) in operator is o(n) for list. So the algorithm is roughly O(n^2.m! + n.log(n)). Probably this can be done efficiently using a Trie. Need to investigate further. — CodeApprentice, May 10 '15 at 18:11
@Codesourcer If the length of a word in `words` is bounded by a constant (e.g. `m = 10`), then your algorithm runs in `O(n²)`. — AbcAeffchen, May 10 '15 at 19:49

Time complexity of python code for finding the longest word that can be made of other words in the list

1 Answers1

Linked