0

I know that a similar question has already been asked and solved (Time complexity of python code for finding the longest word that can be made of other words in the list), but I have a follow-up question.

I have the following problem,

# Given a list of words, find the longest word made of other words in the list.
# Example: [cat, banana, dog, nana, walk, walker, dogwalker]
# Result: dogwalker

I have the following implementation in Python 3:

def find_longest_word(list_words):
    list_words = sorted(list_words, key=lambda x:len(x))[::-1]
    for elem in list_words:
        set_remaining = set(list_words) - set([elem])
        if find_if_composite(elem, set_remaining, {}):
            return elem
    return None

def find_if_composite(s, set_):
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite(s[i:], set_): # assuming that I can reuse the string
                return True
    return False

We know from the previous answer that this code runs in O(N!) where N is the size of the string (if I am correct).

My question is: is there any way to improve the time complexity (e.g., using memoization)? If not, why? If yes, how? I tried the following code but it seems like it never hits the memo during recursive calls.

def find_if_composite_memo(s, set_, memo):
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in memo:
        return memo[s]
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite_memo(s[i:], set_, memo):
                memo[s] = True
                memo[s[i:]] = True
                return True
    memo[s] = False
    return False

To test, use

b = ["cat", "banana", "dog", "nana", "walk", "walker", "dogwalker", "dogwalkerbanana"]
print(find_longest_word(b))
Community
  • 1
  • 1
agave1987
  • 33
  • 3

1 Answers1

0

You use the memo only when you have a full solution. Since you sort the list in length order, your first full solution is the desired answer. Therefore, you will never use your memo.

I added a little tracing to your memoization routine, and enhanced the testing a little:

def find_if_composite_memo(s, set_, memo):
    print "ENTER", s, '\t', memo
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in memo:
        print "Found", s, memo[s]
        return memo[s]
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite_memo(s[i:], set_, memo):
                memo[s] = True
                memo[s[i:]] = True
                print "Add", memo
                return True
    memo[s] = False
    return False

b = ["cat", "banana", "dog", "nana",
     "walk", "walker",
     "dogcat", "dogwalker",
     "dogwalkerbanana",
     "catwalkerbanana-FAIL"]
print(find_longest_word(b))

output:

ENTER catwalkerbanana-FAIL  {}
ENTER walkerbanana-FAIL     {}
ENTER erbanana-FAIL     {}
ENTER banana-FAIL   {'erbanana-FAIL': False}
ENTER -FAIL     {'erbanana-FAIL': False}
ENTER dogwalkerbanana   {}
ENTER walkerbanana  {}
ENTER erbanana  {}
ENTER banana    {'erbanana': False}
Add {'walkerbanana': True, 'erbanana': False, 'banana': True}
Add {'walkerbanana': True, 'erbanana': False, 'dogwalkerbanana': True, 'banana': True}
dogwalkerbanana

You see the problem? With "catwalkerbanana-FAIL", you have the potential to record "catwalk", "catwalker", and "walkerbanana", but don't take advantage of that. The last would save you some search time.

As for improving the overall algorithm, you might consider SE's code review site.

I expect that you can do significantly better if you use some of the following:

(1) When you fail to find the present top-level word **elem** as a composite, you can remove it from the word list entirely.  Don't keep it in set_remaining: since it's at least as long as any other word in the list, it cannot be a component.
(2) In **find_if_composite**, you go through a lot of pain to determine whether any beginning substring of **s** is in the word_list.  I think you can speed this up: use **s.startswith(word)** on each word in the list.  
(3) If the list has enough composite hits to matter, examine your memoization algorithm carefully.  Figure out what is and is not worth saving, and write the pseudo-code to handle only the time-saving parts.  Remember that, in working from largest to smallest, you're going to short-circuit a lot of cases.  In this test set, though, finding "walkerbanana" the first time seen would save a recursion later.

Also look for good places to use other string operations, such as index(substring search).

Community
  • 1
  • 1
Prune
  • 76,765
  • 14
  • 60
  • 81