What's the runtime complexity of this algorithm for dividing string into words?

Question

I am given a input string s ("bedbathandbeyond") and a set of words {"bed", "bath", "beyond", "bat", "hand", "and"}. I need to divide the input string s into a series of words in the dictionary. In this case, the two allowed outputs would be ["bed", "bath", "and", "beyond"] and ["bed", "bat", "hand", "beyond"]. Both outputs are allowed and none is better than the other. My solution is as follows:

def find_breakdown_rec(s, dictionary, memo = {}):
    if len(s) == 0:
        return True, []

    if (s in memo):
        return memo[s]
    for i in range(len(s) + 1):
        word = s[:i]
        if word in dictionary:
            status, words = find_breakdown_rec(s[i:], dictionary, memo)
            if status:
                    memo[s] =  [word] + words
                    return True, memo[s]

    return False, []

If no memoization is used, the runtime is clearly exponential: we have O(n) branching factor, and O(n) as the depth: O(n ** n). With memoization, however, the algorithm seems to me polynomial: the 'find_breakdown_rec' can be called with n different inputs, and it then performs a polynomial amount of work on it. I've been told by a very knowledgeable person that it's still exponential. Could someone explain why? I've been thinking very hard about it and I'm not sure why that's the case.

Not related to the question, but see https://stackoverflow.com/questions/1132941/least-astonishment-and-the-mutable-default-argument regarding the `memo = {}` default argument. — Barmar, May 13 '21 at 01:24

score 1 · Answer 1 · answered May 14 '21 at 06:41

This can be proved by counting claims. Lets check how many options exist for the memoization implementation:

If we take k as the number of words, and w_i is the length of the i-th word (from the left, lets say) then we know that:

w_1+w_2+..+w_k = n as n is the length of the input.And as we took only valid words with positive length, we also know that w_i>=1

So, an equivalent representation of this is:

w_1+w_2+..+w_k = n-k where w_i >= 0

And by counting claims this is equal to S(k,n-k) = (n-1)Choose(k-1)

Now, as we can choose 1<=k<=n the total number of string partition is determined by the above claims and is equal to

sum_{k=1 to n} of (n-1)Choose(k-1) = sum{k=0 to n-1} of (n-1)Choose(k)

which is indeed exponential, 2^(n-1) = O(2^n)

What's the runtime complexity of this algorithm for dividing string into words?

1 Answers1