1

I am given a input string s ("bedbathandbeyond") and a set of words {"bed", "bath", "beyond", "bat", "hand", "and"}. I need to divide the input string s into a series of words in the dictionary. In this case, the two allowed outputs would be ["bed", "bath", "and", "beyond"] and ["bed", "bat", "hand", "beyond"]. Both outputs are allowed and none is better than the other. My solution is as follows:

def find_breakdown_rec(s, dictionary, memo = {}):
    if len(s) == 0:
        return True, []

    if (s in memo):
        return memo[s]
    for i in range(len(s) + 1):
        word = s[:i]
        if word in dictionary:
            status, words = find_breakdown_rec(s[i:], dictionary, memo)
            if status:
                    memo[s] =  [word] + words
                    return True, memo[s]

    return False, []

If no memoization is used, the runtime is clearly exponential: we have O(n) branching factor, and O(n) as the depth: O(n ** n). With memoization, however, the algorithm seems to me polynomial: the 'find_breakdown_rec' can be called with n different inputs, and it then performs a polynomial amount of work on it. I've been told by a very knowledgeable person that it's still exponential. Could someone explain why? I've been thinking very hard about it and I'm not sure why that's the case.

AndresArpi
  • 21
  • 1
  • 5
  • Not related to the question, but see https://stackoverflow.com/questions/1132941/least-astonishment-and-the-mutable-default-argument regarding the `memo = {}` default argument. – Barmar May 13 '21 at 01:24
  • (Also [on CS@SE](https://cs.stackexchange.com/q/140264)) – greybeard May 14 '21 at 17:50

1 Answers1

1

This can be proved by counting claims. Lets check how many options exist for the memoization implementation:

If we take k as the number of words, and w_i is the length of the i-th word (from the left, lets say) then we know that:

w_1+w_2+..+w_k = n as n is the length of the input.And as we took only valid words with positive length, we also know that w_i>=1

So, an equivalent representation of this is:

w_1+w_2+..+w_k = n-k where w_i >= 0

And by counting claims this is equal to S(k,n-k) = (n-1)Choose(k-1)

Now, as we can choose 1<=k<=n the total number of string partition is determined by the above claims and is equal to

sum_{k=1 to n} of (n-1)Choose(k-1) = sum{k=0 to n-1} of (n-1)Choose(k)

which is indeed exponential, 2^(n-1) = O(2^n)

e.ad
  • 380
  • 2
  • 10