Richard's answer will work well in many cases, but it can take exponential time: this will happen if there are many segments of the string W, each of which can be decomposed in multiple different ways. For example, suppose W is abcabcabcd, and the other words are ab, c, a and bc. Then the first 3 letters of W can be decomposed either as ab|c or as a|bc... and so can the next 3 letters, and the next 3, for 2^3 = 8 possible decompositions of the first 9 letters overall:
a|bc|a|bc|a|bc
a|bc|a|bc|ab|c
a|bc|ab|c|a|bc
a|bc|ab|c|ab|c
ab|c|a|bc|a|bc
ab|c|a|bc|ab|c
ab|c|ab|c|a|bc
ab|c|ab|c|ab|c
All of these partial decompositions necessarily fail in the end, since there is no word in the input that contains W's final letter d, but his algorithm will explore them all before discovering this. In general, a word consisting of n copies of abc followed by a single d will take O(n*2^n) time.
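To make the blow-up concrete, here is a minimal Python sketch of plain backtracking. It is a stand-in for the kind of search described above, not Richard's actual code; the function name naive_decomposable and the call counter are illustrative additions, and it uses Python's 0-based positions. It counts how many times the recursive helper is entered for W = n copies of abc followed by d.

```python
def naive_decomposable(w, words):
    """Plain backtracking with no memoisation.

    Returns (is_decomposable, number_of_calls) so the exponential growth
    can be observed. Positions are 0-based, unlike the 1-based text.
    """
    calls = 0

    def go(i):
        nonlocal calls
        calls += 1
        if i == len(w):          # consumed all of w
            return True
        for x in words:
            # skip empty words, which would recurse forever
            if x and w.startswith(x, i) and go(i + len(x)):
                return True
        return False

    return go(0), calls


words = ["ab", "c", "a", "bc"]
for n in (3, 10):
    print(n, naive_decomposable("abc" * n + "d", words))
```

For this word list every block of abc is tried both ways, so the helper is entered 2^(n+2) - 3 times: 29 times for the example W = abcabcabcd, and 4093 times for 10 copies of abc.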
We can improve this to O(n^2) worst-case time, where n is the length of W (at the cost of O(n) extra space), by recording information about the decomposability of suffixes of W as we go along: that is, which suffixes of W we have already discovered can or cannot be matched to word sequences. This type of algorithm is called dynamic programming.
The condition we need for some word W to be decomposable is exactly that W begins with some word X from the set of other words, and the suffix of W beginning at position |X|+1 is decomposable. (I'm using 1-based indices here, and I'll denote a substring of a string S beginning at position i and ending at position j by S[i..j].)
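Written as a recurrence, with D(i) standing for "the suffix W[i..|W|] is decomposable" and S for the set of other words (both names introduced here for illustration; words in S are assumed to be non-empty):

```latex
D(i) =
\begin{cases}
  \text{true} & \text{if } i = |W| + 1, \\
  \displaystyle\bigvee_{\substack{X \in S \\ W[i..i+|X|-1] = X}} D(i + |X|) & \text{if } 1 \le i \le |W|.
\end{cases}
```

An empty disjunction counts as false, and W itself is decomposable exactly when D(1) is true, which for i = 1 is the condition above: W begins with some X and the suffix starting at |X|+1 is decomposable.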
Whenever we discover that the suffix of the current word W beginning at some position i is or is not decomposable, we can record this fact and make use of it later to save time. For example, after testing the first 4 of the 8 decompositions listed earlier, we know that the suffix of W beginning at position 4 (i.e., abcabcd) is not decomposable. Then when we try the 5th decomposition, i.e., the first one starting with ab, we first ask the question: Is the rest of W, i.e. the suffix of W beginning at position 3, decomposable? We don't know yet, so we try adding c to get ab|c, and then we ask: Is the rest of W, i.e. the suffix of W beginning at position 4, decomposable? And we find that we have already established that it is not, so we can immediately conclude that no decomposition of W beginning with ab|c is possible either, instead of having to grind through all 4 possibilities.
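The saving described above needs very little code. This sketch (Python, 0-based positions, function name hypothetical) memoises backtracking with functools.lru_cache instead of an explicit KnownDecomposable array, and still tries each word with str.startswith rather than walking a trie, so each run of the main body costs O(number of words × word length) rather than an O(n) trie walk. It counts how many times the main body actually runs.

```python
from functools import lru_cache


def memo_decomposable(w, words):
    """Backtracking with memoisation on the suffix start position.

    Returns (is_decomposable, number_of_main_body_runs).
    Positions are 0-based.
    """
    body_runs = 0

    @lru_cache(maxsize=None)    # remembers go(i) once it has returned
    def go(i):
        nonlocal body_runs
        if i == len(w):         # empty suffix: finished
            return True
        body_runs += 1          # only reached on a cache miss
        for x in words:
            # skip empty words, which would recurse forever
            if x and w.startswith(x, i) and go(i + len(x)):
                return True
        return False

    return go(0), body_runs


print(memo_decomposable("abc" * 10 + "d", ["ab", "c", "a", "bc"]))
```

For W = 10 copies of abc followed by d this prints (False, 31): the main body runs once per position of W instead of exponentially many times.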
Assuming for the moment that the current word W is fixed, what we want to build is a function f(i) that determines whether the suffix of W beginning at position i is decomposable. Pseudo-code for this could look like:
- Build a trie the same way as Richard's solution does.
- Initialise the array KnownDecomposable[1..|W|] to DUNNO values.

f(i):
  - If i == |W|+1 then return TRUE. (The empty suffix means we're finished.)
  - If KnownDecomposable[i] is TRUE or FALSE, then immediately return it.
  - MAIN BODY BEGINS HERE
  - Walk through Richard's trie from the root, following characters in the
    suffix W[i..|W|]. Whenever we find a trie node at some depth j that
    marks the end of a word in the set:
      - Call f(i+j) to determine whether the rest of W can be decomposed.
      - If it can (i.e. if f(i+j) is TRUE):
          - Set KnownDecomposable[i] = TRUE.
          - Return TRUE.
  - If we make it to this point, then we have considered all words in the
    set that form a prefix of W[i..|W|], and found that none of them
    leaves a suffix that can be decomposed.
  - Set KnownDecomposable[i] = FALSE.
  - Return FALSE.
Calling f(1) then tells us whether W is decomposable.
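Here is one way to turn the pseudo-code into runnable Python. It is a sketch under assumptions: the TrieNode class and build_trie function are hypothetical stand-ins for Richard's trie, whose exact layout isn't given here, and known plays the role of KnownDecomposable, with None standing for DUNNO. Positions are kept 1-based as in the text, so W[k] is w[k - 1] in Python.

```python
from typing import Dict, List, Optional


class TrieNode:
    """Hypothetical trie node; Richard's trie may be laid out differently."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if some word in the set ends here


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def is_decomposable(w: str, root: TrieNode) -> bool:
    n = len(w)
    # known[i] for i = 1..n is None (DUNNO), True or False; known[0] is unused.
    known: List[Optional[bool]] = [None] * (n + 1)

    def f(i: int) -> bool:
        if i == n + 1:              # empty suffix: we're finished
            return True
        if known[i] is not None:    # already decided on an earlier call
            return known[i]
        # Main body: walk the trie along W[i..n].
        node = root
        j = 0                       # depth of node = characters matched so far
        while i + j <= n:
            child = node.children.get(w[i + j - 1])  # next character W[i+j]
            if child is None:       # no word in the set continues this way
                break
            node, j = child, j + 1
            # node spells W[i..i+j-1]; if that is a word, try the rest of W.
            if node.is_word and f(i + j):
                known[i] = True
                return True
        known[i] = False
        return False

    return f(1)


root = build_trie(["ab", "c", "a", "bc"])
print(is_decomposable("abcabcabcd", root))  # False
print(is_decomposable("abcabcabc", root))   # True
```

Because f recurses once per word boundary, the recursion can be about |W| levels deep, so very long words may run into Python's default recursion limit.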
Apart from the base case i = |W|+1, by the time a call to f(i) returns, KnownDecomposable[i] has been set to a non-DUNNO value (TRUE or FALSE). The main body of the function is only run if KnownDecomposable[i] is DUNNO. Together these facts imply that the main body runs at most once for each i from 1 to |W|, i.e. at most |W| = n times. Outside of recursive calls, each run of the main body takes at most O(n) time, since the walk through Richard's trie follows at most |W|-i+1 characters, so overall the time complexity is bounded by O(n^2). The KnownDecomposable array (and the recursion stack) takes O(n) space.
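The same table can also be filled in bottom-up, from the last position of W back to the first, which makes the O(n^2) bound visible as two nested loops and avoids deep recursion. This is a swapped-in iterative variant of the memoised recursion, not part of the original answer; the nested-dict trie (with the key None marking the end of a word) and the function names are hypothetical.

```python
def build_trie(words):
    """Nested-dict trie; the key None marks the end of a word (hypothetical layout)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def decomposable_bottom_up(w, root):
    n = len(w)
    # ok[i] for i = 1..n+1 says whether the suffix W[i..n] is decomposable
    # (1-based, as in the text; ok[0] is unused).
    ok = [False] * (n + 2)
    ok[n + 1] = True                     # the empty suffix
    for i in range(n, 0, -1):            # n outer iterations ...
        node = root
        for k in range(i, n + 1):        # ... each walking at most n - i + 1 steps
            node = node.get(w[k - 1])    # follow character W[k]
            if node is None:             # no word continues this way
                break
            # node now spells W[i..k]; ok[k+1] was filled in an earlier iteration
            if None in node and ok[k + 1]:
                ok[i] = True
                break
    return ok[1]


root = build_trie(["ab", "c", "a", "bc"])
print(decomposable_bottom_up("abcabcabcd", root))  # False
print(decomposable_bottom_up("abcabcabc", root))   # True
```

The order of the outer loop matters: computing i from |W| down to 1 guarantees that every ok[k+1] consulted has already been decided, which is exactly what the recursive version gets from memoisation.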