Shortest string that can be made in at least two ways from an array of strings

Question

Statement:

Given an array of strings S, you can make a string by concatenating elements of S (an element may be used more than once).

In some situations, there are many ways to make a certain string from the array S.

Example:

S = {a, ab, ba}

Then there are 2 ways to make the string "aba":

"a" + "ba"
"ab" + "a"

Question:

Given an array of strings S, find the shortest string that can be made from S in more than one way. If there is none, print -1.
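
To make "more than one way" concrete, here is a minimal sketch that counts the ways a given target can be made from S, using a prefix DP. The function name `count_ways` is made up for illustration, and the sketch assumes duplicate entries in S count as one element.

```python
def count_ways(target, strings):
    """Count the ways to write `target` as a concatenation of elements of `strings`."""
    words = {w for w in strings if w}  # assumption: ignore duplicates and empty strings
    ways = [0] * (len(target) + 1)
    ways[0] = 1  # the empty prefix can be made in exactly one way
    for end in range(1, len(target) + 1):
        for w in words:
            # w can be the last piece of target[:end] if it matches the tail
            if end >= len(w) and target[end - len(w):end] == w:
                ways[end] += ways[end - len(w)]
    return ways[len(target)]


print(count_ways("aba", ["a", "ab", "ba"]))  # 2
```

This only checks one candidate string; it does not find the shortest one.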

P/S: I have been thinking for many days but this is the best one I've got so far:

Generate all permutations of the array
For each permutation, make a string from the array S
Check whether that string has been made before: if yes, print it; if not, save it.

But this algorithm clearly won't pass all the test cases. I can't think of any better algorithm.
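
Note that permutations use each element at most once, while the statement allows reuse. A brute force that respects reuse would enumerate sequences with repetition instead. The sketch below does that up to a cap; `brute_force_ambiguous` and `max_pieces` are hypothetical names, and the running time is exponential, so this is only useful for cross-checking a faster solution on tiny inputs.

```python
from itertools import product


def brute_force_ambiguous(strings, max_pieces=4):
    """Shortest string built in two ways from at most `max_pieces` pieces, or None."""
    words = sorted({w for w in strings if w})  # assumption: drop duplicates and empty strings
    seen = set()
    best = None
    for count in range(1, max_pieces + 1):
        # product(..., repeat=count) yields every sequence of `count` pieces, with reuse
        for combo in product(words, repeat=count):
            text = "".join(combo)
            if text in seen:
                # a different sequence already produced the same text
                if best is None or len(text) < len(best):
                    best = text
            else:
                seen.add(text)
    return best


print(brute_force_ambiguous(["a", "ab", "ba"]))  # aba
```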

Rough idea: Make a trie-automaton thing, then do dynamic programming on it to find two different paths that end in the root node. (Where does the problem come from? It looks familiar.) – user202729 Oct 24 '21 at 02:59
@user202729 What is a trie-automaton thing? I can't find it on Google. Can you provide a link or something? The problem is from a local contest at my school; I don't really know the source. – unglinh279 Oct 24 '21 at 03:03

Matt Timmermans · Answer 1 · 2021-10-24T14:39:39.447

Imagine that you have found your string, and you are matching it two different ways with strings from S. You start with two different strings that match the prefix, and then you repeatedly add a string from S to the shorter one until you end up with a matching length. From your example, that's

"ab"
"a"

"ab"
"aba"

"aba"
"aba"

At every step, you have 2 different strings from S that overlap at the end.

Imagine a directed graph where every vertex is a tuple (i,j,t), where i and j are the indexes of the overlapping strings at the end, and t is the number of characters left over at the end of the longer one after the overlapping section. Make it a rule that t >= 0 and that string i is always the one that ends first.

The edges of the graph indicate which vertexes you can get to by adding a new string to the shorter one, with a cost equal to the length of the added string. Of course you can only add a string if it overlaps with the t characters left over on the longer side.

Your task is then to use Dijkstra's algorithm to find the shortest path in this graph, from an initial selection of 2 distinct strings to a pair with t=0. Initially sorting the array of strings will let you use a binary search to find the strings that overlap the required suffix (the longer ones will all be together), which is an effective optimization.
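
The search described above can be sketched roughly as follows. This is a minimal sketch under some assumptions, not a tuned implementation: it scans all strings linearly instead of using the sorted binary search, it drops duplicate and empty strings, and the name `shortest_ambiguous` is made up. The path cost is the total length of all strings used on both sides, so at a vertex with t = 0 it is twice the length of the answer.

```python
import heapq


def shortest_ambiguous(strings):
    """Dijkstra over vertices (i, j, t); returns a shortest ambiguous string or None."""
    words = sorted({w for w in strings if w})  # assumption: drop duplicates and empty strings
    heap = []
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j and b.startswith(a):
                # entry: (total length used, text of the longer side, i, j, t)
                heapq.heappush(heap, (len(a) + len(b), b, i, j, len(b) - len(a)))
    settled = set()
    while heap:
        cost, text, i, j, t = heapq.heappop(heap)
        if t == 0:
            return text  # both sides end together: two different decompositions
        if (i, j, t) in settled:
            continue
        settled.add((i, j, t))
        leftover = words[j][len(words[j]) - t:]  # unmatched tail of the longer side
        for k, w in enumerate(words):
            if len(w) <= t and leftover.startswith(w):
                # w fits inside the leftover; string j still ends last
                heapq.heappush(heap, (cost + len(w), text, k, j, t - len(w)))
            elif len(w) > t and w.startswith(leftover):
                # w overshoots the leftover; w becomes the string that ends last
                heapq.heappush(heap, (cost + len(w), text + w[t:], j, k, len(w) - t))
    return None


print(shortest_ambiguous(["a", "ab", "ba"]))  # aba
```

A caller would print -1 when the function returns None.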

edited Oct 24 '21 at 14:39

answered Oct 24 '21 at 14:31

Matt Timmermans


Could there be an issue with your suggestion to use Dijkstra in that we have unlimited strings to choose from and continually choosing the smallest string to add (which is what Dijkstra would do to prioritise the cost you describe) might lead to a suboptimal solution (or possibly timing out)? – גלעד ברקן Oct 25 '21 at 11:13
You don't choose the smallest string to add. You choose all of them that match. Each one leads to a new vertex that goes into the priority queue with its updated total path cost. The total path cost is equal to the sum of lengths added, so shortest path = optimal solution, and Dijkstra's is guaranteed to find it. If there is no solution then you'll eventually stop discovering new vertexes and the queue will empty. – Matt Timmermans Oct 25 '21 at 11:47
Oh, my bad, priority is on total cost after the addition. – גלעד ברקן Oct 25 '21 at 11:58
Regarding no solution, couldn't there be a degenerate input case of a cycle without end, where matches can continue but there's always some left over? – גלעד ברקן Oct 25 '21 at 12:06
There are at most O(|S|^2 * max_len(strings in S)) reachable vertices, and following Dijkstra, they're only enqueued when you discover a new one or a shorter path to one. – Matt Timmermans Oct 25 '21 at 12:09
Thank you for persisting in explaining this to me :) – גלעד ברקן Oct 25 '21 at 12:33

johnchen902 · Accepted Answer · 2021-10-31T15:28:54.367

Here's an O(N³) algorithm, where N is the total length of all strings in S:

1. For every element S_i in S:
   1. Construct an NFA for the regular expression S_i(S₁|...|S_n)*
   2. Construct an NFA for the regular expression (S₁|...|S_{i-1}|S_{i+1}|...|S_n)(S₁|...|S_n)*
   3. Construct an NFA that is the intersection of the NFAs in step 1.1 and step 1.2
   4. Find the shortest string accepted by the NFA in step 1.3
2. Return the shortest string among the strings in step 1.4
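
A rough sketch of these steps, under assumptions: the NFAs are never built explicitly. A state is either a position (k, pos) inside word k or a single "boundary" state that stands for having consumed whole words, and the intersection is explored lazily with a BFS over state pairs. The function name is made up, and no attempt is made to hit the stated complexity.

```python
from collections import deque


def shortest_ambiguous_nfa(strings):
    """For each S_i, BFS the product of S_i(S)* and (S minus S_i)(S)*; None if no string."""
    words = sorted({w for w in strings if w})  # assumption: drop duplicates and empty strings
    alphabet = sorted(set("".join(words)))
    BOUNDARY = None  # accepting state: a whole number of words has been consumed

    def step(state, ch):
        # from the boundary, any word may start; otherwise continue the current word
        starts = [(k, 0) for k in range(len(words))] if state is BOUNDARY else [state]
        out = set()
        for k, pos in starts:
            if words[k][pos] == ch:
                out.add(BOUNDARY if pos + 1 == len(words[k]) else (k, pos + 1))
        return out

    best = None
    for i in range(len(words)):
        # left copy must start with S_i, right copy with any other element
        queue = deque(((i, 0), (k, 0), "") for k in range(len(words)) if k != i)
        seen = {(x, y) for x, y, _ in queue}
        while queue:
            x, y, text = queue.popleft()
            if x is BOUNDARY and y is BOUNDARY:
                # BFS order means this is the shortest accepted string for this i
                if best is None or len(text) < len(best):
                    best = text
                break
            for ch in alphabet:
                for nx in step(x, ch):
                    for ny in step(y, ch):
                        if (nx, ny) not in seen:
                            seen.add((nx, ny))
                            queue.append((nx, ny, text + ch))
    return best


print(shortest_ambiguous_nfa(["a", "ab", "ba"]))  # aba
```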

The above algorithm can be improved to O(N²logN):

1. Construct an NFA for the regular expression (S₁|...|S_n)*
2. Construct the cross-product of two copies of the NFA in step 1.
3. For each state in the NFA in step 2, find the shortest string accepted by the NFA from that state.
4. Let W = {S}.
5. While W is not empty:
   1. Take an element T from W.
   2. Partition T into P and Q somewhat evenly.
   3. Construct an NFA for the regular expression (P₁|...|P_p)(S₁|...|S_n)*, reusing the NFA in step 1.
   4. Construct an NFA for the regular expression (Q₁|...|Q_q)(S₁|...|S_n)*, reusing the NFA in step 1.
   5. Construct the cross-product of the NFAs in step 5.3 and step 5.4, reusing the NFA in step 2.
   6. Find the shortest string accepted by the NFA in step 5.5, reusing the result of step 3.
   7. If P has more than 1 element, put P into W.
   8. If Q has more than 1 element, put Q into W.
6. Return the shortest string among the strings in step 5.6
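
The reason the repeated halving is enough: any two distinct first elements eventually land on opposite sides of some (P, Q) split, so every pair of different starting strings is covered by one of the cross-products. A small sketch of just the splitting loop (no NFAs), with the made-up name `balanced_splits`:

```python
def balanced_splits(items):
    """Return the (P, Q) partitions produced by repeatedly halving a worklist."""
    items = list(items)
    if len(items) < 2:
        return []  # nothing to separate
    work = [items]
    splits = []
    while work:
        group = work.pop()
        mid = len(group) // 2
        p, q = group[:mid], group[mid:]  # "somewhat evenly"
        splits.append((p, q))
        if len(p) > 1:
            work.append(p)
        if len(q) > 1:
            work.append(q)
    return splits


for p, q in balanced_splits(["a", "ab", "ba"]):
    print(p, q)
```

For n elements this produces n - 1 splits, and each element takes part in O(log n) of them.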

Edit: The following is an O(N²) algorithm, improved from the top answer:

1. Let T be the set of all suffixes (including the empty one) of all strings in S.
2. Build a trie out of T.
3. Let G be a weighted directed graph. The vertices are the elements of T, and the edges are: for each string S_i in S and T_j in T, if S_i = T_j + D or T_j = S_i + D (using the trie in step 2 to find all such pairs), add an edge from D to T_j with weight equal to the length of S_i.
4. Find the distance from the empty string to every vertex in G.
5. For each pair of strings S_i, S_j in S, if S_i != S_j and S_i = S_j + D (using the trie in step 2 to find all such pairs), find the distance from the empty string to D (using step 4) and add the lengths of S_i and S_j to it, which accounts for the initial pair.
6. The length of the answer is half of the smallest total in step 5. (It's trivial to find the actual answer but I'm too lazy to describe it :p)
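
A minimal sketch of this suffix-graph idea, under assumptions: it uses plain prefix checks instead of a trie (so it does not reach O(N²)), drops duplicate and empty strings, and the function name is made up. In this sketch the lengths of the initial pair S_i and S_j are added to the distance before halving; for S = {a, ab, ba} the distance to "b" alone is 3, and 3 + 2 + 1 = 6 gives length 3.

```python
import heapq


def shortest_ambiguous_length(strings):
    """Length of the shortest ambiguous string via Dijkstra on the suffix graph, or None."""
    words = sorted({w for w in strings if w})  # assumption: drop duplicates and empty strings
    suffixes = {w[k:] for w in words for k in range(len(w) + 1)}  # includes ""
    edges = {s: [] for s in suffixes}  # edges[d] holds (t, weight) for each edge d -> t
    for w in words:
        for t in suffixes:
            if w.startswith(t):      # w = t + d
                edges[w[len(t):]].append((t, len(w)))
            elif t.startswith(w):    # t = w + d
                edges[t[len(w):]].append((t, len(w)))
    dist = {"": 0}
    heap = [(0, "")]
    while heap:
        d0, s = heapq.heappop(heap)
        if d0 > dist[s]:
            continue  # stale queue entry
        for t, weight in edges[s]:
            if d0 + weight < dist.get(t, float("inf")):
                dist[t] = d0 + weight
                heapq.heappush(heap, (dist[t], t))
    best = None
    for a in words:
        for b in words:
            if a != b and a.startswith(b) and a[len(b):] in dist:
                total = dist[a[len(b):]] + len(a) + len(b)  # include the initial pair
                if best is None or total < best:
                    best = total
    return None if best is None else best // 2


print(shortest_ambiguous_length(["a", "ab", "ba"]))  # 3
```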
