Searching a minimal string meeting some conditions

Question

Recently, I was asked the following problem during an interview.

Given a string S, I need to find another string S2 such that S2 is a subsequence of S and also S is a subsequence of S2+reverse(S2). Here '+' means concatenation. I need to output the min possible length of S2 for given S.

I was told that this is a dynamic programming problem however I was unable to solve it. Can somebody help me with this problem?

EDIT-

Is there a way to do this in O(N²) or less.

Does a O(n^3) solution is acceptable? – Sayakiss May 06 '16 at 09:40 — Sayakiss, May 06 '16 at 09:40
No, I need a solution better than O(n^3). – LTim May 07 '16 at 05:17 — LTim, May 07 '16 at 05:17

score 1 · Answer 1 · answered May 10 '16 at 10:56

There are 2 important aspects in this problem.

Since we need S as a substring of S2+reverse(S2), S2 should have atleast n/2 length.
After concatenation of S2 and reverse(S2), there is a pattern where the alphabets repeats such as

So the solution is to check from the center of S to end of S for any consecutive elements. If you find one then check the elements on either side as shown.

Now if you are able to reach till the end of the string, then the minimum number of elements (result) is the distance from start to the point where you find consecutive elements. In this example its C i.e 3.

We know that this may not happen always. i.e you may not be able to find consecutive elements at the center. Let us say the consecutive elements are after the center then we can do the same test.

Main string

Substring

Concatenated string

Now arrives the major doubt. Why we consider only the left side starting from center? The answer is simple, the concatenated string is made by S+reverse(S). So we are sure that the last element in the substring comes consecutive in the concatenated string. There is no way that any repetition in the first half of the main string can give a better result because at least we should have the n alphabets in the final concatenated string

Now the matter of complexity: Searching for consecutive alphabets give a maximum of O(n) Now checking elements on either side iteratively can give a worst case complexity of O(n). i.e maximum n/2 comparisons. We may fail many times doing the second check so the we have a multiplicative relation between the complexities i.e O(n*n).

I believe this is a correct solution and didn't find any loophole yet.

What about the code? [And also why did you explain it like he is five?](https://i.gr-assets.com/images/S/photo.goodreads.com/hostedimages/1417027346i/12170103.jpg) — ossobuko, May 10 '16 at 23:21

Ante · Answer 2 · 2017-02-05T09:51:14.223

0

Each character from S can be includes in S2 or not. With that we can construct recursion that tries two cases:

first character of S is used for cover,
first character of S is not used for cover,

and calculate minimum of these two covers. To implement this, it is enough to track how much of S is covered with already chosen S2+reverse(S2).

There are optimizations where we know what result is (found cover, can't have cover), and it is not needed to take first character for cover if it will not cover something.

Simple python implementation:

cache = {}

def S2(S, to_cover):
    if not to_cover:  # Covered
        return ''
    if not S:         # Not covered
        return None
    if len(to_cover) > 2*len(S):  # Can't cover
        return None
    key = (S, to_cover)
    if key not in cache:
        without_char = S2(S[1:], to_cover)     # Calculate with first character skipped
        cache[key] = without_char
        _f = to_cover[0] == S[0]
        _l = to_cover[-1] == S[0]
        if _f or _l:
            # Calculate with first character used
            with_char = S2(S[1:], to_cover[int(_f):len(to_cover)-int(_l)])
            if with_char is not None:
                with_char = S[0] + with_char  # Append char to result
                if without_char is None or len(with_char) <= len(without_char):
                    cache[key] = with_char
    return cache[key]

s = '21211233123123213213131212122111312113221122132121221212321212112121321212121132'
c = S2(s, s)
print len(s), s
print len(c), c

edited Feb 05 '17 at 09:51

answered May 01 '16 at 18:33

Ante

5,350
6
23
46

Ante - I'm unable to understand your solution. Are you passing strings as parameters for DP? What are your DP states? Sorry, I've never used python so this code is confusing me. – LTim May 02 '16 at 15:40
@LTim Both method parameters are string. First parameter is 'tail' of initial string S from which rest of S2 is chosen. Second parameter is what left of S to be covered with S2+reverse(S2). Method returns string S2 if found. If S2 can't be found than method returns None. – Ante May 02 '16 at 19:27
Your solution seems inefficient, Is there a way to do this in O(N^2) or less. I don't need the string but only the min length. – LTim May 03 '16 at 07:38
@LTim To get cover length change line 'with_char = S[0] + with_char' to 'with_char = 1 + with_char'. Maximal number of elements in cache is of order N^3, since parameter S can have N different values and to_cover N*(N-1)/2 values. I think that practical number is much smaller because of sequence of calls and structure of to_cover, around N^2. Cache can be indexed to have constant access. With that I think that running time is O(N^2) where N is S length. Upper example runs ~23ms on my laptop. If string has more different chars than there are less repetitions and it runs faster. – Ante May 04 '16 at 20:06
Hi @Ante: your solution appears to produce the wrong result in the case of `abacabca`. It returns `abcabca` but the minimal solution is `abacab`. – nneonneo Feb 05 '17 at 06:00
@nneonneo Ups, yes, check `len(with_char) >= len(without_char)` should be `len(with_char) <= len(without_char)` :-/ I'm changing it now. – Ante Feb 05 '17 at 09:50

score 0 · Answer 3 · answered May 10 '16 at 04:30

Let's say that S2 is "apple". Then we can make this assumption:

S2 + reverseS2 >= S >= S2
"appleelppa" >= S >= "apple"

So the given S will something including "apple" to not more than "appleelppe". It could be "appleel" or "appleelpp".

String S ="locomotiffitomoc";
// as you see S2 string is "locomotif" but 
// we don't know S2 yet, so it's blank
String S2 = "";
for (int a=0; a<S.length(); a++) {
    try {
        int b = 0;
        while (S.charAt(a - b) == S.charAt(a + b + 1))
            b++;
        // if this for loop breaks that means that there is a character that doesn't match the rule
        // if for loop doesn't break but throws an exception we found it.
    } catch (Exception e) {
        // if StringOutOfBoundsException is thrown this means end of the string.
        // you can check this manually of course.
        S2 = S.substring(0,a+1);
        break;
    }
}
System.out.println(S2); // will print out "locomotif"

Congratulations, you found the minimum S2.

1

substring ≠ subsequence – Sayakiss May 27 '16 at 02:22

Searching a minimal string meeting some conditions

3 Answers3