4

Problem

I was given this problem in my Algorithms class today:

Given function maxSubstring(s, t), where s is a string and t is a substring of s, find the maximum number of iterations you can delete either the first or last occurrence of substring t .

Concept

Here is a visualization of the function maxSubstring called on s = banababbaa and t = ba.

          b  a  n  a  b  b  a  a
1st move: n  a  b  a  b  b  a            or   b  a  n  a  b  a  b  a
2nd move: n a b b a a  or  n a b a b a        n a b a b a  or  b a n a b a
3rd move:    n a b a   or   n a b a             n a b a    or    n a b a
4th move:             n  a                                n  a

Thus, this operation takes four moves.

Attempt

Here is my solution to the problem. It works, but it is very slow when I use larger strings as arguments.

Attempt #1

function maxSubstring(s, t) {
    if (s.includes(t)) {
        var idxSubstr = s.replace(t, '');
        var lastIdxSubstr = s.substr(0, s.lastIndexOf(t)) + s.substr(s.lastIndexOf(t) + t.length, s.length);
        return 1 + Math.max(maxSubstring(idxSubstr, t), maxSubstring(lastIdxSubstr, t)));
    }
    return 0;
}

Attempt #2

function maxSubstring(s, t) {
    if (s.includes(t)) {
        var idx = s.indexOf(t), lastIdx = s.lastIndexOf(t);
        var idxSubstr = s.substr(0, idx) + s.substr(idx + t.length, s.length);
        var lastIdxSubstr = s.substr(0, lastIdx) + s.substr(lastIdx + t.length, s.length);
        if (idx != lastIdx) {
            return 1 + Math.max(maxSubstring(idxSubstr, t), maxSubstring(lastIdxSubstr, t));
        } else {
            return 1 + maxSubstring(idxSubstr, t);
        }
    }
    return 0;
}

Reason for update: Minor change in efficiency by storing values of indexOf and lastIndexOf in variables.

Attempt #3

function maxSubstring(s, t) {
    var idx = s.indexOf(t);
    if (idx >= 0) {
        var lastIdx = s.lastIndexOf(t);
        var idxSubstr = s.substr(0, idx) + s.substr(idx + t.length);
        if (idx != lastIdx) {
            var lastIdxSubstr = s.substr(0, lastIdx) + s.substr(lastIdx + t.length);
            return 1 + Math.max(maxSubstring(idxSubstr, t), maxSubstring(lastIdxSubstr, t));
        } else {
            return 1 + maxSubstring(idxSubstr, t);
        }
    }
    return 0;
}

Reason for update: Reduced instances in which certain values were redefined and prevented lastIndexOf calculation before the first index is checked.

Answer Requirements

Is there any algorithm or method I may use to optimize this code? Math.max is the main culprit, so I would appreciate it if anyone has an idea on how to avoid using this method altogether.

In other words, maxSubstring should only be called once inside of itself, but Math.max requires that it be called twice (once for the first index of a substring and another time for the last index of that substring).

Lastly, do you mind telling me what the Big O Notation is for my solution and what the Big O Notation is for yours? This is not part of the original challenge, but I am curious, myself. Thanks in advance.

Anthony Krivonos
  • 4,596
  • 4
  • 16
  • 31
  • If `Math.max` is you're worried about only, then why don't you use `<`,`>` to compare, after all there are only 2 params. Use ternary operator and comparison operators to find max between 2. – shyammakwana.me Sep 06 '17 at 19:36
  • This concept is outlined here, but for Java. It doesn't show a great change in runtime of my function. https://stackoverflow.com/questions/2103606/is-math-maxa-b-or-abab-faster-in-java – Anthony Krivonos Sep 06 '17 at 19:39
  • 1
    This won't produce a big speedup, but your .includes() and .replace() are basically doing the same work twice of finding t within s, so you could get a bit of a speed up by combining them. In fact, your lastIndexOf(t) calls are also doing the same expensive operation, so maybe do it once instead 4 times! – Duncan Thacker Sep 06 '17 at 19:41
  • hmm. Other complexity included are indexOf and lastIndexOf, which are loops ultimately. – shyammakwana.me Sep 06 '17 at 19:42
  • To remove complexity, I suggest use `for loop` with combination of `length` and some variables. No `indexOf`, no `LastIndexOf`. – shyammakwana.me Sep 06 '17 at 19:47
  • 1
    Try some [dynamic programming approach](https://en.wikipedia.org/wiki/Dynamic_programming) – Bergi Sep 06 '17 at 20:11
  • @shyammakwana.me Thanks for the suggestion; though it doesn't speed up the function by much, it is still worth storing these values in variables. – Anthony Krivonos Sep 06 '17 at 20:34
  • @AnthonyKrivonos it still reduces algorithms's time complexity. And it depends on how you are measuring speed. You have to measure it by running it millions or billions times. – shyammakwana.me Sep 07 '17 at 18:09

1 Answers1

3

The major problem with the naive recursive algorithm that you have presented is that it is called very often on the same input s - exponentially often even, and exactly that is what causes the noticeable slowdown on larger strings.

What you can do against this is to use memoisation - remember the result for a specific input in a lookup table.

Another optimisation you can do is check whether deleting the first vs the last lead to different results at all. In most cases, it will absolutely not matter in which sequence you remove them, the number of possible removals is always the same. However, that is not the case when the matched substring can overlap with itself. As an example, try maxSubstring('ababaa', 'aba').

function maxSubstring(s, t, prevResults = new Map()) {
    function result(x) { prevResults.set(s, x); return x; }
    if (prevResults.has(s))
        return prevResults.get(s); // memoisation

    const first = s.indexOf(t);
    if (first == -1)
        return result(0);
    const withoutFirst = s.slice(0, first) + s.slice(first + t.length);

    const last = s.lastIndexOf(t);
    if (last == first) // only one match
        return result(1 + maxSubstring(withoutFirst, t, prevResults));

    if (t.lastIndexOf(t.charAt(t.length-1), t.length-1) == -1 // last character of t is found nowhere else in t
        || !t.includes(s.charAt(first+t.length))) // character after the match can never be part of a match
        // so this match is always removed in the optimal sequence and it doesn't matter whether as first or last
        return result(1 + maxSubstring(withoutFirst, t, prevResults));

    const withoutLast = s.slice(0, last) + s.slice(last + t.length);
    if (t.indexOf(t.charAt(0), 1) == -1 // first character of t is found nowhere else in t
        || !t.includes(s.charAt(last - 1))) // character before the match can never be part of a match
        // so this match is always removed and it doesn't matter when
        return result(1 + maxSubstring(withoutLast, t, prevResults));

    return result(1 + Math.max(maxSubstring(withoutFirst, t, prevResults),
                               maxSubstring(withoutLast, t, prevResults)));
}

Time Complexity Analysis

The number of recursive calls should be roughly quadratic in the number of removals. With my second suggestion, it might get down to linear in the best cases (depending on the patterns).

For every call, factor in the linear searches (indexOf, slice, etc.) and the Map lookup, though their average complexity will be less than that as the input get smaller and the pattern is often found early in the input. In any case, the complexity is polynomial, not exponential.

Anthony Krivonos
  • 4,596
  • 4
  • 16
  • 31
Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • Thanks for the advice! I added a statement that makes sure `Math.max` is only called if the `indexOf` and `lastIndexOf` values are not equal. – Anthony Krivonos Sep 06 '17 at 20:36
  • 1
    @AnthonyKrivonos I thought of that as well, but it doesn't really help - that happens only for the last removal anyway. – Bergi Sep 06 '17 at 20:39
  • 1
    @AnthonyKrivonos A better optimisation would be to place the `var idx = …, lastIdx = …;` in the first line of the function and then test `if (idx == -1 && lastIdx == -1) return 0; else if (idx == lastIdx) return 1; else …`, not calling `includes` at all. – Bergi Sep 06 '17 at 20:43
  • Yep, I used an optimization software to test it, and it does not change much. I'll update my further shortened code once more. – Anthony Krivonos Sep 06 '17 at 20:43
  • (Oops, `return 1` isn't allowed, we still need to search inside, it might not be the last one - as in `maxSubstring('aabb', 'ab')`) – Bergi Sep 06 '17 at 20:50
  • Yep, I understood the gist of your solution and updated the attempt with my interpretation. Still only works for 6 of my 15 test cases, where the latter 9 are timeouts. – Anthony Krivonos Sep 06 '17 at 20:52
  • 1
    @AnthonyKrivonos In your updated attempts (up to #3 as I write this) you are not including the bulk of what Bergi has suggested as performance improvements, notably memoization and not analyzing both paths if the substring does not overlap itself. I suspect adding those enhancements will likely resolve your timeouts in the test cases. – Trevor Freeman Sep 06 '17 at 21:13
  • @AnthonyKrivonos What Trevor said. I now posted an implementation with both of my suggestions implemented (though I'm not sure whether using only one of them might be enough). – Bergi Sep 06 '17 at 21:34
  • @Bergi Adding a map to hold my recursive data, as well as a third argument that updates this map, did it for me. I based my solution off your answer, and I am very appreciative of your help! Just one more question, what is the Big O time complexity of your solution involving memoization? – Anthony Krivonos Sep 06 '17 at 23:12
  • 1
    @AnthonyKrivonos I have no idea, the number of recursive calls should be roughly quadratic in the number of removals. With my second suggestion, it might get down to linear in the best cases (depending on the patterns). For every call, factor in the linear searches (`indexOf`, `slice` etc) and the map lookup, though their average complexity will be less than that as the inputs get smaller and the pattern is often found early in the input. In any case, the complexity is polynomial not exponential. – Bergi Sep 06 '17 at 23:23
  • @Bergi Makes sense, I suppose it isn't too difficult to time the function against a broad range of data and find a general trend. Thanks again! – Anthony Krivonos Sep 06 '17 at 23:27