Here is a Java implementation which finds the maximum overlap between two strings with length N and M in something like O(min(N,M)) operations ~ O(N).
I had the same idea as @sepp2k:s now deleted answer, and worked a bit further on it. Seems to work fine. The idea is to iterate through the first string and start tracking once you find something which matches the start of the second string. Figured out that you might need to do multiple simultaneous trackings if false and true matches are overlapping. At the end you choose the longest track.
I havent worked out the absolutely worst case yet, with maximal overlap between matches, but I don't expect it to spiral out of control since I think that you cannot overlap arbitrary many matches. Normally you only track one or two matches at a time: the candidates are removed as soon as there is a mismatch.
static class Candidate {
int matchLen = 0;
}
private String overlapOnce(@NotNull final String a, @NotNull final String b) {
final int maxOverlap = Math.min(a.length(), b.length());
final Collection<Candidate> candidates = new LinkedList<>();
for (int i = a.length() - maxOverlap; i < a.length(); ++i) {
if (a.charAt(i) == b.charAt(0)) {
candidates.add(new Candidate());
}
for (final Iterator<Candidate> it = candidates.iterator(); it.hasNext(); ) {
final Candidate candidate = it.next();
if (a.charAt(i) == b.charAt(candidate.matchLen)) {
//advance
++candidate.matchLen;
} else {
//not matching anymore, remove
it.remove();
}
}
}
final int matchLen = candidates.isEmpty() ? 0 :
candidates.stream().map(c -> c.matchLen).max(Comparator.comparingInt(l -> l)).get();
return a + b.substring(matchLen);
}
private String overlapOnce(@NotNull final String... strings) {
return Arrays.stream(strings).reduce("", this::overlapOnce);
}
And some tests:
@Test
public void testOverlapOnce() throws Exception {
assertEquals("", overlapOnce("", ""));
assertEquals("ab", overlapOnce("a", "b"));
assertEquals("abc", overlapOnce("ab", "bc"));
assertEquals("abcdefghqabcdefghi", overlapOnce("abcdefgh", "efghqabcdefghi"));
assertEquals("aaaaaabaaaaaa", overlapOnce("aaaaaab", "baaaaaa"));
assertEquals("ccc", overlapOnce("ccc", "ccc"));
assertEquals("abcabc", overlapOnce("abcabc", "abcabc"));
/**
* "a" + "b" + "c" => "abc"
"abcde" + "defgh" + "ghlmn" => "abcdefghlmn"
"abcdede" + "dedefgh" + "" => "abcdedefgh"
"abcde" + "d" + "ghlmn" => "abcdedghlmn"
"abcdef" + "" + "defghl" => "abcdefghl"
*/
assertEquals("abc", overlapOnce("a", "b", "c"));
assertEquals("abcdefghlmn", overlapOnce("abcde", "defgh", "ghlmn"));
assertEquals("abcdedefgh", overlapOnce("abcdede", "dedefgh"));
assertEquals("abcdedghlmn", overlapOnce("abcde", "d", "ghlmn"));
assertEquals("abcdefghl", overlapOnce("abcdef", "", "defghl"));
// Consider str1=abXabXabXac and str2=XabXac. Your approach will output abXabXabXacXabXac because by
// resetting j=0, it goes to far back.
assertEquals("abXabXabXac", overlapOnce("abXabXabXac", "XabXac"));
// Try to trick algo with an earlier false match overlapping with the real match
// - match first "aba" and miss that the last "a" is the start of the
// real match
assertEquals("ababa--", overlapOnce("ababa", "aba--"));
}