7

Problem:

Given a String S of N characters (N <= 200 000), find the length of the longest substring that appears at least twice (the substrings can overlap).

My solution:

Here's what i've tried:

int main()
{
    std::string s;
    std::cin >> s;
    int max = 0;
    typedef std::string::const_iterator sit;
    sit end = s.end();
    for(sit it1 = s.begin(); it1 != end; ++it1)
        for(sit it2 = it1 + 1; it2 != end; ++it2)
            max = std::max(max, std::mismatch(it1, it1 + (end - it2), it2).first - it1);
    std::cout << max;
}

Question:

However, the above solution will get TLE as it runs in O(n^3). Is there any ways to improve it so it can run in O(n.logn)?

unglinh279
  • 675
  • 4
  • 24

2 Answers2

4

find the length of the longest substring that appears at least twice (the substrings can overlap)

This problem is also commonly known as Longest repeated substring problem. It can be solved in linear time with a suffix tree.

To solve this problem:

  1. Add a special character '$' to the given string S),
  2. build a suffix tree from S ;
  3. the longest repeated substring of S is indicated by the deepest internal node in the suffix tree, where depth is measured by the number of characters traversed from the root.

Time complexity:

  • Suffix tree takes O(nlog(k))) time, where k is the size of the alphabet (if k is considered to be a constant, the asymptotic behaviour is linear)
  • tree traversal(for finding longest repeated substring) can be done in O(n) time
3

Suffix tree is an overkill for this problem. In fact, binary search suffices and the implementation is much easier.

Idea

The idea is simple: If there exists a duplicated substring of length N (N > 1), there must also exists one of length N - 1. Therefore, if we let f(x) to denote a function that returns true if there exists a duplicated substring of length x, f(x) will be a monotonic function, which allows a binary search solution.

Implementation

Binary search on the length of the longest duplicated substring and apply sliding windows to check if a given length is possible. Use string hashing to check for duplicate. Complexity: N log N

TYeung
  • 2,579
  • 2
  • 15
  • 30
  • Binary tree for duplicate checking will not work because each comparison takes up to O(N) so the total is ≥ O(N²). – user202729 Oct 24 '21 at 03:45