Why/how does the Longest Proper Prefix/Suffix algorithm work?

Question

The LPS (Longest Proper Prefix which is also a Suffix) algorithm goes as follows:

public static int[] constructLPSArray(String s) {
        int n = s.length();
        int[] arr = new int[n];
        int j = 0;
        for (int i = 1; i < n; ) {
            if (s.charAt(i) == s.charAt(j)) {
                arr[i] = j + 1;
                i++;
                j++;
            } else {
                if (j != 0) {
                    j = arr[j - 1];
                } else {
                    i++;
                }
            }
        }
        return arr;
    }

The if (s.charAt(i) == s.charAt(j)) part looks clear, but the else part is unclear. Why do we do:

if (j != 0) {
  j = arr[j - 1];
} else {
  i++;
}

More specifically, why does j = arr[j - 1] work ? Or why do we even do it? How do we validate the correctness of this step?

score 6 · Accepted Answer · answered Mar 26 '20 at 15:33

6

Let's say we are parsing an array of characters with i and j positioned like this:

a b a b x x a b a b ...
      ^           ^
      j           i

with arr holding:

0 0 1 2 0 0 1 2 3 4

i. e., the length of the longest prefix/suffix of each substring of s of that length until i. You can probably guess how that was generated from the rest of the algorithm. Now, if the next character after i does not match the next character after j,

a b a b x x a b a b a ...
        ^           ^
        j           i

we don't have to retry the matching, because we know the longest prefix/suffix of our previous prefix/suffix! Looking up arr[j - 1] yields 2 – so we essentially cached the information that the parts highlighted here

A B a b x x a b A B a ...
=== ^           === ^
    j               i

are identical, and don't need to be compared again!

answered Mar 26 '20 at 15:33

Tau

496
4
22

could you explain why time complexity is O(n) not n^2 where n is the length of the string? since 'j' may get decremented in some loop iterations, it seems it could be O(n^2) – Joe Black Mar 29 '20 at 23:52
does it need to look up the "longest prefix/suffix of our previous prefix/suffix" that decrements j only once for any i? wondering about how it's linear in complexity. – Joe Black Mar 29 '20 at 23:54
@JoeBlack x = # i,j is increased; y = # j is decreased; z = # only i is increased; A = x+y+z = the total runtime of the algorithm we know x+z < n since i can only be increased at most n times j = arr[j-1] will always decrease j by some value. the worst case would be pretending j = arr[j-1] = j-1. so now we know: j's intial value is 0 j is increased x times by 1 j is decreased y times by 1 j cannot be negative at any point this means that y < x thus we have: the worst case for y = x, y < n the worst case for x+z = n therefore A = n + n = O(n) – Renzhentaxi Baerde Jan 02 '21 at 15:55

score -2 · Answer 2 · answered Sep 24 '20 at 18:19

  *Here's one more solution*
  int length=str.length();
  int mid=length/2;
  if(length<2){
       System.out.println("-1");
  }
  for(int i=mid;i>=0;i--){
      String prefix=str.substring(0,i);
      String suffix=str.substring(length-i,length);
      if(suffix.equals("") || prefix.equals("")){
            System.out.println("-1");
      }
      if(suffix.equals(prefix)){
          System.out.println(suffix.length());
          break;
      }
      
  }

Why/how does the Longest Proper Prefix/Suffix algorithm work?

2 Answers2

Linked