1

I am using this program for computing the suffix array and the Longest Common Prefix.

I am required to calculate the longest common substring between two strings.

For that, I concatenate strings, A#B and then use this algorithm.

I have Suffix Array sa[] and the LCP[] array.

The the longest common substring is the max value of LCP[] array.

In order to find the substring, the only condition is that among substrings of common lengths, the one occurring the first time in string B should be the answer.

For that, I maintain max of the LCP[]. If LCP[curr_index] == max, then I make sure that the left_index of the substring B is smaller than the previous value of left_index.

However, this approach is not giving a right answer. Where is the fault?

max=-1;
for(int i=1;i<strlen(S)-1;++i)
{
    //checking that sa[i+1] occurs after s[i] or not
    if(lcp[i] >= max && sa[i] < l1 && sa[i+1] >= l1+1 )
    {
        if( max == lcp[i] && sa[i+1] < left_index ) left_index=sa[i+1];

        else if (lcp[i] > ma )
        {
            left_index=sa[i+1];
            max=lcp[i];
        }
    }
    //checking that sa[i+1] occurs after s[i] or not
    else if (lcp[i] >= max && sa[i] >= l1+1 && sa[i+1] < l1 )
    {
        if( max == lcp[i] && sa[i] < left_index) left_index=sa[i];

        else if (lcp[i]>ma)
        {
            left_index=sa[i];
            max=lcp[i];
        }
    }
}
Community
  • 1
  • 1
user3080029
  • 553
  • 1
  • 8
  • 19

1 Answers1

1

AFAIK, This problem is from a programming contest and discussing about programming problems of ongoing contest before editorials have been released shouldn't be .... Although I am giving you some insights as I got Wrong Answer with suffix array. Then I used suffix Automaton which gives me Accepted.

Suffix array works in O(nlog^2 n) whereas Suffix Automaton works in O(n). So my advice is go with suffix Automaton and you will surely get Accepted. And if you can code solution for that problem, you will surely code this.

Also found in codchef forum that:

Try this case 
babaazzzzyy 
badyybac 
The suffix array will contain baa... (From 1st string ) , baba.. ( from first string ) , bac ( from second string ) , bad from second string .
So if you are examining consecutive entries of SA then you will find a match at "baba" and "bac" and find the index of "ba" as 7 in second string , even though its actually at index 1 also . 
Its likely that you may output "yy" instead of "ba"

And also handling the constraint ...the first longest common substring to be found on the second string, should be written to output... would be very easy in case of suffix automaton. Best of luck!

Kaidul
  • 15,409
  • 15
  • 81
  • 150