1

I saw this code implementation here. It basically takes two strings, finds the longest common substring, and returns the length of it. I wanted to modify it slightly to get the starting indexes of the substrings for each words, but just can't figure out. I know it should be possible, since we are working with indexes of the string. I will write my edited version of the code below:


public class Main {
    public class Answer {
        int i, j, len;
        Answer(int i, int j, int len) {
            this.i = i;
            this.j = j;
            this.len = len;
        }
    }
    public Answer find(String s1,String s2){

        int n = s1.length();
        int m = s2.length();

        Answer ans = new Answer(0, 0, 0);
        int[] a = new int[m];
        int b[] = new int[m];

        for(int i = 0;i<n;i++){
            for(int j = 0;j<m;j++){
                if(s1.charAt(i)==s2.charAt(j)){
                   if(i==0 || j==0 )a[j] = 1;
                   else{
                       a[j] = b[j-1] + 1;
                   }
                   ans.len = Math.max(ans.len, a[j]);
                   ans.i = i;
                   ans.j = j;
                }

            }
            int[] c = a;
            a = b;
            b = c;
        }
        return ans;
    }
}
smg9450
  • 86
  • 1
  • 1
  • 6

2 Answers2

1

I am assuming if these are the two strings : s1 = "abcdxyz" s2 = "xyzabcd" then since abcd is longest common substring so you need index of this substring in both s1 and s2 which is 0,3 respectively.

There are two solution for this :

Solution 1 :

Here, I have created an index array where I am storing the starting index of both the string with index 0 of index array storing for s1 and index 1 storing for s2.

public Answer  find(String s1,String s2){

    int n = s1.length();
    int m = s2.length();

    Answer ans = new Answer(0, 0, 0);
    int[] a = new int[m];
    int b[] = new int[m];
    int indexes[] = new int[2];
    for(int i = 0;i<n;i++){
        for(int j = 0;j<m;j++){
            if(s1.charAt(i)==s2.charAt(j)){
               if(i==0 || j==0 )a[j] = 1;
               else{
                   a[j] = b[j-1] + 1;
               }
               if(a[j]>ans.len) {
                   ans.len = a[j];
                   indexes[0]=(i+1) - ans.len;
                   indexes[1]=(j+1) - ans.len;
               }
               ans.i = i;
               ans.j = j;

            }

        }
        int[] c = a;
        a = b;
        b = c;
    }
    return ans;
}

Solution 2 :

I am not sure what your Answer object i and j values are doing but we can make them as well store these values with i storing for s1 string and j storing for s2 string instead of creating different index array as in Solution 1.

public Answer  find(String s1,String s2){

    int n = s1.length();
    int m = s2.length();

    Answer ans = new Answer(0, 0, 0);
    int[] a = new int[m];
    int b[] = new int[m];
    int indexes[] = new int[2];
    for(int i = 0;i<n;i++){
        for(int j = 0;j<m;j++){
            if(s1.charAt(i)==s2.charAt(j)){
               if(i==0 || j==0 )a[j] = 1;
               else{
                   a[j] = b[j-1] + 1;
               }
               if(a[j]>ans.len) {
                   ans.len = a[j];
                   ans.i=(i+1) - ans.len;
                   ans.j=(j+1) - ans.len;
               }

            }

        }
        int[] c = a;
        a = b;
        b = c;
    }
    return ans;
}

Currently this doesn't calculate LCS right . Issue is you are not making array a as empty after running your second loop each time because of which if characters do not match in next run corresponding index of a stores previous value only but it should be 0.

Update code is :

 public Answer  find(String s1,String s2){

            int n = s1.length();
            int m = s2.length();

            Answer ans = new Answer(0, 0, 0);
            int[] a;
            int b[] = new int[m];
            int indexes[] = new int[2];
            for(int i = 0;i<n;i++){
                a = new int[m];
                for(int j = 0;j<m;j++){
                    if(s1.charAt(i)==s2.charAt(j)){
                       if(i==0 || j==0 )a[j] = 1;
                       else{
                           a[j] = b[j-1] + 1;
                       }
                       if(a[j]>ans.len) {
                           ans.len = a[j];
                           ans.i=(i+1) - ans.len;
                           ans.j=(j+1) - ans.len;
                       }

                    }

                }
                b = a;
            }
            return ans;
        }
  • Indeed, the purpose of the Answer object's i and j was to hold the values for starting indexes. – smg9450 Apr 17 '20 at 14:37
  • @smg9450 Then you can use Solution 2. I was not sure why I and j was used in Answer object, so gave both the solutions. –  Apr 17 '20 at 14:39
  • But now, I'm facing a little error: When I test this with ```"aabaa"``` and ```"babbaab"``` as inputs, the result is ```2 2 3```, although it should have been ```2 3 3``` – smg9450 Apr 17 '20 at 14:47
  • It is correct output right..in both these string , common substring is "baa" which in string "aabaa" is present at index 2 and string "babbaab" is present at index 3. Although your length is coming wrong it should be 3 instead of 2 . So, basically result should come as 3 2 3. Will check and let you know why length is coming wrong. –  Apr 17 '20 at 14:51
  • No, the length comes out right, but for some reason, the program says string "baa" is present at string "babbaab" at index 2, which is the wrong part. Maybe the algorithm itself to find the substring is not completely right. – smg9450 Apr 17 '20 at 14:54
  • When I ran the above corrected code with same strings , I got result as [3, 0,4 ] which is right since another longest substring of same length is "aab" which is of length 3 and in string s1("aabaa") it is present at index 0 and in string s2("babbaab") it is present at index 4. –  Apr 17 '20 at 14:56
  • Yes, my bad on that. Well, now the input ```baaababbbbaa aababaabaaaabbaabaa``` gives completely wrong answers, I see the algorithm I took from there itself isn't completely right, I was just so excited when I saw O(n) complexity. – smg9450 Apr 17 '20 at 15:12
  • This code is not o(n) complexity since it uses two for loops its complexity is anyhow o(n^2)..will check where it is going wrong. But longest common substring can be solved by dynamic programming in o(m*n), its complexity can be reduce by using suffix tree solution which can reduce it to O(n+m) –  Apr 17 '20 at 15:17
  • Hadn't heard about the suffix tree solution for this though, will definitely search into that – smg9450 Apr 17 '20 at 15:18
  • For suffix tree you can refer from here https://www.geeksforgeeks.org/suffix-tree-application-5-longest-common-substring-2/ –  Apr 17 '20 at 15:20
  • Have updated the function to calculate LCS , there is a small change that is required otherwise it works fine. If you have any issue related to change you can ask. –  Apr 17 '20 at 15:35
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/211880/discussion-between-smg9450-and-prerna-gupta). – smg9450 Apr 17 '20 at 15:47
  • Also can you please upvote the answer as well. Thanks –  Apr 17 '20 at 15:48
0

This is likely not the answer you're looking for, but it does solve your problem with just two extra lines.

Before returning the answer just subtract the length of the LCS and add 1 to both i and j which will account for the difference between what you expected and what you got.

Here's the code just in case:

ans.i = ans.i - ans.len + 1;
ans.j = ans.j - ans.len + 1;

return ans;

My answer might not be as comprehensive as the one by Prerna Gupta but on the other hand it does keep your algorithm just as it is now so I'll leave it here just in case.

Licanueto
  • 43
  • 6