7

My question is simple: Is there an O(n) algorithm for finding the longest contiguous subsequence between two sequences A and B? I searched it, but all the results were about the LCS problem, which is not what I'm seeking.

Note: if you are willing to give any sample code, you are more than welcome to do so, but please, if you can, in C or C++.

Edit: Here is an example:

A: { a, b, a, b, b, b, a }
B: { a, d, b, b, b, c, n }
longest common contiguous subsequence: { b, b, b }
Rontogiannis Aristofanis
  • 8,883
  • 8
  • 41
  • 58
  • If you aren't seeking lcs please supply a trivial example of what you are seeking – Woot4Moo Dec 25 '12 at 18:12
  • 3
    This is the same as the longest common substring. – fgb Dec 25 '12 at 18:21
  • that looks the same as LCS. Or is it LCS with the additional constraint that the sequence must be a repetition of the same symbols? i.e. `{a,b,c,d,d}` and `{d,d,a,b,c}` yields `{d,d}`? – marcus erronius Dec 25 '12 at 18:23
  • 1
    Clarifying fgb's comment for other newbs like me: Longest Common Contiguous Subsequence = Longest Common String. As Wikipedia says, " unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences". Hence "contiguous subsequence" or "consecutive subsequence" can be replaced by "substring". – Nate Anderson Oct 27 '15 at 18:22

3 Answers3

2

Yes, you can do this in linear time. One way is by building suffix trees for both the pattern and the text and computing their intersection. I can't think of a way to do this without involving suffix trees or suffix arrays, though.

tmyklebu
  • 13,915
  • 3
  • 28
  • 57
  • The solution with suffix trees is not linear. – Rontogiannis Aristofanis Dec 25 '12 at 18:25
  • @RondogiannisAristophanes [yes, it is](http://en.wikipedia.org/wiki/Longest_common_substring_problem#Suffix_tree). It's linear in n + m, so it's linear in n if n = m. – Haile Dec 25 '12 at 18:40
  • 1
    It takes linear time to build a suffix tree. (Surprising, but true.) Suffix trees take linear space. Walking two suffix trees simultaneously to find the longest common string takes linear time. – tmyklebu Dec 25 '12 at 18:56
1

I am not sure whether there exists an O(n) algorithm. Here is a O(n*n) dynamic solution, maybe it is helpful to you. Let lcs_con[i][j] represent the longest common contiguous subsequence which end with element A_i from array A and B_j from array B. Then we can get the equations below:

lcs_con[i][j]=0 if i==0 or j==0
lcs_con[i][j]=0 if A_i != B_j
lcs_con[i][j]=lcs_con[i-1][j-1] if A_i==B_j

we can record the maximum of lcs_con[i][j] and the ending index during the calculation to get the longest common contiguous subsequence. The code is below:

#include<iostream>

using namespace std;


int main()
{
    char A[7]={'a','b','a','b','b','b','a'};
    char B[7]={'a','d','b','b','b','c','n'};

    int lcs_con[8][8];    
    memset(lcs_con,0,8*8*sizeof(int));

    int result=-1;
    int x=-1;
    int y=-1;

    for(int i=1;i<=7;++i)
      for(int j=1;j<=7;++j)
      {
          if(A[i-1]==B[j-1])lcs_con[i][j]=lcs_con[i-1][j-1]+1;
          else lcs_con[i][j]=0;

          if(lcs_con[i][j]>result)
          {
               result=lcs_con[i][j];
               x=i;
               y=j;                   
          }
      }

   if(result==-1)cout<<"There are no common contiguous subsequence";
   else
   {
       cout<<"The longest common contiguous subsequence is:"<<endl;
       for(int i=x-result;i<x;i++)cout<<A[i];
       cout<<endl;
   }

   getchar();
   getchar();

   return 0;    
}

Hope it helps!

Chasefornone
  • 747
  • 1
  • 7
  • 15
  • I think there is a typo: `lcs_con[i][j]=lcs_con[i-1][j-1] if A_i==B_j` should be `lcs_con[i][j]=lcs_con[i-1][j-1] + 1 if A_i==B_j` – CodingNow Aug 02 '23 at 04:22
0

that is what you are looking for:

KMP algorithm c implementation

the basic theory:

  1. How to find Longest Common Substring using C++

  2. http://en.wikipedia.org/wiki/Longest_common_substring_problem

Community
  • 1
  • 1
0x90
  • 39,472
  • 36
  • 165
  • 245