4

I have a suffix array SA, and an array L that stores the length of LCP (longest common prefix) between two consecutive suffixes, i.e.

L[i]=LCP(SA[i-1],SA[i]) where 1<=i<=|SA|

It's also described here.

How should I use this array L to find the LCP(x,y) between given two suffixes x and y?

Ritesh Kumar Gupta
  • 5,055
  • 7
  • 45
  • 71
  • @ High Performance Mark:: I have edited it.Actually this was the exact question that i asked 2-3 days before ,but as i did n't get any response , so made the problem easier and straight forward . – Ritesh Kumar Gupta Jun 13 '12 at 10:10
  • Sorry if I misunderstand, but is (2) answered [here](http://stackoverflow.com/a/9390461/777186) perhaps? If yes, we could consider (2) done and concentrate on (1). – jogojapan Jun 18 '12 at 02:55
  • @jogojapan:: Yup 2 is answered .Thanks! – Ritesh Kumar Gupta Jun 18 '12 at 04:11
  • possible duplicate of [Complete Suffix Array](http://stackoverflow.com/questions/9389681/complete-suffix-array) – Saeed Amiri Jun 18 '12 at 21:26
  • @jogojapan::I have edited my question. [I removed my second question,as it is answered]. – Ritesh Kumar Gupta Jun 18 '12 at 21:29
  • Great, it's much clearer now. About the remaining question: Do you really need LCP(x,y) for *any* values of x and y? If you want to use this to implement binary search across the suffix array (and I think that is what you mean by 'Manber/Myers method'), you only need a certain subset of the possible values of x and y. If that is what you are interested in, I can provide an answer. – jogojapan Jun 19 '12 at 01:02
  • Yeah ,can you plz provide answer. LCP b/w any values of x and y is what i want. – Ritesh Kumar Gupta Jun 19 '12 at 06:52
  • 4
    If you do need LCP for *any* x,y (**not** only the subset relevant for the Manber/Myers suffix array search algorithm), then the answer is quite complicated. The basic idea is that you use the standard LCP that you have already, and then for given x,y you look at all values LCP[x+1],...,LCP[y] and find the minimum. The minimum is the LCP of x and y that you want. In other words, you need to compute an RMQ (range minimum query) on your existing LCP array. The naive method takes O(n). But there are O(1)-time, O(n)-space methods. E.g. described by Sadakane, DOI: 10.1007/s00224-006-1198-x. – jogojapan Jun 19 '12 at 14:17
  • @ jogojapan :Thanks a lot,i got the solution.yes ,its RMQ .If i had standard Lcp calculate then i just have to find the RMQ. And it can be done in O(nlogn) preprocessing and O(1) query time .Thanks indeed! – Ritesh Kumar Gupta Jun 19 '12 at 15:30
  • @jogojapan Your comment is pretty much the answer that should be posted here. Can you convert that into an answer? – templatetypedef May 27 '14 at 16:46

1 Answers1

1
rank[0...7]: 4 6 8 1 2 3 5 7 
string:      a a b a a a a b 
------------------------------------------- 
 sa[1] = 3 : a a a a b         height[1] = 0 
 sa[2] = 4 : a a a b           height[2] = 3 
 sa[3] = 5 : a a b             height[3] = 2 
 sa[4] = 0 : a a b a a a a b   height[4] = 3 
 sa[5] = 6 : a b               height[5] = 1 
 sa[6] = 1 : a b a a a a b     height[6] = 2 
 sa[7] = 7 : b                 height[7] = 0 
 sa[8] = 2 : b a a a a b       height[8] = 1 

Here array height is equal to array L

using array sa , we can easily calculate array rank

Then using Sparse Table (ST) algorithm to calculate rmq question. http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lowestCommonAncestor#Range_Minimum_Query_%28RMQ%29

int *RMQ = height;
//int RMQ[N];
int mm[N];
int best[20][N]; //best[i][j] means the minimal value in the range [j, j + 2^i)
void initRMQ(int n){
  int i,j,a,b;
  for(mm[0]=-1,i=1;i<=n;i++)
  mm[i]=((i&(i-1))==0)?mm[i-1]+1:mm[i-1];
  for(i=1;i<=n;i++) best[0][i]=i;
  for(i=1;i<=mm[n];i++)
  for(j=1;j<=n+1-(1<<i);j++){
    a=best[i-1][j];
    b=best[i-1][j+(1<<(i-1))];
    if(RMQ[a]<RMQ[b]) best[i][j]=a;
    else best[i][j]=b;
  }
}
int askRMQ(int a,int b){
  int t;
  t=mm[b-a+1];b-=(1<<t)-1;
  a=best[t][a];b=best[t][b];
  return RMQ[a]<RMQ[b]?a:b;
}
int lcp(int a,int b){ //this is your answer
  int t;
  a=rank[a];b=rank[b];
  if(a>b) {t=a;a=b;b=t;}
  return(height[askRMQ(a+1,b)]);
}

Then just call the function lcp(). Time and space is O(nlogn)

mickeyandkaka
  • 1,452
  • 2
  • 11
  • 21