0

I am looking at example suffix arrays and longest common prefixes, but I do not understand the criteria for how the the suffix array is sorted. I am looking at the example on wikipedia where they use banana$ Can someone please explain how a suffix array is sorted? My first instinct was to sort by length, but that is clearly not the case here.

(Here is the example they used http://en.wikipedia.org/wiki/Suffix_array)

These suffixes can be sorted:

Suffix  i
$       7
a$      6
ana$    4
anana$  2
banana$ 1
na$     5
nana$   3
Anshul Goyal
  • 73,278
  • 37
  • 149
  • 186
user137717
  • 2,005
  • 4
  • 25
  • 48

3 Answers3

0

Here is my code template on acm/icpc, using DA algorithm

/* 
rank[0...7]: 4 6 8 1 2 3 5 7 
string:      a a b a a a a b 
------------------------------------------- 
 sa[1] = 3 : a a a a b         height[1] = 0 
 sa[2] = 4 : a a a b           height[2] = 3 
 sa[3] = 5 : a a b             height[3] = 2 
 sa[4] = 0 : a a b a a a a b   height[4] = 3 
 sa[5] = 6 : a b               height[5] = 1 
 sa[6] = 1 : a b a a a a b     height[6] = 2 
 sa[7] = 7 : b                 height[7] = 0 
 sa[8] = 2 : b a a a a b       height[8] = 1 
*/ 

const int MAXN = 200010;
int r[MAXN],sa[MAXN];
int ua[MAXN],ub[MAXN],uv[MAXN],us[MAXN];
int cmp(int *r,int a,int b,int l)
{return r[a] == r[b] && r[a+l]==r[b+l];}


void da(int *r,int *sa,int n,int m)   
{                                     
    int i,j,p,*x = ua,*y = ub,*t;
    for(i=0; i<m; i++)  us[i] = 0;
    for(i=0; i<n; i++)  us[x[i] = r[i]]++;
    for(i=1; i<m; i++)  us[i] += us[i-1];
    for(i=n-1; i>=0; i--)   sa[--us[x[i]]] = i;
    for(j=1,p=1; p<n; j<<=1,m=p)
    {
        for(p=0,i=n-j; i<n; i++)    y[p++] = i;
        for(i=0; i<n; i++)  if(sa[i]>=j) y[p++]=sa[i]-j;
        for(i=0; i<n; i++)  uv[i] = x[y[i]];
        for(i=0; i<m; i++)  us[i] = 0;
        for(i=0; i<n; i++)  us[uv[i]]++;
        for(i=1; i<m; i++)  us[i]+=us[i-1];
        for(i=n-1; i>=0; i--) sa[--us[uv[i]]] = y[i];
        for(t=x,x=y,y=t,p=1,x[sa[0]]=0,i=1; i<n; i++)
           x[sa[i]] = cmp(y,sa[i-1],sa[i],j)?p-1:p++;
    }
}

int rank[MAXN], height[MAXN];
void calh(int *r, int *sa, int n)
{
    int i, j, k = 0;
    for(i=1; i<=n; ++i) rank[sa[i]] = i;
    for(i=0; i<n; height[rank[i++]] = k)
        for(k?k--:0,j=sa[rank[i]-1]; r[i+k]==r[j+k];k++);
}

the time above is O(nlogn)

There is also time in O(n) You can refer this article: http://www.cs.cmu.edu/~guyb/realworld/papersS04/KaSa03.pdf

mickeyandkaka
  • 1,452
  • 2
  • 11
  • 21
0

I also struggled to understand suffix array stuff then I found this intuition behind suffix array and how suffix array with LCP works.

Hope it helps! If you need additional explanation, I will provide here.

Community
  • 1
  • 1
Kaidul
  • 15,409
  • 15
  • 81
  • 150
0

There is a tutorial on suffix array at http://discuss.codechef.com/questions/21385/a-tutorial-on-suffix-arrays

This tutorial contains things like what is a suffix array? How is it constructed? What is the most efficient way to construct suffix array ? What are its application?

hopefully you will get What you are seeking for..

Nitin9791
  • 1,124
  • 1
  • 14
  • 17