I'm trying to implement the algorithm given by Evgeny Kluev in his answer to loop rolling algorithm but I'm having trouble getting it to work. Below is an example I tried to work out by hand following his instructions:
text: ababacababd
<STEP 1>
suffixes and LCPs:
ababacababd
4
ababd
3
abacababd
2
abd
1
acababd
0
babacababd
3
babd
2
bacababd
1
bd
0
cababd
0
d
<STEP 2>
sorted LCP array indices: 0,1,5,2,6,3,7,4,8,9
(values) : 4,3,3,2,2,1,1,1,0,0
<STEP 3>
LCP groups sorted by position in text (format => position: entry):
lcp 4:
0: ababacababd
6: ababd
lcp 3:
1: babacababd
2: abacababd
6: ababd
7: babd
lcp 2:
2: abacababd
3: bacababd
7: babd
8: abd
lcp 1:
3: bacababd
4: acababd
8: abd
9: bd
lcp 0:
0: ababacababd
1: babacababd
4: acababd
5: cababd
9: bd
10: d
<STEP 4>
entries remaining after filter (LCP == positional difference):
none! only (abd),(bd) and (bacababd),(acababd) from LCP 1 group
have positional difference equal to 1 but they don't share prefixes
with each other. shouldn't i have at least (ab) and (ba) here?
Can anybody tell me what I'm doing wrong in this process?
Also, he says at the end of step 4 we should have all possible sequences in the text, does he mean all possible repeating sequences?
Is this a known algorithm with a name that I can find more details on elsewhere?
(I'm also confused about his definition of intersecting sequences in step 5, but maybe it would make sense if I understood the preceding steps correctly).
EDIT: Here is what I have for step 4,5,6 after Evgeny's helpful clarification:
<STEP 4>
filter pseudocode:
results = {}
for (positions,lcp) in lcp_groups:
results[lcp] = []
while positions not empty:
pos = positions.pop(0) #pop lowest element
if (pos+lcp) in positions:
common = prefix(input, pos, lcp)
if common.size() < lcp:
next
i = 1
while (pos+lcp*(i+1)) in positions:
if common != prefix(input, pos+lcp*i, lcp):
break
positions.delete(pos+lcp*i)
i += 1
results[lcp].append( (common, pos, i+1) )
application of filter logic:
lcp 4:
0: ababacababd # 4 not in {6}
6: ababd # 10 not in {}
lcp 3:
0: ababacababd # 3 not in {1,2,6,7}
1: babacababd # 4 not in {2,6,7}
2: abacababd # 5 not in {6,7}
6: ababd # 9 not in {7}
7: babd # 10 not in {}
lcp 2:
0: ababacababd # 2 in {1,2,3,6,7,8}, 4 not in {1,2,3,6,7,8} => ("ab", 0, 2)
1: babacababd # 3 in {2,3,6,7,8}, 5 not in {2,3,6,7,8} => ("ba", 1, 2)
2: abacababd # 4 not in {3,6,7,8}
3: bacababd # 5 not in {6,7,8}
6: ababd # 8 in {7,8}, 10 not in {7,8} => ("ab", 6, 2)
7: babd # 9 not in {8}
8: abd # 10 not in {}
lcp 1:
0: ababacababd # 1 in {1,2,3,4,6,7,8,9}, prefix is ""
1: babacababd # 2 in {2,3,4,6,7,8,9}, prefix is ""
2: abacababd # 3 in {3,4,6,7,8,9}, prefix is ""
3: bacababd # 4 in {4,6,7,8,9}, prefix is ""
4: acababd # 5 not in {6,7,8,9}
6: ababd # 7 in {7,8,9}, prefix is ""
7: babd # 8 in {8,9}, prefix is ""
8: abd # 9 in {9}, prefix is ""
9: bd # 10 not in {}
sequences: [("ab", 0, 2), ("ba", 1, 2), ("ab", 6, 2)]
<STEP 5>
add sequences in order of LCP grouping. sequences within an LCP group
are added according to which was generated first:
lcp 4: no sequences
lcp 3: no sequences
lcp 2: add ("ab", 0, 2)
lcp 2: dont add ("ba", 1, 2) because it intersects with ("ab", 0, 2)
lcp 2: add ("ab", 6, 2)
lcp 1: no sequences
collection = [("ab", 0, 2), ("ab", 6, 2)]
(order by position not by which one was added first)
<STEP 6>
recreate input by iterating through the collection in order and
filling in gaps with the normal input:
input = "ab"*2 + input[4..5] + "ab"*2 + input[10..10]
Evgeny, one quick question for you if you happen to look at this again: Am I doing step 5 correctly? That is, do I add sequences according to which LCP group they were generated from (with higher LCP valued groups coming first)? Or is it something else related to LCP?
Also, if there is anything wrong with step 4 or 6 please let me know, but it seems what I have works very well for this example.