I have come up with the term loop rolling myself with the hope that it does not overlap with an existing term. Basically I'm trying to come up with an algorithm to find loops in a printed text.
Some examples from simple to complicated
Example1
Given:
a a a a a b c d
I want to say:
5x(a) b c d
or algorithmically:
for 1 .. 5
print a
end
print b
print c
print d
Example2
Given:
a b a b a b a b c d
I want to say:
4x(a b) c d
or algorithmically:
for 1 .. 4
print a
print b
end
print c
print d
Example3
Given:
a b c d b c d b c d b c e
I want to say:
a 3x(b c d) b c e
or algorithmically:
print a
for 1 .. 3
print b
print c
print d
end
print b
print c
print d
It didn't remind me of any algorithm that I know of. I feel like some of the problems can be ambiguous but finding one of the solutions is enough to me for now. Efficiency is always welcome but not mandatory. How can I do this?
EDIT
First of all, thanks for all the discussion. I have adapted an LZW algorithm from rosetta and ran it on my input:
abcdbcdbcdbcdef
which gave me:
a
b
c
d
8 => bc
10 => db
9 => cd
11 => bcd
e
f
where I have a dictionary of:
a a
c c
b b
e e
d d
f f
8 bc
9 cd
10 db
11 bcd
12 dbc
13 cdb
14 bcde
15 ef
7 ab
It looks good for compression but it's not quite what I wanted. What I need is more like compression in the algorithmic representation from my examples which would have:
- subsequent sequences (if a sequence is repeating, there would be no other sequence in between)
- no dictionary but only loops
- irreducable
- with maximum sequence sizes (which would minimize the algorithmic representation)
- and let's say nested loops are allowed (contrary to what I said before in the comment)