0

Possible Duplicate:
Find longest repetitive sequence in a string

I'm working on a problem and I need to find pattern that's repeating most.

For simplicity and convenience please consider this string:

What is Lorem Ipsum?
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s...

Sequence that repeats most (initially considering string length greater the 3 characters, for example) is "Lorem Ipsum". "Lorem" and "Ipsum" also repeat the same number of times of course, but longer string has precedence over shorter if they repeat same number of times.

What kind of algorithm can efficiently find this pattern, preferably in Python?

Community
  • 1
  • 1
matt
  • 13
  • 4

1 Answers1

0

As @fraxel indicated, you need to specify your problem a little more, but this sounds like, at best, a dynamic programming (http://en.wikipedia.org/wiki/Dynamic_programming) problem. Without specifying it further, however, it's impossible to know what sort of algorithm you need. For example, another uncertainty in your formulation the definition of a pattern. Is a pattern a simple string? or is "ababa" considered the same pattern as "acaca" in that it would match the regex or glob pattern "a*a*a".

Ben
  • 4,980
  • 3
  • 43
  • 84
  • Yes pattern is simple Python byte string, or if it easier for comprehension consider long string of decimal numbers. I used word pattern in common sense without implying regex use. – matt Jun 19 '12 at 21:05