5

A period p of a string w is any positive integer p such that w[i]=w[i+p] whenever both sides of this equation are defined. Let per(w) denote the size of the smallest period of w . We say that a string w is periodic iff per(w) <= |w|/2.

So informally a periodic string is just a string that is made up from a prefix repeated at least twice. The only complication is that at the end of the string we don't require a full copy of the prefix.

For, example consider the string x = abcab. per(abcab) = 3 as x[1] = x[1+3] = a, x[2]=x[2+3] = b and there is no smaller period. The string abcab is therefore not periodic. However, the string ababa is periodic as per(ababa) = 2.

As more examples, abcabca, ababababa and abcabcabc are also periodic.

Is there a regex to determine if a string is periodic or not?

I don't really mind which flavor of regex but if it makes a difference, anything that Python re supports.

Simd
  • 19,447
  • 42
  • 136
  • 271
  • which language you are using?? –  Jul 12 '16 at 07:53
  • @Kilanny It's a general regex question but if it makes a difference, anything that Python re supports. – Simd Jul 12 '16 at 07:53
  • @ClasG How about horcrux's answer below? – Simd Jul 12 '16 at 08:21
  • @eleanora I think ClasG is referring to a pure regular expression. Pure regular expressions has the same expressive power of a regular grammar, and they cannot determine periods or impose cardinalities (see my answer below). – logi-kal Jul 12 '16 at 08:34
  • @horcrux yes you are right. I was always told that regex != regexp :) – Simd Jul 12 '16 at 08:34
  • @ClasG you are right about pure regular expressions and you are right that abcdae is not periodic. – Simd Jul 12 '16 at 08:35
  • @ClasG abcdaefga is not periodic. To have period 4 you would need a = a, b = e, c = f and so on. – Simd Jul 12 '16 at 08:43
  • @ClasG Yes abcabcabc is periodic. Why do you ask about that string? – Simd Jul 12 '16 at 10:39
  • 1
    Because with `\b((\w)\w+)\1\2\b` it [does not work](https://regex101.com/r/oJ3kJ0/1). – logi-kal Jul 12 '16 at 10:42
  • @horcrux oh dear! Sorry I will add it as an example now. – Simd Jul 12 '16 at 10:44
  • @eleanora no fear, mine does work. – logi-kal Jul 12 '16 at 10:45
  • @horcrux Phew :) I added a more advanced follow up for the brave just now http://stackoverflow.com/questions/38326458/a-regex-for-maximal-periodic-substrings . – Simd Jul 12 '16 at 10:45

3 Answers3

5

What you need is backreference

\b(\w*)(\w+\1)\2+\b

This matches even abcabca and ababababa.

\1 and \2 are used to match the first and second capturing groups, respectively.

logi-kal
  • 7,107
  • 6
  • 31
  • 43
  • This looks very nice! I am just playing with it. – Simd Jul 12 '16 at 08:22
  • I think that if you consider periodic the empty string too, the right expression is: `\b(\w*)(\w*\1)\2+\b` – logi-kal Jul 12 '16 at 08:26
  • 1
    Very nice. Shorter than mine, but maybe a bit harder to grasp. IMHO you should use `^...$` instead of `\b...\b` to ensure that the entire string is periodic. – tobias_k Jul 12 '16 at 08:41
  • This is really nice. It even handles sequences *cut* in the beginning, like `cdabcdabcdabcd`. Thumbs up – SamWhan Jul 12 '16 at 10:52
4

You could use Regex back references.

For example (.+)\1+. This pattern will match a group () formed of at least one character .+. This group \1 (back reference) must repeat at least one time for a match.

The string ababa matches and it finds ab as the 1st group.

The string abcab is not a match.

Later edit

If you want a prefix that is repeated at least twice, you can change the pattern to: ^(.+)\1+. The problem is that I don't think you can match the end of the string to a substring of the prefix. So any string that starts with a repeating pattern will match but it will ignore the ending of the string.

Even later edit

Inspired from @tobias_k answer, here is how I would do it ^((.+)(?:.*))\1+\2?$. It looks for a string that has a prefix (it looks for the longest prefix it can find) that repeats at least twice and the ending must be the starting part of the prefix.

The first capturing group from the match will be the prefix that is repeating.

https://regex101.com/r/jQ3yY1/2

If you want the shortest prefix that repeats, you can use this pattern ^((.+?)(?:.*?))\1+\2?$.

Andrei Tătar
  • 7,872
  • 19
  • 37
4

You can use a regex like ^(.+)(.*)(\1\2)+\1?$.

  • ^...$ from start to end of string
  • (.+) part of period that is always repeated (e.g. a in ababa)
  • (.*) optional part of period that is repeated except at the end (e.g. b in ababa)
  • (\1\2)+ one or more repetitions of the entire period
  • \1? optional final repetition of first part of the period

In Python:

>>> p = r"^(.+)(.*)(\1\2)+\1?$"
>>> re.match(p, "abcab")
None
>>> re.match(p, "abcabca")
<_sre.SRE_Match at 0x7f5fde6e51f8>

Note that this does not match the empty string "" though, which could also be considered periodic. If the empty string should be matched, you will have to treat it separately, e.g. by simply appending |^$ at the end of the regex.

tobias_k
  • 81,265
  • 12
  • 120
  • 179