10

Is it possible to detect repeated number patterns with a regular expression?

So for example, if I had the following string "034503450345", would it be possible to match the repeated sequence 0345? I have a feeling this is beyond the scope of regex, but I thought I would ask here anyway to see if I have missed something.

Chas. Owens
  • 64,182
  • 22
  • 135
  • 226
Mark Withers
  • 1,462
  • 5
  • 16
  • 34
  • 1
    what language/platform are you using? – Philippe Leybaert Jun 03 '09 at 09:55
  • I'm using C#. All I needed was the regex though, so I've implemented RichieHindle's solution, and verified it against my test data already! I've also learned a lot from Peter Boughton's excellently commented regex. Thanks both of you! – Mark Withers Jun 03 '09 at 10:12
  • @MarkWithers I am dealing with same issue. Can you please be more concrete and tell me something more about your solution? Thank you – user2179427 Jan 18 '16 at 13:29
  • @user2179427 Not really, it was 6 years ago, so I no longer have access to the source code. However, all of the up-voted regex patterns in this question work, so it's just a case of looking up how to do regex in the language of your choice – Mark Withers Jan 25 '16 at 16:51

5 Answers5

22

This expression will match one or more repeating groups:

(.+)(?=\1+)


Here is the same expression broken down, (using commenting so it can still be used directly as a regex).

(?x)  # enable regex comment mode
(     # start capturing group
.+    # one or more of any character (excludes newlines by default)
)     # end capturing group
(?=   # begin lookahead
\1+   # match one or more of the first capturing group
)     # end lookahead


To match a specific pattern, change the .+ to that pattern, e.g. \d+ for one or more numbers, or \d{4,} to match 4 or more numbers.

To match a specific number of the pattern, change \1+, e.g to \1{4} for four repetitions.

To allow the repetition to not be next to each other, you can add .*? inside the lookahead.

Arnis Lapsa
  • 45,880
  • 29
  • 115
  • 195
Peter Boughton
  • 110,170
  • 32
  • 120
  • 176
11

Yes, you can - here's a Python test case

import re
print re.search(r"(\d+).*\1", "8034503450345").group(1)
# Prints 0345

The regular expression says "find some sequence of digits, then any amount of other stuff, then the same sequence again."

On a barely-related note, here's one of my favourite regular expressions - a prime number detector:

import re
for i in range(2, 100):
    if not re.search(r"^(xx+)\1+$", "x"*i):
        print i
RichieHindle
  • 272,464
  • 47
  • 358
  • 399
  • Your prime number detector finds 0 and 1 to be prime :-) – balpha Jun 03 '09 at 10:02
  • Any idea why the following example is *only* matching `8` and not `0345`? In [18]: foo = re.search(r"(\d+).*\1", "80345824103452420345") In [19]: foo.groups() Out[19]: ('8',) – Hank Gay Jun 03 '09 at 10:03
  • @balpha: Good pont - fixed. 8-) – RichieHindle Jun 03 '09 at 10:05
  • @Hank Gay: It'll match the first repeating group it finds, which in that case is "8". Use more "\d"s, or braces, to put a minimum length on it. – RichieHindle Jun 03 '09 at 10:07
  • @RichieHindle: Thanks. I had somehow convinced myself (at first) that it was matching the longest possible, not the first. – Hank Gay Jun 03 '09 at 10:36
9

Just to add a note to the (correct) answer from RichieHindle:

Note that while Python's regexp implementation (and many others, such as Perl's) can do this, this is no longer a regular expression in the narrow sense of the word.

Your example is not a regular language, hence cannot be handled by a pure regular expression. See e.g. the excellent Wikipedia article for details.

While this is mostly only of academic interest, there are some practical consequences. Real regular expressions can make much better guarantees for maximum runtimes than in this case. So you could get performance problems at some point.

Not to say that it's not a good solution, but you should realize that you're at the limit of what regular expressions (even in extended form) are capable of, and might want to consider other solutions in case of problems.

sleske
  • 81,358
  • 34
  • 189
  • 227
2

This is the C# code, that uses the backreference construct to find repeated digits. It will work with 034503450345, 123034503450345, 034503450345345, 232034503450345423. The regex is much easier and clearer to understand.

/// <summary>
/// Assigns repeated digits to repeatedDigits, if the digitSequence matches the pattern
/// </summary>
/// <returns>true if success, false otherwise</returns>
public static bool TryGetRepeatedDigits(string digitSequence, out string repeatedDigits)
{
    repeatedDigits = null;

    string pattern = @"^\d*(?<repeat>\d+)\k<repeat>+\d*$";

    if (Regex.IsMatch(digitSequence, pattern))
    {
        Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
        repeatedDigits = r.Match(digitSequence).Result("${repeat}");
        return true;
    }
    else
        return false;
}
Rashmi Pandit
  • 23,230
  • 17
  • 71
  • 111
  • Very nice! I like the use of the named group. Production quality code, commented and ready to be copied. Thanks very much! – Mark Withers Jun 03 '09 at 10:29
0

Use regex repetition: bar{2,} looks for text with two or more bar: barbar barbarbar ...

user1920925
  • 672
  • 6
  • 5