Regular expression to match any character being repeated more than 10 times

Question

I'm looking for a simple regular expression to match the same character being repeated more than 10 or so times. So for example, if I have a document littered with horizontal lines:

=================================================

The expression should match the line of = characters because the same character is repeated more than 10 times. Note that I'd like this to work for any character.

The title of this answer is misleading; you should have said 'Regular expression to match any character repeated more than 10 times' – dalloliogm, Nov 02 '09 at 11:59

score 230 · Accepted Answer · edited Mar 21 '23 at 20:18

The regex you need is /(.)\1{9,}/.

Test:

#!perl
use warnings;
use strict;
my $regex = qr/(.)\1{9,}/;
print "NO" if "abcdefghijklmno" =~ $regex;
print "YES" if "------------------------" =~ $regex;
print "YES" if "========================" =~ $regex;

Here \1 is a backreference. It refers to whatever the dot . captured inside the parentheses (.), and {9,} then asks for nine or more further copies of that same character. Together with the captured character itself, the pattern matches ten or more of any single character in a row.
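The boundary between nine and ten repetitions can be checked with a minimal Python sketch. The helper name has_long_run is made up for illustration and is not part of the original answer:

```python
import re

# Pattern from the answer: one captured character, then 9+ copies of it.
RUN_OF_TEN = re.compile(r"(.)\1{9,}")

def has_long_run(text):
    """Return True if some character occurs 10 or more times in a row."""
    return RUN_OF_TEN.search(text) is not None

print(has_long_run("=" * 9))             # False: only nine in a row
print(has_long_run("=" * 10))            # True: one captured + nine repeats
print(has_long_run("abcdefghijklmno"))   # False: fifteen different characters
```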

Although the above test script is in Perl, this is very standard regex syntax and should work in most regex engines that support backreferences. In some variants you might need to use more backslashes, e.g. Emacs would make you write \(.\)\1\{9,\} here.

If a whole string should consist of 10 or more identical characters, add anchors around the pattern:

my $regex = qr/^(.)\1{9,}$/;
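For comparison, here is a rough Python sketch of the anchored variant. It uses re.fullmatch rather than ^ and $, which behaves similarly for single-line input. The function name is_single_char_line is an assumption chosen for this example:

```python
import re

RUN_OF_TEN = re.compile(r"(.)\1{9,}")

def is_single_char_line(text):
    """Return True if text is one character repeated 10 or more times."""
    # fullmatch only succeeds when the pattern covers the entire string.
    return RUN_OF_TEN.fullmatch(text) is not None

print(is_single_char_line("=" * 12))          # True
print(is_single_char_line("abc" + "=" * 20))  # False: extra text before the run
```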

Michał Niklas · Answer 2 · 2009-11-02T11:56:12.670

46

In Python you can use (.)\1{9,}

(.) captures any single character as group 1
\1{9,} matches nine or more further repetitions of the character captured in group 1

example:

import re

txt = """1. aaaaaaaaaaaaaaa
2. bb
3. cccccccccccccccccccc
4. dd
5. eeeeeeeeeeee"""
rx = re.compile(r'(.)\1{9,}')
lines = txt.split('\n')
for line in lines:
    # search() finds a run anywhere in the line, not only at the start
    rxx = rx.search(line)
    if rxx:
        print(line)

Output:

1. aaaaaaaaaaaaaaa
3. cccccccccccccccccccc
5. eeeeeeeeeeee

edited Nov 02 '09 at 11:56

answered Nov 02 '09 at 11:35

Michał Niklas


if rx.search(line): print(line) (the assignment to the rxx variable is not necessary) – dalloliogm Nov 02 '09 at 11:40

You are right in this simple context. Using variable rxx I can do something like rxx.group(1), rxx.start(1) etc. – Michał Niklas Nov 02 '09 at 11:52
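As a sketch of what keeping the match object allows, the following reports which character repeated, where the run starts, and how long it is. The helper describe_run is hypothetical and only illustrates group(), start() and the length of the whole match:

```python
import re

rx = re.compile(r"(.)\1{9,}")

def describe_run(line):
    """Return (character, start index, run length) for the first long run, or None."""
    m = rx.search(line)
    if m is None:
        return None
    # group(1) is the repeated character; group(0) is the whole run.
    return m.group(1), m.start(), len(m.group(0))

print(describe_run("3. " + "c" * 20))  # ('c', 3, 20)
print(describe_run("2. bb"))           # None
```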

score 7 · Answer 3 · edited Nov 02 '09 at 11:35


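. matches any character. Use it in conjunction with a backreference and the curly braces already mentioned (note that \1{10} asks for ten repeats after the captured character, so eleven identical characters in total):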

$: cat > test
========
============================
oo
ooooooooooooooooooooooo


$: grep -E '(.)\1{10}' test
============================
ooooooooooooooooooooooo

edited Nov 02 '09 at 11:35

SilentGhost


answered Nov 02 '09 at 11:35

jeekl


Hi Jeek and @SilentGhost. The two commands `grep -E '([=o])\1{10}' test` and `grep -E '([=o]){10}' test` both work fine with your example (note the lack of `\1` in the second command). But the command `grep -E '([=o])\1{10}' <<< '==o==o==o==o==o==o===o==o==='` does not match the line! However, the command without `\1` does match it: `grep -E '([=o]){10}' <<< '==o==o==o==o==o==o===o==o==='`. Could you please explain? Cheers ;) – oHo Nov 21 '13 at 18:00
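The behaviour described in the comment above can be reproduced in Python. It comes from the difference between repeating a group and repeating a backreference. ([=o]){10} re-evaluates the character class on every repetition, so any mix of = and o satisfies it, while ([=o])\1{10} requires the one captured character to recur. A minimal sketch, with variable names chosen only for illustration:

```python
import re

sample = "==o==o==o==o==o==o===o==o==="

# Ten characters, each of which may be '=' or 'o' independently.
repeated_group = re.compile(r"([=o]){10}")
# One '=' or 'o', then ten more copies of that exact character.
backreference = re.compile(r"([=o])\1{10}")

print(bool(repeated_group.search(sample)))  # True: mixed characters are accepted
print(bool(backreference.search(sample)))   # False: the longest run here is three
```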

score 3 · Answer 4 · answered Nov 02 '09 at 11:25


={10,}

matches = that is repeated 10 or more times.
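If the character is known in advance, the same idea can be wrapped in a small helper. This is only a sketch: run_pattern is a hypothetical name, and it assumes the caller passes a single character, which is escaped with re.escape so that metacharacters such as . are taken literally:

```python
import re

def run_pattern(char, minimum=10):
    """Build a pattern matching `char` repeated at least `minimum` times."""
    # re.escape keeps characters like '.' or '*' from acting as metacharacters.
    return re.compile("(?:%s){%d,}" % (re.escape(char), minimum))

print(bool(run_pattern("=").search("=" * 12)))           # True
print(bool(run_pattern("=").search("abcdefghijklmno")))  # False
print(bool(run_pattern(".").search("abcdefghijklmno")))  # False: literal dots only
```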

answered Nov 02 '09 at 11:25

SilentGhost



sure that this does not take 10 or more arbitrary characters? – Etan Nov 02 '09 at 11:26
`perl -e 'print "NO" if "abcdefghijklmno" =~ /.{10,}/;'` – Nov 02 '09 at 11:27
it was wrong, but it has been edited (to match my answer which got some downvotes, good) – dalloliogm Nov 02 '09 at 11:31

*Gee, didn't know I had to say explicitly that you can replace the character with anything you want.* – SilentGhost Nov 02 '09 at 11:38

score 2 · Answer 5 · answered Nov 02 '09 at 11:25


Use the {10,} quantifier:

$: cat > testre
============================
==
==============

$: grep -E '={10,}' testre
============================
==============

answered Nov 02 '09 at 11:25

dalloliogm


score 1 · Answer 6 · answered Jun 13 '13 at 17:27

You can also use PowerShell to quickly replace words or character repetitions. PowerShell is for Windows. The current version at the time of writing is 3.0.

$oldfile = "$env:windir\WindowsUpdate.log"

$newfile = "$env:temp\newfile.txt"
$text = (Get-Content -Path $oldfile -ReadCount 0) -join "`n"

# The -replace operator takes a bare pattern; surrounding slashes would be matched literally
$text -replace '(.)\1{9,}', ' ' | Set-Content -Path $newfile
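The same replacement can be sketched in Python with re.sub. The function name blank_long_runs is an assumption for this example. Because . does not match a newline by default, a run never extends across lines:

```python
import re

def blank_long_runs(text, replacement=" "):
    """Replace every run of 10+ identical characters with `replacement`."""
    return re.sub(r"(.)\1{9,}", replacement, text)

cleaned = blank_long_runs("Title\n==========\nbody")
print(repr(cleaned))  # 'Title\n \nbody'
```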

score 1 · Answer 7 · answered Jun 07 '18 at 13:31

PHP's preg_replace example:

$str = "motttherbb fffaaattther";
$str = preg_replace("/([a-z])\\1/", "", $str);
echo $str;

Here [a-z] matches a letter, the parentheses () capture it, and the \\1 backreference then tries to match the same character again (so this pattern already targets 2 consecutive identical characters). The result is:

mother father

If you did:

$str = preg_replace("/([a-z])\\1{2}/", "", $str);

that would be erasing 3 consecutive repeated characters, outputting:

moherbb her
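Both outputs above can be reproduced with Python's re.sub, which also scans left to right and replaces non-overlapping matches. In this sketch remove_runs is a hypothetical helper, and length is the number of consecutive identical letters to erase:

```python
import re

def remove_runs(text, length):
    """Erase each group of `length` consecutive identical lowercase letters."""
    # One captured letter plus (length - 1) backreferenced copies.
    pattern = r"([a-z])\1{%d}" % (length - 1)
    return re.sub(pattern, "", text)

text = "motttherbb fffaaattther"
print(remove_runs(text, 2))  # mother father
print(remove_runs(text, 3))  # moherbb her
```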

score 0 · Answer 8 · answered Jun 08 '20 at 22:12


A slightly more generic PowerShell example. In PowerShell 7, the match is highlighted, including the last space (can you highlight on Stack Overflow?).

'a b c d e f ' | select-string '([a-f] ){6,}'

a b c d e f

answered Jun 08 '20 at 22:12

js2010

