
I'm looking for a regular expression that will detect repeating symbols in a String. So far I haven't found a solution that fits all my requirements.

The requirements are pretty simple:

  • detect any repeating symbol in a String;
  • be able to set up the repeat count (e.g. more than twice).

Examples of the required detection (for symbol 'a', more than 2 times; true if detected, false otherwise):

"Abcdefg" - false

"AbcdaBCD" - false

"abcd_ab_ab" - true (symbol 'a' used three times)

"aabbaabb" - true (symbol 'a' used four times)

Since I'm not a regex pro, a code snippet and an explanation would be appreciated!

Thanks!

Demigod
  • May be a dupe of http://stackoverflow.com/questions/7378451/java-regex-match-count. Or http://stackoverflow.com/questions/275944/java-how-do-i-count-the-number-of-occurrences-of-a-char-in-a-string. – Wiktor Stribiżew Sep 28 '16 at 10:17
  • @iDemigod could you clarify what you mean by "to be able to setup repeating count (eg. more than twice)" - are you saying you want to specify that it is found *at least* 3 times, for instance? – Andy Turner Sep 28 '16 at 10:28
  • @AndyTurner , exactly! – Demigod Sep 28 '16 at 10:29

3 Answers


I think that

(.).*\1

would work:

  • (.) match a single character and capture
  • .* match any intervening characters
  • \1 match the captured group again.

(You'd need to compile with the DOTALL flag, or replace . with [\s\S] or similar, if the string can contain characters not ordinarily matched by ., such as line terminators.)
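A minimal Java sketch of the idea, assuming the check is done with `Matcher#find` so the pattern may match anywhere in the string. The class and method names are illustrative only; the last line shows why DOTALL matters once the input contains a newline.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // (.) captures one character, .* skips anything, \1 requires that same character again.
    // DOTALL lets '.' also match line terminators such as a newline.
    static final Pattern REPEAT = Pattern.compile("(.).*\\1", Pattern.DOTALL);
    static final Pattern REPEAT_NO_DOTALL = Pattern.compile("(.).*\\1");

    // True if some character occurs at least twice anywhere in s.
    static boolean hasRepeat(String s) {
        return REPEAT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeat("Abcdefg")); // false
        System.out.println(hasRepeat("abca"));    // true
        String withNewline = "a\nba";
        System.out.println(hasRepeat(withNewline));                        // true
        System.out.println(REPEAT_NO_DOTALL.matcher(withNewline).find()); // false
    }
}
```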

And if you want to require that the character is found at least 3 times, just apply a quantifier to the last two parts:

(.)(.*\1){2}

etc.
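The quantifier can be built from a variable to get the configurable count the question asks for. This is a sketch under the assumption that "more than twice" means at least 3 occurrences; `hasCharAtLeast` is a hypothetical helper name, and DOTALL is added per the note above. It reports whether *any* character repeats, which for the question's four inputs happens to give the same answers as checking 'a' alone.

```java
import java.util.regex.Pattern;

class RepeatCount {
    // Hypothetical helper: true if some character occurs at least minCount times (minCount >= 2).
    // (.) captures a character, then (.*\1){k} demands k further copies of it.
    static boolean hasCharAtLeast(String s, int minCount) {
        String regex = "(.)(.*\\1){" + (minCount - 1) + "}";
        return Pattern.compile(regex, Pattern.DOTALL).matcher(s).find();
    }

    public static void main(String[] args) {
        // The question's examples, reading "more than 2 times" as minCount = 3
        System.out.println(hasCharAtLeast("Abcdefg", 3));    // false
        System.out.println(hasCharAtLeast("AbcdaBCD", 3));   // false
        System.out.println(hasCharAtLeast("abcd_ab_ab", 3)); // true
        System.out.println(hasCharAtLeast("aabbaabb", 3));   // true
    }
}
```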

This is going to be pretty inefficient, though: for every starting character it has to scan the rest of the string for the next matching character, which makes it at least quadratic.

You might be just as well off not using regular expressions, e.g.:

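import java.util.Arrays;

class RepeatCheck {
  // Returns true if some char occurs at least numOccurrencesRequired times in str.
  static boolean hasRepeats(String str, int numOccurrencesRequired) {
    char[] cs = str.toCharArray();
    Arrays.sort(cs); // equal chars end up next to each other
    int n = numOccurrencesRequired - 1;
    for (int i = n; i < cs.length; ++i) {
      // Are cs[i - n] .. cs[i] all the same char?
      boolean allSame = true;
      for (int j = 1; j <= n && allSame; ++j) {
        allSame = cs[i] == cs[i - j];
      }
      if (allSame) return true;
    }
    return false;
  }
}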

Sorting puts all equal characters next to each other, so you only need a single pass over the sorted array, looking for runs of adjacent equal characters.

Note that this doesn't quite work for every symbol: it will split up characters that take more than one char (supplementary code points, which Java stores as surrogate pairs). You can adapt the code above to work with code points rather than chars.
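One possible code point adaptation, sketched with `String#codePoints()` (Java 8+); the class name is made up. Because the array is sorted, it is enough to compare the two ends of a candidate run instead of every element in between. The example string holds two different emoji that share the same high surrogate, so the char-based code above would report a repeat there while this version does not.

```java
class CodePointRepeat {
    // Same sort-then-scan idea, but over code points, so a supplementary
    // character (two chars in a Java String) is treated as a single symbol.
    static boolean hasRepeats(String str, int numOccurrencesRequired) {
        int[] cps = str.codePoints().sorted().toArray();
        int n = numOccurrencesRequired - 1;
        for (int i = n; i < cps.length; ++i) {
            // cps is sorted, so if cps[i - n] == cps[i], every value from cps[i - n] to cps[i] is equal
            if (cps[i - n] == cps[i]) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // U+1F600 and U+1F601: different symbols that share the same high surrogate char
        String twoEmoji = "\uD83D\uDE00\uD83D\uDE01";
        System.out.println(hasRepeats(twoEmoji, 2));     // false
        System.out.println(hasRepeats("abcd_ab_ab", 3)); // true
    }
}
```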

Andy Turner
  • Andy, could you add a link to the documentation that says this construction is legal? – xenteros Sep 28 '16 at 10:23
  • The current solution here does not account for the OP's 2nd requirement. – Wiktor Stribiżew Sep 28 '16 at 10:23
  • @xenteros uh, no, other than [here](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html)... but [it is legal](http://ideone.com/uuxbSb). – Andy Turner Sep 28 '16 at 10:25
  • @AndyTurner I hate the java regex documentation – xenteros Sep 28 '16 at 10:26
  • @xenteros take a look at "back references" at https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html - "`\n` - Whatever the nth capturing group matched" – Pshemo Sep 28 '16 at 10:29
  • @AndyTurner this expression (.)(.*\1){2} did not find three 'a's in "abcabcaef" – Demigod Sep 28 '16 at 10:33
  • @iDemigod How are you using this regex? It will fail if you use it with the `matches` method, since that tests whether the regex matches the *entire* string, not whether it can match just part of it. In that case you would either need to add `.*` at the start and end of the regex to let it match the remaining parts (assuming the string doesn't contain line separators, since by default `.` can't match these), or use `Matcher#find` instead of `String#matches`. – Pshemo Sep 28 '16 at 10:35
  • @AndyTurner I was using matcher.find() – Demigod Sep 28 '16 at 10:39
  • @AndyTurner, thanks for the example! It works. There was a mistake in my code. – Demigod Sep 28 '16 at 11:00
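To illustrate the `matches` versus `find` point from the comments, here is a small sketch using the string from Demigod's comment. The class and helper names are made up for this example; `String#matches` and `Matcher#find` are the standard JDK methods.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    static final String REGEX = "(.)(.*\\1){2}";

    // String#matches succeeds only if the regex matches the entire input.
    static boolean viaMatches(String s) {
        return s.matches(REGEX);
    }

    // Matcher#find succeeds if the regex matches any part of the input.
    static boolean viaFind(String s) {
        return Pattern.compile(REGEX).matcher(s).find();
    }

    // Padding with .* lets matches() skip text before and after the repeat
    // (only safe without line terminators, since '.' does not match them by default).
    static boolean viaPaddedMatches(String s) {
        return s.matches(".*" + REGEX + ".*");
    }

    public static void main(String[] args) {
        String s = "abcabcaef"; // 'a' occurs three times, but s ends in 'f'
        System.out.println(viaMatches(s));       // false
        System.out.println(viaFind(s));          // true
        System.out.println(viaPaddedMatches(s)); // true
    }
}
```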

Try this regex: (.)(?:.*\1)

It matches any character (.) that is followed by anything (.*) and then by itself (\1). If you want to check for more repeats, add {n,} at the end, where n is the number of repeats after the first occurrence (so {2,} means the character appears at least 3 times in total).
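Since (.) is a capturing group, a successful `find` also tells you which character repeated, via `group(1)`. A sketch, assuming you want the leftmost qualifying character; `repeatedSymbol` and its `extraRepeats` parameter (the n in {n,}) are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhichSymbol {
    // Hypothetical helper: returns the leftmost character that has at least
    // extraRepeats further copies after it (extraRepeats + 1 occurrences in total), or null.
    static String repeatedSymbol(String s, int extraRepeats) {
        String regex = "(.)(?:.*\\1){" + extraRepeats + ",}";
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(repeatedSymbol("abcd_ab_ab", 2)); // a
        System.out.println(repeatedSymbol("AbcdaBCD", 2));   // null
        System.out.println(repeatedSymbol("xyzyy", 2));      // y
    }
}
```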

Thomas

Yes, such a regex exists, but only because the set of characters is finite:

regex: .*(a.*a|b.*b|c.*c|...|y.*y|z.*z).*

It isn't practical, though. Use another approach:

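String string = "something";
// One counter per possible char value (0 to 65535), so any char indexes safely
int[] count = new int[Character.MAX_VALUE + 1];
for (int i = 0; i < string.length(); i++) {
    char c = string.charAt(i);
    count[c]++; // the char is widened to an int index
}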

Now every character has been counted, and you can check the counts however you like, e.g. whether count['a'] > 2.
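One way to use counts for the question's checks, sketched with a map keyed by code point instead of the fixed-size array (so a supplementary character counts as one symbol). The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class CountSymbols {
    // Counts every code point, then checks whether any symbol reaches minCount.
    static boolean anySymbolAtLeast(String s, int minCount) {
        Map<Integer, Long> counts = s.codePoints()
                .boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.values().stream().anyMatch(c -> c >= minCount);
    }

    // Checks one specific symbol, like the question's examples for 'a'.
    static boolean symbolAtLeast(String s, int symbol, int minCount) {
        return s.codePoints().filter(cp -> cp == symbol).count() >= minCount;
    }

    public static void main(String[] args) {
        System.out.println(anySymbolAtLeast("abcd_ab_ab", 3)); // true
        System.out.println(symbolAtLeast("aabbaabb", 'a', 3)); // true
        System.out.println(symbolAtLeast("AbcdaBCD", 'a', 3)); // false
    }
}
```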

xenteros