I want to find character sequences in a name such as:

  • AAA, BBB or ZZZ (e.g. RAAAJ, ABBBAAS)
  • ABABAB or CPCPCP

Is it possible to find these with a regex?

I have tried this:

\\b\\w*?(\\w{2})\\w*?\\1\\w*?\\b on 'tatarak'

This finds ta in the word, but it should match only when ta occurs three or more times.

Lokesh Tiwari

2 Answers

Try using groups and back-references within the same Pattern.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrapper class added so the snippet compiles; the name is arbitrary.
public class RepeatedLetters {
    public static void main(String[] args) {
        String[] namesWithRepeatedOneLetter = { "RAAAJ", "ABBBAAS" };
        String[] namesWithRepeatedTwoLetters = { "ABABABC", "FOOBCBCD" };
        // (\p{Alpha}) : POSIX character class, basically your a-zA-Z range.
        //               The parentheses define it as capturing group 1.
        // \1          : back-reference to whatever group 1 captured.
        // {2,}        : greedy quantifier, two or more further copies, so the
        //               letter must occur at least three times in a row.
        Pattern p0 = Pattern.compile("(\\p{Alpha})\\1{2,}");
        // (\p{Alpha}{2,}) : group 1 is now a unit of two or more letters, which
        //                   must again be followed by two or more copies of itself.
        Pattern p1 = Pattern.compile("(\\p{Alpha}{2,})\\1{2,}");
        for (String n : namesWithRepeatedOneLetter) {
            Matcher m = p0.matcher(n);
            while (m.find()) {
                System.out.println(m.group());
            }
        }
        System.out.println();
        for (String n : namesWithRepeatedTwoLetters) {
            Matcher m = p1.matcher(n);
            while (m.find()) {
                System.out.println(m.group());
            }
        }
    }
}

Output

AAA
BBB

ABABAB

Edit after comments

To match Hindi characters, use a Unicode script or block reference instead of a predefined or POSIX character class; by default \p{Alpha} covers only ASCII letters.

For instance:

Pattern p0 = Pattern.compile("(\\p{IsDevanagari})\\1{2,}");

Finally, I changed the quantifier after the back-reference (it was greedy +, now greedy {2,}) so that only sequences repeated three or more times are matched.
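As a rough illustration of the Devanagari pattern above, the sketch below runs it against two made-up sample strings. The class name DevanagariRepeats, the helper findRuns and the sample strings are assumptions introduced here for the demonstration, not part of the original answer. Note that the pattern compares single code points, so it detects a repeated letter rather than a repeated syllable with vowel signs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer.
final class DevanagariRepeats {
    // One Devanagari code point followed by at least two more copies of it.
    private static final Pattern TRIPLE =
            Pattern.compile("(\\p{IsDevanagari})\\1{2,}");

    // Collects every run of three or more identical Devanagari code points.
    static List<String> findRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = TRIPLE.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Made-up samples: the letter KA (U+0915) three times, then twice.
        String tripled = "\u0915\u0915\u0915\u092E\u0932";
        String doubled = "\u0915\u0915\u092E\u0932";
        System.out.println(findRuns(tripled).size()); // prints 1
        System.out.println(findRuns(doubled).size()); // prints 0
    }
}
```

Only the first sample yields a match, because {2,} after the back-reference requires three occurrences in total.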

Mena

What about this? For the input tatarak loremipsrecdks RAAAJ , ABBBAAS the output is

Found: tata
Found: AAA
Found: BBB
Found: AA

The code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DublicatePattern {


    public static void main(String[] args) {
        String value = "tatarak loremipsrecdks RAAAJ , ABBBAAS";
        Pattern p = Pattern.compile("(\\w+)\\1+");
        Matcher m = p.matcher(value);
        while (m.find()) {
            System.out.println("Found: " + value.substring(m.start(), m.end()));
        }
    }
}
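
The pattern above also reports units that occur only twice (tata, AA). If only three or more repetitions should count, as the question asks, one possible variation is to make the group reluctant and require at least two further copies with {2,}. This is a sketch under that assumption; the class name TripleRepeat and the helper findRepeats are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variation on the pattern above, not the answer's own code.
final class TripleRepeat {
    // (\w+?) reluctantly captures the shortest repeating unit;
    // \1{2,} then demands at least two more copies, i.e. three in total.
    private static final Pattern THREE_OR_MORE =
            Pattern.compile("(\\w+?)\\1{2,}");

    // Returns every substring made of a unit repeated three or more times.
    static List<String> findRepeats(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = THREE_OR_MORE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Same input as above: tata and AA no longer qualify.
        System.out.println(findRepeats("tatarak loremipsrecdks RAAAJ , ABBBAAS"));
        // prints [AAA, BBB]
        System.out.println(findRepeats("ABABABC tatatarak"));
        // prints [ABABAB, tatata]
    }
}
```

Because \w also matches digits and the underscore, swapping in \p{Alpha} would restrict the match to ASCII letters.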
drkunibar