
Let's say I have a string:

String sentence = "My nieces are Cara:8 Sarah:9 Tara:10";

And I would like to find all their respective names and ages with the following pattern matcher:

String regex = "My\\s+nieces\\s+are((\\s+(\\S+):(\\d+))*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(sentence);

I understand something like

matcher.find(0); // resets "pointer"
String niece = matcher.group(2);
String nieceName = matcher.group(3);
String nieceAge = matcher.group(4);

would give me my last niece (" Tara:10", "Tara", "10").
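
To make that behaviour concrete, here is a minimal sketch using the regex from above. The class name `LastIterationDemo` and the helper `groups` are hypothetical, introduced only for illustration. It shows that a capturing group inside a repeated `(...)*` keeps only the text of its final iteration, while the enclosing group 1 still holds the whole run.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original question.
final class LastIterationDemo {
    private static final Pattern PATTERN =
            Pattern.compile("My\\s+nieces\\s+are((\\s+(\\S+):(\\d+))*)");

    // Returns groups 1..4 of the first match, or an empty array if nothing matches.
    static String[] groups(String sentence) {
        Matcher matcher = PATTERN.matcher(sentence);
        if (!matcher.find()) {
            return new String[0];
        }
        return new String[] {
            matcher.group(1), // the whole repeated run
            matcher.group(2), // last iteration only
            matcher.group(3), // name from the last iteration
            matcher.group(4)  // age from the last iteration
        };
    }

    public static void main(String[] args) {
        for (String g : groups("My nieces are Cara:8 Sarah:9 Tara:10")) {
            System.out.println("[" + g + "]");
        }
        // prints:
        // [ Cara:8 Sarah:9 Tara:10]
        // [ Tara:10]
        // [Tara]
        // [10]
    }
}
```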

How would I collect all of my nieces instead of only the last, using only one regex/pattern?

I would like to avoid splitting the string.

  • https://stackoverflow.com/a/6939587/1553851 – shmosel Sep 07 '22 at 00:28
  • Does this answer your question? [Java regex: Repeating capturing groups](https://stackoverflow.com/questions/6939526/java-regex-repeating-capturing-groups) – Tim Moore Sep 07 '22 at 00:55
  • Another idea is to [use the `\G` anchor](https://regex101.com/r/i0vPYF/1) to *continue where the previous match ended*, [see this demo at tio.run](https://tio.run/##fZAxT8MwEIX3/Iqjk60KizJBqqpCDEyVkMqGGUxypC6xE9mXQgX97eGSuBILDPazz989@3lvDuZyX773vXVtEwj2XFAd2VoFrPBTbQwVOwzL7I/zR0OEwS@zrKhNjEAYKfvKANrutbYFRDLEcmhsCc5YD2JLwfrq@QVMqKJkcqABpjJE9IS@QFjBbHMEb7HAyCjCvQkmv4Etyy6/hadhu7iaLX93j28aWsU61/pBrC@0vpPfm6PWcT55DSu2k4MKrbdzmbOUc5mcUiBok67OFVU0rrU1ivESOdHjlP4IHMOpTbmpJs55mB/Zjx1bgHDqzfpSSJnSc4JjJHSq6Ui1nIVqz1AVmq4VC5lu@5e6PlOnbBinvv8B) – bobble bubble Sep 07 '22 at 01:34
  • @bobblebubble Very cool. You should post that as an answer. – shmosel Sep 07 '22 at 01:38

2 Answers


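You can't iterate over the repetitions of a repeated group, because Java keeps only the last one. You can, however, match each entry individually by calling find() in a loop and reading the groups on every iteration. If the entries need to be back-to-back, you can restrict the matcher's region on each iteration so that it starts at the end of the previous match, like this: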

Matcher matcher = Pattern.compile("My\\s+nieces\\s+are").matcher(sentence);
if (matcher.find()) {
    int boundary = matcher.end();
    
    matcher = Pattern.compile("^\\s+(\\S+):(\\d+)").matcher(sentence);
    while (matcher.region(boundary, sentence.length()).find()) {
        System.out.println(matcher.group());
        System.out.println(matcher.group(1));
        System.out.println(matcher.group(2));
        
        boundary = matcher.end();
    }
}
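
As a rough illustration of the same region-based loop in self-contained form, the sketch below collects the pairs into a map instead of printing them. The class name `NieceRegionParser`, the method `parse`, and the choice of a `LinkedHashMap` are assumptions made for this example. It relies on the matcher's default anchoring bounds, under which `^` matches at the start of the current region.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the region-based loop; names are illustrative.
final class NieceRegionParser {
    private static final Pattern PREFIX = Pattern.compile("My\\s+nieces\\s+are");
    private static final Pattern ENTRY = Pattern.compile("^\\s+(\\S+):(\\d+)");

    // Returns name -> age in order of appearance; empty if the prefix is missing.
    static Map<String, Integer> parse(String sentence) {
        Map<String, Integer> nieces = new LinkedHashMap<>();
        Matcher prefix = PREFIX.matcher(sentence);
        if (!prefix.find()) {
            return nieces;
        }
        int boundary = prefix.end();
        Matcher entry = ENTRY.matcher(sentence);
        // region() resets the matcher, so each find() starts at the new boundary,
        // and ^ only matches at that boundary.
        while (entry.region(boundary, sentence.length()).find()) {
            nieces.put(entry.group(1), Integer.parseInt(entry.group(2)));
            boundary = entry.end();
        }
        return nieces;
    }

    public static void main(String[] args) {
        System.out.println(parse("My nieces are Cara:8 Sarah:9 Tara:10"));
        // prints {Cara=8, Sarah=9, Tara=10}
    }
}
```

Because each entry must begin exactly at the boundary, the loop stops at the first token that is not a `name:age` pair: for "My nieces are Cara:8 and Sarah:9" this sketch returns only Cara.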
shmosel

Another idea is to use the \G anchor, which matches where the previous match ended (or at the start of the input).

String regex = "(?:\\G(?!\\A)|My\\s+nieces\\s+are)\\s+(\\S+):(\\d+)";
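  • My\s+nieces\s+are matches the prefix and starts the chain
  • \G then matches at the position where the previous match ended, so each later find() continues from there
  • (?!\A) is a negative lookahead that stops \G from matching at the very start of the input (\A)
  • \s+(\S+):(\d+) captures the name and the age in two groups for extraction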

See this demo at regex101 or a Java demo at tio.run

Matcher m = Pattern.compile(regex).matcher(sentence);

while (m.find()) {
  System.out.println(m.group(1));
  System.out.println(m.group(2));
}
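
For completeness, here is a minimal self-contained sketch of the \G approach that collects the pairs rather than printing them. The class name `NieceChainParser` and the method `parse` are hypothetical, chosen only for this example; the regex is the one given above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the \G-based loop; names are illustrative.
final class NieceChainParser {
    private static final Pattern CHAIN = Pattern.compile(
            "(?:\\G(?!\\A)|My\\s+nieces\\s+are)\\s+(\\S+):(\\d+)");

    // Returns name -> age in order of appearance.
    static Map<String, Integer> parse(String sentence) {
        Map<String, Integer> nieces = new LinkedHashMap<>();
        Matcher m = CHAIN.matcher(sentence);
        // The first find() has to go through the prefix alternative;
        // later ones continue from the end of the previous match via \G.
        while (m.find()) {
            nieces.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return nieces;
    }

    public static void main(String[] args) {
        System.out.println(parse("My nieces are Cara:8 Sarah:9 Tara:10"));
        // prints {Cara=8, Sarah=9, Tara=10}
    }
}
```

As with the region-based loop, the chain breaks at the first token that is not a `name:age` pair, so "My nieces are Cara:8 Sarah:9 and Tara:10" yields only Cara and Sarah.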
bobble bubble