0

I am currently learning regular expression syntax in Java, but I don't really understand regex patterns.

What I know is that a pattern has a group count, and the pattern below has 3 groups.

import java.util.regex.*;

public class RE {
    public static void main(String[] args){
        String line = "Foo123";
        String pattern = "(.*)(\\d+)(.*)"; //RE Syntax I get stuck on.

        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(line);

        if (m.find()) {
            System.out.println(m.group(0));
            System.out.println(m.group(1));
            System.out.println(m.group(2));
            System.out.println(m.group(3));
        }
    }
}

I would appreciate it if someone could explain what this expression does, what having more than one group does, and so on.

user3818650

4 Answers

3

Group 0 contains the entire match, and groups 1, 2 and 3 contain the text captured by the corresponding pairs of parentheses.
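As a rough illustration of the group numbering, the sketch below collects group 0 and every numbered group into an array. The class name GroupCountDemo and the helper captureAll are made up for this example; Matcher.groupCount() is the standard call that reports the number of capturing groups (group 0 is not counted).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Hypothetical helper: returns group 0 (the whole match) followed by
    // groups 1..groupCount(), or an empty array when nothing matches.
    static String[] captureAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        // groupCount() does not include group 0, hence the + 1.
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = captureAll("(.*)(\\d+)(.*)", "Foo123");
        for (int i = 0; i < groups.length; i++) {
            System.out.println("group " + i + " = \"" + groups[i] + "\"");
        }
    }
}
```

For the pattern in the question, groupCount() is 3, which is the "length of 3" mentioned there.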

Input string: Foo123

Regex : (.*)(\d+)(.*)

The .* in the first capturing group is greedy, so it first matches every character up to the end of the string. The regex engine then backtracks, one character at a time, until the rest of the pattern can match. Because \d+ needs only one digit, backtracking stops as soon as the last character (3) is given back, so group 1 is Foo12 and group 2 is only the last digit, 3. Nothing is left after that digit, so group 3 is an empty string.
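To make the effect of backtracking visible, here is a small sketch that compares the greedy pattern from the question with a variant whose first group uses the reluctant quantifier .*? instead. The reluctant variant is not part of the original question; it is added only for contrast, and the class and method names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: returns groups 1..3 of the first match, or null if there is none.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Greedy: group 1 grabs as much as it can and gives back one digit.
        String[] greedy = groups("(.*)(\\d+)(.*)", "Foo123");
        System.out.println(String.join("|", greedy));    // Foo12|3|

        // Reluctant (assumed variant): group 1 takes as little as possible,
        // so \d+ can consume the whole run of digits.
        String[] reluctant = groups("(.*?)(\\d+)(.*)", "Foo123");
        System.out.println(String.join("|", reluctant)); // Foo|123|
    }
}
```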

Avinash Raj
1

Here is an explanation:

(       : start capture group 1
    .*  : 0 or more of any character
)       : end group
(       : start capture group 2
    \\d+: 1 or more digits (\\d in the Java string is the regex \d)
)       : end group
(       : start capture group 3
    .*  : 0 or more of any character
)       : end group

This regex matches for example:

  • 123
  • abc456kljh
  • :.?222
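The examples above can be checked with a short sketch like the following. It uses Matcher.matches(), which requires the whole input to match; the class name MatchExamples, the helper name and the extra non-matching input "abc" are assumptions added for illustration.

```java
import java.util.regex.Pattern;

class MatchExamples {
    private static final Pattern PATTERN = Pattern.compile("(.*)(\\d+)(.*)");

    // Hypothetical helper: true when the whole input matches the pattern,
    // which here means the input contains at least one digit.
    static boolean matchesPattern(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = { "123", "abc456kljh", ":.?222", "abc" };
        for (String input : inputs) {
            System.out.println(input + " -> " + matchesPattern(input));
        }
    }
}
```

The first three inputs match because each contains a digit; "abc" does not, since \d+ needs at least one digit.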
Toto
1
String line = "Foo123";
String pattern = "(.*)(\\d+)(.*)"; 
// (take any character - zero or more) // (digits one or more) // (take any character - zero or more)

So in the above case three groups are captured. The first matches any character zero or more times (greedily, so it takes as much as it can). The second matches digits: \d is a digit and + means one or more. The third again matches any character zero or more times.

Community
nitishagar
0

(.*)(\\d+)(.*)

If you hover over a part of the regular expression, you get an explanation of that part.

1st Capturing group (.*)
  .* matches any character (except newline)
  Quantifier: * Between zero and unlimited times, as many times as possible
2nd Capturing group (\d+)
  \d matches a digit (written as \\d inside a Java string literal, where \\ is an escaped \)
  Quantifier: + Between one and unlimited times, as many times as possible
3rd Capturing group (.*)
  .* matches any character (except newline)
  Quantifier: * Between zero and unlimited times, as many times as possible
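Because the doubled backslash is easy to misread, here is a minimal sketch of what the Java compiler and the regex engine each see. The class name EscapeDemo and its helper are invented for this example; the four-backslash pattern is an assumed contrast case, not something from the question.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: the regex text the engine receives for the question's literal.
    static String regexText() {
        return "(.*)(\\d+)(.*)";
    }

    public static void main(String[] args) {
        // The Java literal "\\d" is the two characters \ and d.
        System.out.println(regexText());          // (.*)(\d+)(.*)
        System.out.println(regexText().length()); // 13

        // "\\d+" is the regex \d+ : one or more digits.
        System.out.println(Pattern.matches("\\d+", "123"));      // true

        // "\\\\d+" is the regex \\d+ : a literal backslash followed by one or more 'd'.
        System.out.println(Pattern.matches("\\\\d+", "\\ddd"));  // true
        System.out.println(Pattern.matches("\\\\d+", "123"));    // false
    }
}
```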
Naveen Kumar Alone
  • \\ is a \ escaped in Java, so the second group is actually `\d+` – James Dec 10 '14 at 13:03
  • @James On Stack Overflow, to get a bold font I wrote it as `**[(.)(\\d+)(.)]**`, so it displayed as `(.)(\d+)(.)`. I modified it to `**[(.)(\\\d+)(.)]**`. It's now `(.)(\\d+)(.)`. Thanks for your observation. Please review it. – Naveen Kumar Alone Dec 11 '14 at 07:51
  • Sorry, you changed the wrong bit. In your explanation, you've put that the 2nd capturing group is \\d+ but it should be \d+ (one or more digits). This is because a \\ in a Java string is an escaped \ – James Dec 12 '14 at 20:57