Java - Regular Expressions

Question

I am currently learning regular expression syntax in Java, but I don't really understand the patterns.

What I know is that a pattern has a group count, and the pattern below has 3 groups.

import java.util.regex.*;

public class RE {
    public static void main(String[] args){
        String line = "Foo123";
        String pattern = "(.*)(\\d+)(.*)"; //RE Syntax I get stuck on.

        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(line);

        if (m.find()) {
            System.out.println(m.group(0));
            System.out.println(m.group(1));
            System.out.println(m.group(2));
            System.out.println(m.group(3));
        }
    }
}

I would like it if someone could explain what this expression does, what having more than one group does, and so on.

Read about [capturing groups](http://www.regular-expressions.info/brackets.html). — Maroun, Dec 10 '14 at 12:51
And here: https://docs.oracle.com/javase/tutorial/essential/regex/groups.html — Maroun, Dec 10 '14 at 12:53
So what does + do? It says that "Matches 1 or more of the previous thing" but when I take it out, it makes no difference? — user3818650, Dec 12 '14 at 11:46
Example: `\\d+@` matches 123@, `\\d@` matches 5@ but not more than one digit followed by @. — Maroun, Dec 12 '14 at 12:30
And also, if pattern was \\bcat\\b and line was cat cat cattie cat, why does the 2nd part \\b allow only full matches of cat? — user3818650, Dec 12 '14 at 13:46
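The two follow-up questions in the comments above (what + changes, and what the trailing \\b does) can be checked with a small sketch. The class and helper names here are made up for illustration and are not from the thread. Note that with the question's pattern, removing + makes no visible difference only because the greedy (.*) already leaves a single digit for group 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierAndBoundaryDemo {

    // Returns every substring of input that the regex finds, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Returns what group 2 captures on the first match, or null if there is no match.
    static String secondGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // With +, the whole run of digits before @ is matched.
        System.out.println(findAll("\\d+@", "123@"));   // [123@]
        // Without +, only the single digit right before @ is matched.
        System.out.println(findAll("\\d@", "123@"));    // [3@]

        // In the question's pattern the greedy (.*) leaves one digit either way.
        System.out.println(secondGroup("(.*)(\\d+)(.*)", "Foo123")); // 3
        System.out.println(secondGroup("(.*)(\\d)(.*)", "Foo123"));  // 3

        // The trailing \b requires a word boundary, so "cat" inside "cattie" is skipped.
        System.out.println(findAll("\\bcat\\b", "cat cat cattie cat")); // [cat, cat, cat]
        // Without it, the "cat" at the start of "cattie" is found as well.
        System.out.println(findAll("\\bcat", "cat cat cattie cat"));    // [cat, cat, cat, cat]
    }
}
```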

score 3 · Answer 1 · answered Dec 10 '14 at 12:54

Group 0 contains the entire match, and groups 1, 2 and 3 contain the text captured by the corresponding capturing groups.

Input string: Foo123

Regex : (.*)(\d+)(.*)

The first .* in the first capturing group greedily matches all the characters up to the end of the string. Then it backtracks one character at a time until the rest of the pattern can match, that is, until \d+ can find a digit. The backtracking happens in order to find a match, and it stops as soon as one digit is available. So group 2 captures only the last digit (3), while group 1 keeps Foo12. There is nothing left after the digits, so you get an empty string in group 3.
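To see the backtracking result concretely, here is a minimal sketch that collects every group for the question's input. The class and method names are hypothetical. The second call uses a reluctant quantifier, (.*?), which is not in the original question and is shown only for contrast with the greedy behaviour.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyGroupsDemo {

    // Returns groups 0..groupCount() of the first match, or an empty array if nothing matches.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Greedy first group gives back only one character: [Foo123, Foo12, 3, ]
        System.out.println(Arrays.toString(groups("(.*)(\\d+)(.*)", "Foo123")));
        // Reluctant first group takes as little as possible: [Foo123, Foo, 123, ]
        System.out.println(Arrays.toString(groups("(.*?)(\\d+)(.*)", "Foo123")));
    }
}
```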

Good explanation about the internals. — Maroun, Dec 10 '14 at 12:55

score 1 · Answer 2 · answered Dec 10 '14 at 12:53

Here is an explanation:

(       : start capture group 1
    .*  : 0 or more of any character (except line terminators)
)       : end group
(       : start capture group 2
    \\d+: 1 or more digits
)       : end group
(       : start capture group 3
    .*  : 0 or more of any character (except line terminators)
)       : end group

For example, this regex matches:

123
abc456kljh
:.?222
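The examples above can be checked with Matcher.matches(), which requires the whole input to fit the pattern. This is a sketch with a hypothetical class and helper name; the non-matching inputs are added here for contrast and are not from the answer.

```java
import java.util.regex.Pattern;

class MatchExamplesDemo {

    private static final Pattern PATTERN = Pattern.compile("(.*)(\\d+)(.*)");

    // True when the whole input fits the pattern, which here means
    // it contains at least one digit and no line terminators.
    static boolean matchesWhole(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("123"));        // true
        System.out.println(matchesWhole("abc456kljh")); // true
        System.out.println(matchesWhole(":.?222"));     // true
        System.out.println(matchesWhole("Foo"));        // false, no digit for group 2
        System.out.println(matchesWhole(""));           // false, \d+ needs at least one digit
    }
}
```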

score 1 · Answer 3 · edited May 23 '17 at 12:21

String line = "Foo123";
String pattern = "(.*)(\\d+)(.*)"; 
// (take any character - zero or more) // (digits one or more) // (take any character - zero or more)

So in the above case we have 3 capturing groups. The first takes any character zero or more times (greedily), then comes the \d pattern for digits, where + means one or more, and the last group again takes any character zero or more times.
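The "3 groups" count can be read from the matcher with groupCount(). The sketch below uses a hypothetical helper name. Group 0 (the whole match) is not included in the count, and the non-capturing (?:...) example is an addition for contrast.

```java
import java.util.regex.Pattern;

class GroupCountDemo {

    // Number of capturing groups in the regex. Group 0 is not counted.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countGroups("(.*)(\\d+)(.*)"));   // 3
        System.out.println(countGroups("\\d+"));             // 0, no parentheses
        System.out.println(countGroups("(?:.*)(\\d+)"));     // 1, (?:...) does not capture
    }
}
```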

answered Dec 10 '14 at 12:54 by nitishagar · edited May 23 '17 at 12:21 by Community

score 0 · Answer 4 · edited Dec 17 '14 at 06:42

(.*)(\\d+)(.*)

You can hover over each part of the regular expression to get an explanation of that part.

1st Capturing group (.*)
  .* matches any character (except newline)
  Quantifier: * Between zero and unlimited times, as many times as possible
2nd Capturing group (\d+)
  \d matches a digit (written as \\d in a Java string literal, because \\ is an escaped \)
  Quantifier: + Between one and unlimited times, as many times as possible
3rd Capturing group (.*)
  .* matches any character (except newline)
  Quantifier: * Between zero and unlimited times, as many times as possible

answered Dec 10 '14 at 12:52 by Naveen Kumar Alone · edited Dec 17 '14 at 06:42

\\ is a \ escaped in Java, so the second group is actually `\d+` – James Dec 10 '14 at 13:03
@James in Stackoverflow for bold fonts i made it as `**[(.)(\\d+)(.)]**` So it displays as `(.)(\d+)(.)`. Modified it to `**[(.)(\\\d+)(.)]**`. Its now `(.)(\\d+)(.)`. Thanks for your observation. Please review it. – Naveen Kumar Alone Dec 11 '14 at 07:51
Sorry, you changed the wrong bit. In your explanation, you've put that the 2nd capturing group is \\d+ but it should be \d+ (one or more digits). This is because a \\ in a Java string is an escaped \ – James Dec 12 '14 at 20:57
