
Consider that you have the following string:

id: 1 name: Joe age: 27 id: 2 name: Mary age:22

And you want to extract every token that follows "age:" but not the string "age:" itself.

In other words, I want my Matcher's group() to return 27 and 22, not "age: 27" and "age:22".

Is there a way to express this in Java's regex syntax, which seems quite different from Perl's, where I learned my regex basics?

This is my code:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegExTest 
{
    public static void main(String[] args) 
    {
        Pattern namePtrn = Pattern.compile("age: *\\w*");

        String data = "id: 1 name: Joe age:27 id: 2 name: Mary age:22";

        Matcher nameMtchr = namePtrn.matcher(data);

        while(nameMtchr.find())
        {
            String find = nameMtchr.group();

            System.out.println ("\t" + find);
        }
    }
}

In Perl I can use {} to limit the portion of the pattern that I want extracted

while($text =~ m/(age:{\w+})/g)
{
      my $find = $1;

      if($find)
      {
          print "\nFIND = ".$find;
      }
}

would return

FIND = 27
FIND = 22

and if I put {} around age like

while($text =~ m/({age:\w+})/g)

it would return

FIND = age: 27
FIND = age:22

So I am looking for something like Perl's {} but in Java.

amphibient
  • 1
    Standard *capture groups* (keyword) is all you get; compare with: `m/age:(\w+)/g` .. –  Oct 08 '12 at 19:01
  • 1
    (Please read the fine manual for how to access *capture groups* - keyword! - in Java. Just as with Perl, there is a special way to access a specific group: e.g. `$1` vs. `$&`.) –  Oct 08 '12 at 19:07
  • What!? Perl uses curly braces for capture groups? – Richard JP Le Guen Oct 08 '12 at 19:10
  • 1
    [Perl uses curly braces as quantifiers](http://perldoc.perl.org/perlre.html#Quantifiers); not for capture groups. – Richard JP Le Guen Oct 08 '12 at 19:13
  • you may be right but the above worked for me when i tested it – amphibient Oct 08 '12 at 19:15
  • i do not understand why haters would downvote a question like this that is well laid out with source code examples and is asking how to accomplish something in plain english – amphibient Oct 08 '12 at 19:16
  • 1
    Maybe it's because you didn't properly read more than half of the answers. – Wug Oct 08 '12 at 19:20
  • maybe they weren't as explicit and direct as the one i marked accepted below. check it out – amphibient Oct 08 '12 at 19:23

3 Answers

7

If you wrap the value in a capture group and call Matcher.group(1) instead of Matcher.group(), you get the match without 'age:':

String data = "id: 1 name: Joe age:27 id: 2 name: Mary age:22";
Pattern namePtrn = Pattern.compile("age:(\\w+)");
Matcher nameMtchr = namePtrn.matcher(data);

while (nameMtchr.find()) {
   String find = nameMtchr.group(1);
   System.out.println("\t" + find);
}
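
If you specifically need Matcher.group() with no argument to return only the value, one alternative is a lookbehind rather than a capture group. This is a different technique from the one above, shown here only as a minimal sketch; the class name `LookbehindAges` and the method `extract` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindAges {
    // (?<=age:) is a lookbehind: "age:" must sit immediately before the
    // match, but it is not part of the matched text.
    private static final Pattern AGE = Pattern.compile("(?<=age:)\\w+");

    // Hypothetical helper: collects every value that directly follows "age:".
    static List<String> extract(String data) {
        List<String> found = new ArrayList<>();
        Matcher m = AGE.matcher(data);
        while (m.find()) {
            found.add(m.group()); // whole match, which already excludes "age:"
        }
        return found;
    }

    public static void main(String[] args) {
        String data = "id: 1 name: Joe age:27 id: 2 name: Mary age:22";
        System.out.println(extract(data)); // prints [27, 22]
    }
}
```

Note that this sketch assumes no whitespace between "age:" and the value; the capture-group approach is usually simpler when optional whitespace is involved.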
Reimeus
  • As an additional note: \w matches any word character; in a Java string literal the \ must itself be escaped, which makes it \\w; and + means one or more occurrences. The parentheses mark their contents as a group. – Ankit Jain Oct 15 '20 at 04:58
1

Try:

age:\s*(\d+)

Matches "age:" followed by any amount of whitespace, followed by one or more digits. The digits (the numeric value) are captured in the first group.
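
As a rough sketch of how that regular expression might be used from Java: the backslashes have to be doubled inside the string literal, and the digits are read with group(1). The class name `DigitAges` and the method `extract` are placeholders of mine, not part of any API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitAges {
    // The regex value is age:\s*(\d+); each backslash is doubled because
    // this is a Java string literal.
    private static final Pattern AGE = Pattern.compile("age:\\s*(\\d+)");

    // Hypothetical helper: returns the digits captured after each "age:".
    static List<String> extract(String data) {
        List<String> ages = new ArrayList<>();
        Matcher m = AGE.matcher(data);
        while (m.find()) {
            ages.add(m.group(1)); // group 1 holds only the digits
        }
        return ages;
    }

    public static void main(String[] args) {
        String data = "id: 1 name: Joe age: 27 id: 2 name: Mary age:22";
        System.out.println(extract(data)); // prints [27, 22]
    }
}
```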

If you want to support negative ages (e.g. -1 for "age unknown" or something) you can use:

age:\s*(-?\d+)

This will match "age:" followed by any amount of whitespace, followed by an optional minus sign, followed by one or more digits. The digits and the optional minus sign (the numeric value) are captured in the first group.
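
Since the first group then holds a complete signed number, it can be handed straight to Integer.parseInt. A small sketch under that assumption follows; the class name `SignedAges`, the method `parse`, and the sample input containing -1 are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedAges {
    // Regex value: age:\s*(-?\d+), with backslashes doubled for Java.
    private static final Pattern AGE = Pattern.compile("age:\\s*(-?\\d+)");

    // Hypothetical helper: converts each captured value to an int.
    static List<Integer> parse(String data) {
        List<Integer> ages = new ArrayList<>();
        Matcher m = AGE.matcher(data);
        while (m.find()) {
            // group(1) is the optional minus sign plus the digits
            ages.add(Integer.parseInt(m.group(1)));
        }
        return ages;
    }

    public static void main(String[] args) {
        // Made-up input where -1 stands for "age unknown"
        String data = "id: 1 name: Joe age: 27 id: 2 name: Mary age:-1";
        System.out.println(parse(data)); // prints [27, -1]
    }
}
```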

If you aren't sure how to get capture groups to work, consult this question which has a few examples.

Community
Wug
  • no, that did not do. BTW, you need to use double \, otherwise Java won't compile – amphibient Oct 08 '12 at 19:03
  • 2
    @foampile The example posted shows the regular expression *value*, not the string literal (there are no "s). Also "did not do" is a near-useless statement. A better one might be: "I still don't know how to access the capture group." –  Oct 08 '12 at 19:04
  • "did not do" means i put it in my code and it did not return what i was asking for in the OP – amphibient Oct 08 '12 at 19:06
  • @foampile: You have to get the groups, not the entire matched pattern. – Wug Oct 08 '12 at 19:08
0

Use unescaped parentheses:

Pattern namePtrn = Pattern.compile("age: *(\\w*)");

This will put it in the first capture group of the Matcher.
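
To read that group you have to ask for it by index: group() (equivalent to group(0)) still returns the entire match, while group(1) returns only what the parentheses captured. The sketch below prints both side by side; the class name `GroupComparison` and the method `compare` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupComparison {
    private static final Pattern AGE = Pattern.compile("age: *(\\w*)");

    // Hypothetical helper: pairs the whole match with the first capture group.
    static List<String> compare(String data) {
        List<String> lines = new ArrayList<>();
        Matcher m = AGE.matcher(data);
        while (m.find()) {
            String whole = m.group();     // entire match, includes "age:"
            String captured = m.group(1); // only the part inside ( )
            lines.add(whole + " -> " + captured);
        }
        return lines;
    }

    public static void main(String[] args) {
        String data = "id: 1 name: Joe age: 27 id: 2 name: Mary age:22";
        for (String line : compare(data)) {
            System.out.println(line);
        }
        // prints:
        // age: 27 -> 27
        // age:22 -> 22
    }
}
```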

Wug
Cory Kendall
  • that did not do either. output "age:" before the value – amphibient Oct 08 '12 at 19:05
  • @foampile ..because the capture group is still not being used; presumably the *entire match* is still being used, not a specific capture group. Granted, neither post shows how to get to a *specific* capture group. –  Oct 08 '12 at 19:06
  • 1
    `name:` should be `age:` in the regex? – Windle Oct 08 '12 at 19:07
  • the OP stipulates clearly what the output should be: 27 and 22, not age: 27 and age:22 – amphibient Oct 08 '12 at 19:07
  • @foampile Again, *the regex is valid* and shows the use of a capture group (even with the typo), the *usage* of it is incorrect. –  Oct 08 '12 at 19:09