In .NET, to match input against a pattern whose capturing group can occur any number of times, I can write something like this:

String input = "a, bc, def, hijk";
String pattern = "(?<x>[^,]*)(,\\s*(?<y>[^,]*))*";

Match m = Regex.Match(input, pattern);
Console.WriteLine(m.Groups["x"].Value);

//the group "y" occurs 0 or more times per match
foreach (Capture c in m.Groups["y"].Captures)
{
    Console.WriteLine(c.Value);
}

This code would print:

a
bc
def
hijk

That seems straightforward, but the following Java code doesn't do what the .NET code does. (This is expected, since java.util.regex doesn't seem to distinguish between groups and captures.)

String input = "a, bc, def, hijk";
Pattern pattern = Pattern.compile("(?<x>[^,]*)(,\\s*(?<y>[^,]*))*");

Matcher m = pattern.matcher(input);

while(m.find())
{
     System.out.println(m.group("x"));
     System.out.println(m.group("y"));
}

Prints:

a
hijk

null

Can someone please explain how to accomplish the same using Java, without having to re-write the regular expression or use external libraries?

ekad
J Smith
  • Which version of Java are you trying to use? IIRC, you need Java 7 to support named capture groups. – Russ Clarke Mar 25 '13 at 01:04
  • Some of the answers in this question might help: http://stackoverflow.com/questions/415580/regex-named-groups-in-java – Russ Clarke Mar 25 '13 at 01:08
  • @RussC JSmith must be using Java 7 as the problem is not with named groups (he is actually able to get the named groups as you see in the print!). The problem lies within my answer =) – ddmps Mar 25 '13 at 01:11

2 Answers

What you want is not possible in Java. When a group matches several times within one match, only its last occurrence is saved. For more information, read the "Groups and capturing" section of the Pattern docs. In Java, a Matcher steps through the input one match at a time with find(), and after each match a group exposes only the text it most recently captured.

Example with repetition:

String input = "a1b2c3";
Pattern pattern = Pattern.compile("(?<x>.\\d)*");
Matcher matcher = pattern.matcher(input);
while(matcher.find())
{
     System.out.println(matcher.group("x"));
}

Prints (the null comes from a second, empty match at the end of the input, because * also matches the empty string):

c3
null

Without repetition:

String input = "a1b2c3";
Pattern pattern = Pattern.compile("(?<x>.\\d)");
Matcher matcher = pattern.matcher(input);
while(matcher.find())
{
     System.out.println(matcher.group("x"));
}

Prints:

a1
b2
c3
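
Building on the "without repetition" example, here is a minimal sketch of one possible workaround for the comma-separated input from the question: match a single item per find() call and collect the results in a list. Note that this does change the pattern, which the question hoped to avoid. The class name RepeatedCaptures, the helper captures, and the group name item are hypothetical, and the sketch assumes items are non-empty (it uses [^,]+ rather than [^,]*). The \G anchor makes each match start where the previous one ended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedCaptures {
    // Hypothetical helper: returns every item of a comma-separated input.
    // Assumption: items are non-empty, so [^,]+ is used instead of [^,]*.
    static List<String> captures(String input) {
        // \G anchors each attempt at the end of the previous match
        // (or at the start of the input for the first attempt).
        Pattern item = Pattern.compile("\\G(?:,\\s*)?(?<item>[^,]+)");
        Matcher m = item.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // Each find() call yields exactly one capture of the group.
            out.add(m.group("item"));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : captures("a, bc, def, hijk")) {
            System.out.println(s);
        }
    }
}
```

The loop plays the role that Group.Captures plays in .NET: instead of one match holding many captures, there are many matches holding one capture each.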
nhahtdh
ddmps

You can use the Pattern and Matcher classes in Java. They work slightly differently from the .NET classes. For example, the following code:

Pattern p = Pattern.compile("(el).*(wo)");
Matcher m = p.matcher("hello world");
while(m.find()) {
  for(int i=1; i<=m.groupCount(); ++i) System.out.println(m.group(i));
}

prints two strings:

el
wo
gerrytan