I'm trying to capture nested optional groups in Java but it's not working out.
I'm trying to capture a keyword followed by an interval, where a keyword is anything for now and an interval is just two dates. The interval is optional, and each of the two dates is optional as well, so all of the following should match:
- word
- word [01/01/1900, ]
- word [, 01/01/2000]
- word [01/01/1900, 01/01/2000]
I want to capture the keyword and both dates, with a date's group being null when that date is absent.
This is the Java MWE I've come up with:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Parser {
    public static void main(String[] args) {
        Parser parser = new Parser();
        String s = "word [01/01/1900, 01/01/2000]";
        parser.parse(s);
    }

    public void parse(String s) {
        String date = "\\d{2}/\\d{2}/\\d{4}";
        String interval = "\\[(" + date + ")?, (" + date + ")?\\]";
        String keyword = "(.+)( " + interval + ")?";
        Pattern p = Pattern.compile(keyword);
        Matcher m = p.matcher(s);
        if (m.matches()) {
            // Print group 0 (the whole match) and every capturing group.
            for (int i = 0; i <= m.groupCount(); ++i) {
                System.out.println(i + ": " + m.group(i));
            }
        }
    }
}
```
And this is the output:

```
0: word [01/01/1900, 01/01/2000]
1: word [01/01/1900, 01/01/2000]
2: null
3: null
4: null
```
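A small experiment that may help isolate the cause: the output suggests that `(.+)` is greedy and takes the whole input, after which the optional interval group is satisfied by matching nothing. The sketch below compares a greedy `(.+)` with a reluctant `(.+?)` on the same input. The class name `GreedyVsReluctant`, the helper `firstGroup`, and the simplified suffix `( \[.*\])?` are hypothetical stand-ins for the real interval pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Returns group 1 if the WHOLE input matches the regex, otherwise null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "word [01/01/1900, 01/01/2000]";
        // Greedy: (.+) consumes everything; the optional suffix then matches empty.
        System.out.println(firstGroup("(.+)( \\[.*\\])?", s));
        // Reluctant: (.+?) grows one character at a time, so the suffix
        // gets a chance to match " [...]" before the keyword swallows it.
        System.out.println(firstGroup("(.+?)( \\[.*\\])?", s));
    }
}
```

With the greedy version, group 1 is the entire string; with the reluctant version it is just `word`.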
If the interval isn't optional, then it works:

```java
String keyword = "(.+)( "+interval+")";
```

```
0: word [01/01/1900, 01/01/2000]
1: word
2: [01/01/1900, 01/01/2000]
3: 01/01/1900
4: 01/01/2000
```
If the interval is a non-capturing group (but still optional), then it doesn't work either:

```java
String keyword = "(.+)(?: "+interval+")?";
```

```
0: word [01/01/1900, 01/01/2000]
1: word [01/01/1900, 01/01/2000]
2: null
3: null
```
What do I need to change to retrieve both dates? Thank you.
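One change that may be worth trying, assuming the keyword should end right before the first ` [` that starts a valid interval: make the keyword group reluctant, `(.+?)`, so the optional interval is attempted before the keyword absorbs it. This is a sketch, not a definitive fix; the class `ReluctantParser` and its `parse` method (which returns the groups instead of printing them) are hypothetical, and the interval is wrapped in a non-capturing group so the dates land in groups 2 and 3.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantParser {
    private static final String DATE = "\\d{2}/\\d{2}/\\d{4}";
    private static final String INTERVAL = "\\[(" + DATE + ")?, (" + DATE + ")?\\]";
    // (.+?) is reluctant: it expands only until the rest of the pattern,
    // including the optional interval, can match through to the end of input.
    private static final Pattern KEYWORD = Pattern.compile("(.+?)(?: " + INTERVAL + ")?");

    // Returns {keyword, firstDate, secondDate}, with null for an absent date,
    // or null if the input does not match at all.
    static String[] parse(String s) {
        Matcher m = KEYWORD.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] inputs = {
            "word",
            "word [01/01/1900, ]",
            "word [, 01/01/2000]",
            "word [01/01/1900, 01/01/2000]"
        };
        for (String s : inputs) {
            System.out.println(Arrays.toString(parse(s)));
        }
    }
}
```

For the four sample inputs this should print `[word, null, null]`, `[word, 01/01/1900, null]`, `[word, null, 01/01/2000]` and `[word, 01/01/1900, 01/01/2000]`. If keywords can never contain spaces, a keyword group such as `(\S+)` would avoid relying on reluctance at all.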
Edit: Part 2.
Suppose I now want to match repeated keywords, i.e. the regex `keyword(, keyword)*`. I tried this out, but only the first and the last instances are captured.
For simplicity, suppose I want to match `a, b, c, d` with the regex `([a-z])(?:, ([a-z]))*`. However, I can only retrieve the first and last group:

```
0: a, b, c, d
1: a
2: d
```
Why is this so?
Just found out that this cannot be done: a capturing group inside a repeated group only keeps its last match (see "Capture group multiple times").
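Since a repeated group only remembers its last iteration, one common workaround is two passes: check the overall shape once with `matches()`, then collect the items one at a time with `Matcher.find()`. The sketch below assumes the simplified single-letter items from the example; the class `RepeatedItems` and method `items` are hypothetical, and for real keywords the `ITEM` pattern would have to match a keyword instead of `[a-z]`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedItems {
    // Validates the whole list shape; no groups inside (...)* are relied on.
    private static final Pattern LIST = Pattern.compile("[a-z](?:, [a-z])*");
    // Matches a single item; find() walks through the input one item at a time.
    private static final Pattern ITEM = Pattern.compile("[a-z]");

    // Returns every item, or an empty list if the input is not a valid list.
    static List<String> items(String s) {
        List<String> result = new ArrayList<>();
        if (!LIST.matcher(s).matches()) {
            return result;
        }
        Matcher m = ITEM.matcher(s);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(items("a, b, c, d"));
    }
}
```

For input as regular as this, `s.split(", ")` after the `matches()` check would give the same items.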