.NET offers a Captures collection in its regular expression implementation, so you can get all instances of a given repeating group rather than just the last one. That's great, but I have a repeating group with subgroups, and I want to get at the subgroup captures grouped by the iteration of the outer group they belong to. I can't find a way. Any suggestions?
I've looked at a number of other questions, e.g.:
- Select multiple elements in a regular expression
- Regex .NET attached named group
- How can I get the Regex Groups for a given Capture?
but I have found no applicable answer, either affirmative ("Yep, here's how") or negative ("Nope, can't be done").
As a contrived example, say I have an input string:
abc d x 1 2 x 3 x 5 6 e fgh
where the "abc" and "fgh" represent text that I want to ignore in the larger document, "d" and "e" wrap the area of interest, and within that area of interest, "x n [n]" can repeat any number of times. It's those number pairs in the "x" areas that I'm interested in.
So I'm parsing it using this regular expression pattern:
.*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*
which finds exactly one match in the document but captures the "x" group many times. Here are the three pairs I would want to extract in this example:
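As a minimal sketch (assuming C# 9 top-level statements, with Console.WriteLine standing in for MessageBox.Show), the single match can be inspected group by group. Every Capture exposes Index and Length alongside its Value, so each capture of "fir" and "sec" carries its position in the input, which is the raw information needed to tell which "x" iteration it came from.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "abc d x 1 2 x 3 x 5 6 e fgh";
string pattern = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";
Match m = Regex.Match(input, pattern);

// Each group keeps every capture it made (not just the last one),
// and each Capture records where in the input it matched.
foreach (string name in new[] { "x", "fir", "sec" })
{
    foreach (Capture c in m.Groups[name].Captures)
    {
        Console.WriteLine($"{name}: \"{c.Value}\" at index {c.Index}, length {c.Length}");
    }
}
```

For this input it should report "x" captures at indexes 6, 12 and 16, "fir" captures at 8, 14 and 18, and "sec" captures at 10 and 20.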
- 1, 2
- 3
- 5, 6
But how can I get them? I could do the following in C#:
using System;
using System.Text.RegularExpressions;
using System.Windows.Forms; // for MessageBox
string input = "abc d x 1 2 x 3 x 5 6 e fgh";
string pattern = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";
// Every iteration of the repeating group "x" is recorded as a separate Capture.
foreach (Capture x in Regex.Match(input, pattern).Groups["x"].Captures) {
    MessageBox.Show(x.Value);
}
and since I'm referencing group "x", I get these strings:
- x 1 2
- x 3
- x 5 6
But that doesn't get me the numbers themselves. So I could query "fir" and "sec" independently instead of just "x":
using System;
using System.Text.RegularExpressions;
using System.Windows.Forms; // for MessageBox
string input = "abc d x 1 2 x 3 x 5 6 e fgh";
string pattern = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";
Match m = Regex.Match(input, pattern);
// "fir" and "sec" each collect their captures across all "x" iterations.
foreach (Capture f in m.Groups["fir"].Captures) {
    MessageBox.Show(f.Value);
}
foreach (Capture s in m.Groups["sec"].Captures) {
    MessageBox.Show(s.Value);
}
to get:
- 1
- 3
- 5
- 2
- 6
but then I have no way of knowing that it's the second pair that's missing its second number (the "4"), and not one of the other pairs.
So what to do? I know I could easily parse this out in C#, or even with a second regex run on each "x" capture, but since the first regex run has already done all the work and the results ARE known, it seems there ought to be a way to manipulate the Match object to get what I need out of it.
And remember, this is a contrived example; the real-world case is somewhat more complex, so just throwing extra C# code at it would be a pain. But if the existing .NET objects can't do it, then I just need to know that and I'll continue on my way.
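One way to work with the existing Match object, sketched under the assumption that Capture.Index and Capture.Length (which do exist on every Capture) are enough to relate captures: for each "x" capture, pick the "fir" and "sec" captures whose spans fall inside it. The helper name Inside is hypothetical, and the sketch uses C# 9 top-level statements with Console output in place of MessageBox.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string input = "abc d x 1 2 x 3 x 5 6 e fgh";
string pattern = @".*d (?<x>x ((?<fir>\d+) )?((?<sec>\d+) )?)*?e.*";
Match m = Regex.Match(input, pattern);

// Hypothetical helper: the captures of 'inner' whose span lies inside 'outer'.
static IEnumerable<Capture> Inside(Capture outer, Group inner) =>
    inner.Captures.Cast<Capture>()
         .Where(c => c.Index >= outer.Index
                  && c.Index + c.Length <= outer.Index + outer.Length);

var pairs = new List<(string Fir, string Sec)>();
foreach (Capture x in m.Groups["x"].Captures)
{
    // An absent subgroup becomes "" for this iteration only,
    // instead of shifting later numbers forward.
    string fir = Inside(x, m.Groups["fir"]).FirstOrDefault()?.Value ?? "";
    string sec = Inside(x, m.Groups["sec"]).FirstOrDefault()?.Value ?? "";
    pairs.Add((fir, sec));
    Console.WriteLine(sec.Length == 0 ? fir : $"{fir}, {sec}");
}
```

For the sample input this should print "1, 2", then "3", then "5, 6". The containment test rescans every capture for each "x" capture, which favors simplicity over speed.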
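If changing the pattern is acceptable, a variation that needs very little extra code is to make each optional number capture an empty string when it is absent. .NET allows the same group name to appear more than once in a pattern, so "(?:(?<sec>\d+) |(?<sec>))" records either the number or an empty capture under "sec". Every "x" iteration then contributes exactly one "fir" and one "sec" capture, and index i in each collection refers to the same iteration. The modified pattern is an illustrative assumption, checked only against input shaped like the sample.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "abc d x 1 2 x 3 x 5 6 e fgh";
// Same pattern as before, except each optional subgroup is now an alternation
// whose second branch captures an empty string under the same group name.
string pattern = @".*d (?<x>x (?:(?<fir>\d+) |(?<fir>))(?:(?<sec>\d+) |(?<sec>)))*?e.*";
Match m = Regex.Match(input, pattern);

CaptureCollection xs = m.Groups["x"].Captures;
CaptureCollection firs = m.Groups["fir"].Captures;
CaptureCollection secs = m.Groups["sec"].Captures;

// The i-th capture of "fir" and "sec" belongs to the i-th "x" iteration.
for (int i = 0; i < xs.Count; i++)
{
    string fir = firs[i].Value;
    string sec = secs[i].Value; // empty string when the number was absent
    Console.WriteLine(sec.Length == 0 ? fir : $"{fir}, {sec}");
}
```

For the sample input, "sec" ends up with three captures ("2", an empty string, and "6"), so the second pair is identifiable as the one without a second number.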
Thoughts?