I have a series of grouped values that follow a specific format, and I would like to use a single expression to capture each of them into its own group.
For example, given `-group1 -group2 -group3`, I am trying to use something like `(-[\s\S]{1,}?)`.
This basically captures the entire string into a single group, but I'd like to be able to backreference each of the values separately. I figured the `?` would make the quantifier non-greedy and therefore split the match into (for example) three separate groups.
For now I am simply repeating the group, `(-[\s\S]*?)`, but it seems there should be a more elegant expression.
Thanks!

-
This is somewhat vague. Can you show some sample text and the expected grouping result, including what you intend to backreference? – Ahmad Mageed Jun 15 '12 at 13:31
-
By the way, `[\s\S]` says "Match any space or non-space character". Think about that. ;) – qJake Jun 15 '12 at 13:35
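qJake's point can be checked directly. Because `[\s\S]` matches any character (including spaces and dashes), a lazy quantifier with nothing after it stops as soon as `{1,}` is satisfied, while a greedy one runs to the end of the string. A small sketch, assuming C# 9 or later (top-level statements) and only `System.Text.RegularExpressions`:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "-group1 -group2 -group3";

// The question's lazy pattern: nothing after {1,}? forces it to grow,
// so each match is just the dash plus one character.
string[] lazyHits = Regex.Matches(input, @"(-[\s\S]{1,}?)")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

// Greedy version: [\s\S] also matches the spaces and later dashes,
// so a single match swallows the whole string.
string[] greedyHits = Regex.Matches(input, @"(-[\s\S]{1,})")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", lazyHits));   // -g, -g, -g
Console.WriteLine(string.Join(", ", greedyHits)); // -group1 -group2 -group3
```

Neither pattern stops at the space between values; that needs something like `\S` instead of `[\s\S]`, or an explicit delimiter after the lazy part.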
3 Answers
You are in luck, because .NET (and therefore C#) is one of the few regex flavors, if not the only one, that keeps every capture made by a repeated subexpression, not just the last one:
https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.capture(v=vs.110)
The .NET API can be viewed as a hierarchy:
- Matches
  - Groups (most regex engines stop here)
    - Captures (unique to .NET)
It's not clear from your question exactly what you want to match, but this should get you started. Ask again if you get stuck.
using System;
using System.Text.RegularExpressions;

string input = "-group1 -group2 ";
// Group 1 is repeated twice; \W consumes the non-word character (the space) after each value.
string pattern = @"(-\S*\W){2}";
foreach (Match match in Regex.Matches(input, pattern))
{
    Console.WriteLine("Match: {0}", match.Value);
    for (int groupCtr = 0; groupCtr < match.Groups.Count; groupCtr++)
    {
        Group group = match.Groups[groupCtr];
        Console.WriteLine(" Group {0}: {1}", groupCtr, group.Value);
        // Each repetition of the group is kept in its CaptureCollection.
        for (int captureCtr = 0; captureCtr < group.Captures.Count; captureCtr++)
            Console.WriteLine(" Capture {0}: {1}", captureCtr,
                              group.Captures[captureCtr].Value);
    }
}
This outputs:
Match: -group1 -group2
Group 0: -group1 -group2
Capture 0: -group1 -group2
Group 1: -group2
Capture 0: -group1
Capture 1: -group2
As you can see, (Group 1, Capture 0) and (Group 1, Capture 1) give the individual captures of the group, not just the last one as in most languages.
I think this addresses what you describe as "to be able to backreference each of the values separately".
(You use the term backreference, but I don't think you are aiming for a replacement pattern, right?)
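The same approach applied to all three values from the question, as a sketch. Two assumptions: the pattern is widened from `{2}` to `{3}`, and `\W` is swapped for `\s*` so the last value matches without a trailing space. It also shows the key difference: `Group.Value` holds only the last repetition, while `Group.Captures` holds all of them (C# 9+ top-level statements):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "-group1 -group2 -group3";

// One match that consumes all three values through a single repeated group.
Match match = Regex.Match(input, @"(-\S+\s*){3}");

// Value reports only the last repetition of group 1...
Console.WriteLine(match.Groups[1].Value); // -group3

// ...while the CaptureCollection keeps every repetition, in order.
// Trim() drops the whitespace that \s* pulled into each capture.
string[] values = match.Groups[1].Captures
    .Cast<Capture>()
    .Select(c => c.Value.Trim())
    .ToArray();

Console.WriteLine(string.Join(" | ", values)); // -group1 | -group2 | -group3
```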
-
+1. I believe this is what he was asking for as well. As an aside, PHP offers this with its `preg_match_all(...)` function; using the `PREG_SET_ORDER` flag, it returns a multi-dimensional array, the first array containing the first set, the second array containing the second set, etc. (Other flags allow different representations.) My guess is that if .NET and PHP have implemented it, other languages have too. – Andrew Cheong Jun 15 '12 at 13:56
-
@acheong87 Good comment, I'll add it to the answer to not offend other languages :) I have it from the regexp expert Jan Goyvaerts that .NET was unique on this aspect but I am not active enough on other platforms to back this up. – buckley Jun 15 '12 at 14:10
-
@acheong87, that's not the same thing. `preg_match_all()` applies the regex repeatedly and returns the collected results, similar to .NET's `Matches()` method. The OP wants to perform *one* match that will consume the whole string, then break out the individual captures. .NET provides the `CaptureCollection` for that purpose, but PHP has no equivalent. – Alan Moore Jun 15 '12 at 15:52
-
@AlanMoore, ah, I see; my apologies to @buckley for the misleading information, I definitely misunderstood. So if I understand correctly now, .NET's `CaptureCollection` can actually split up something like `/^hello, (\w+\s*)+$/` matched against "hello, john doe" as "john" and "doe", whereas PHP (and other languages) cannot. – Andrew Cheong Jun 15 '12 at 16:00
-
@buckley thank you, this is what I was looking for. I was hoping that I could perform this in a single (non-looping) construct and then use the backreferences (\1 \2) to pull out the desired groups, but I wasn't aware of the Captures that .NET provides, so perhaps I could go that route. Either way, I learned something new, which is always good. ;) – McArthey Jun 21 '12 at 14:17
-
@buckley -- this doesn't seem to work unless you know the explicit number of captures, e.g. `{2}` in this case. What if you want to use `{1,}` or `{2,5}` or `+`? – rory.ap May 23 '19 at 15:22
-
@rory.ap I believe it should still work. Can you make a .NET Fiddle that demonstrates your question? https://dotnetfiddle.net/ – buckley May 24 '19 at 07:35
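Both comments above can be checked with one small sketch: Andrew Cheong's `^hello, (\w+\s*)+$` example (written without the PHP-style `/.../` delimiters), and rory.ap's question about `+` and `{2,5}`. `CapturesOfGroup1` is a hypothetical helper, and the four-flag input is illustrative (C# 9+ top-level statements):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: collects every capture of group 1 from a single match.
string[] CapturesOfGroup1(string input, string pattern) =>
    Regex.Match(input, pattern).Groups[1].Captures
        .Cast<Capture>()
        .Select(c => c.Value)
        .ToArray();

// One match, two captures for the same group (\s* keeps the space in "john ").
string[] names = CapturesOfGroup1("hello, john doe", @"^hello, (\w+\s*)+$");
Console.WriteLine(string.Join("|", names)); // john |doe

// The repetition count does not have to be fixed: each repetition adds a capture.
string flags = "-a -b -c -d";
int plusCount = CapturesOfGroup1(flags, @"^(-\S+\s*)+$").Length;
int rangeCount = CapturesOfGroup1(flags, @"^(-\S+\s*){2,5}$").Length;
Console.WriteLine($"with +: {plusCount}, with {{2,5}}: {rangeCount}"); // 4 and 4
```

With the anchors in place, a sixth value would make the `{2,5}` pattern fail to match at all, since `$` cannot be reached after at most five repetitions.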
With .NET regex (and almost only .NET) you can use:
(?:(-\S+)\s*)+
Group 1's `Captures` collection will contain all of the matched substrings.
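A sketch of reading those captures, assuming the question's sample input and C# 9+ top-level statements. Because the whitespace sits in the outer non-capturing group `(?:...)`, each capture of group 1 is exactly one value with no trimming needed:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "-group1 -group2 -group3";

// The outer (?:...) repeats; only (-\S+) captures, so \s* stays out of the values.
Match match = Regex.Match(input, @"(?:(-\S+)\s*)+");

string[] values = match.Groups[1].Captures
    .Cast<Capture>()
    .Select(c => c.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", values)); // -group1, -group2, -group3
```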
Or maybe just using `Matches` is sufficient in your case:
var re = new Regex(@"-\S+");
var matches = re.Matches(str);
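A runnable version of the snippet above, assuming `str` holds the sample input from the question (C# 9+ top-level statements). Here each value is its own `Match`, so the .NET-specific `CaptureCollection` is not needed:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumption: str is the sample input from the question.
string str = "-group1 -group2 -group3";

var re = new Regex(@"-\S+");
var matches = re.Matches(str);

// One Match per dash-prefixed value.
string[] values = matches.Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", values)); // -group1, -group2, -group3
```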

Try this:
(-.+?)(\s|$)
Your first capture group will have what you want (`-group1`, `-group2`, etc.).
If you want more control over what to allow after the `-`, change `.+?` to, for example, `[a-zA-Z0-9]+?` to only match alphanumeric characters.
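The two variants only differ once a value contains a character outside `[a-zA-Z0-9]`. A sketch using a hypothetical input with `-gr@up2` in the middle (C# 9+ top-level statements); `FirstGroups` is a hypothetical helper that collects group 1 from every match:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: the middle value contains a non-alphanumeric character.
string input = "-group1 -gr@up2 -group3";

// Hypothetical helper: group 1 from each successive match.
string[] FirstGroups(string pattern) =>
    Regex.Matches(input, pattern)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)
        .ToArray();

// .+? grows lazily until the next whitespace or the end of the string.
string[] anyChars = FirstGroups(@"(-.+?)(\s|$)");

// [a-zA-Z0-9]+? cannot cross '@', so "-gr@up2" is skipped entirely.
string[] alnumOnly = FirstGroups(@"(-[a-zA-Z0-9]+?)(\s|$)");

Console.WriteLine(string.Join(", ", anyChars));  // -group1, -gr@up2, -group3
Console.WriteLine(string.Join(", ", alnumOnly)); // -group1, -group3
```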
