90

I've inherited a code block that contains the following regex and I'm trying to understand how it's getting its results.

var pattern = @"\[(.*?)\]";
var matches = Regex.Matches(user, pattern);
if (matches.Count > 0 && matches[0].Groups.Count > 1)
    ...

For the input user == "Josh Smith [jsmith]":

matches.Count == 1
matches[0].Value == "[jsmith]"

... which I understand. But then:

matches[0].Groups.Count == 2
matches[0].Groups[0].Value == "[jsmith]"
matches[0].Groups[1].Value == "jsmith" <=== how?

Looking at this question, my understanding is that the Groups collection stores the entire match as well as the previous match. But doesn't the regex above match only [open square bracket] [text] [close square bracket]? If so, why would "jsmith" match?

Also, is it always the case that the Groups collection will store exactly 2 groups: the entire match and the last match?

Lester

5 Answers

194
  • match.Groups[0] is always the same as match.Value, which is the entire match.
  • match.Groups[1] is the first capturing group in your regular expression.

Consider this example:

var pattern = @"\[(.*?)\](.*)";
var match = Regex.Match("ignored [john] John Johnson", pattern);

In this case,

  • match.Value is "[john] John Johnson"
  • match.Groups[0] is always the same as match.Value, "[john] John Johnson".
  • match.Groups[1] is the group of captures from the (.*?). Its Value is "john".
  • match.Groups[2] is the group of captures from the (.*). Its Value is " John Johnson" (including the leading space).
  • match.Groups[1].Captures is yet another dimension.
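The list above can be checked with a small program. This is only a sketch: the GroupsDemo class and its GroupValues helper are hypothetical names made up for illustration, and only Regex.Match and Match.Groups come from the original example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsDemo
{
    // Hypothetical helper: returns the Value of every group in the
    // first match, starting with Groups[0] (the entire match).
    public static string[] GroupValues(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        var values = new string[match.Groups.Count];
        for (int i = 0; i < match.Groups.Count; i++)
        {
            values[i] = match.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string[] values = GroupValues("ignored [john] John Johnson", @"\[(.*?)\](.*)");
        for (int i = 0; i < values.Length; i++)
        {
            Console.WriteLine($"Groups[{i}] = \"{values[i]}\"");
        }
        // Prints:
        // Groups[0] = "[john] John Johnson"
        // Groups[1] = "john"
        // Groups[2] = " John Johnson"
    }
}
```

Groups.Count is 3 here, so the collection is not fixed at two entries: it holds the entire match plus one entry per capturing group in the pattern.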

Consider another example:

var pattern = @"(\[.*?\])+";
var match = Regex.Match("[john][johnny]", pattern);

Note that we are looking for one or more bracketed names in a row. You need to be able to get each name separately. Enter Captures!

  • match.Groups[0] is always the same as match.Value, "[john][johnny]".
  • match.Groups[1] is the group of captures from the (\[.*?\])+. Its Value is the last capture only, "[johnny]".
  • match.Groups[1].Captures holds every capture made by that group, in order. Here it has two entries.
  • match.Groups[1].Captures[0] is [john]
  • match.Groups[1].Captures[1] is [johnny], the same as match.Groups[1].Value
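To see what a repeated group actually stores, here is a minimal sketch. The CapturesDemo class and CaptureValues helper are hypothetical names for illustration; the Regex, Group and CaptureCollection members are the standard .NET ones. In .NET, a group's Value is its last capture, and the Captures collection is zero-indexed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CapturesDemo
{
    // Hypothetical helper: returns every capture recorded for one group
    // of the first match, in the order the captures were made.
    public static string[] CaptureValues(string input, string pattern, int groupIndex)
    {
        Match match = Regex.Match(input, pattern);
        CaptureCollection captures = match.Groups[groupIndex].Captures;
        var values = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            values[i] = captures[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        const string input = "[john][johnny]";
        const string pattern = @"(\[.*?\])+";

        Match match = Regex.Match(input, pattern);
        Console.WriteLine(match.Value);            // [john][johnny]
        Console.WriteLine(match.Groups[1].Value);  // [johnny] (last capture)

        foreach (string capture in CaptureValues(input, pattern, 1))
        {
            Console.WriteLine(capture);            // [john], then [johnny]
        }
    }
}
```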
Joe
agent-j
  • 8
    This answer is the one that helped me put it together (looks like from votes, others felt the same), and seems to more correctly address the question than the accepted answer. – Philip Tenn Mar 29 '14 at 19:21
35

The ( ) acts as a capture group. So the matches collection holds every match that C# finds in your string, and each match's Groups collection holds the values of the capture groups inside that match. If you don't want that extra level of capture, just remove the ( ).
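As a rough illustration of the two levels, the sketch below walks every match and reads its first capture group. The MatchesDemo class, the BracketedNames helper and the second user in the input string are assumptions made up for this example; the pattern is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchesDemo
{
    // Hypothetical helper: for every match of \[(.*?)\] in the input,
    // returns the text captured between the brackets (Groups[1]).
    public static string[] BracketedNames(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\[(.*?)\]");
        var names = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            // matches[i].Value would include the brackets, e.g. "[jsmith]".
            names[i] = matches[i].Groups[1].Value;
        }
        return names;
    }

    public static void Main()
    {
        // The second user is invented for illustration.
        foreach (string name in BracketedNames("Josh Smith [jsmith], Jane Doe [jdoe]"))
        {
            Console.WriteLine(name); // jsmith, then jdoe
        }
    }
}
```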

zellio
  • 7
    And if you do not want to capture the group use `non-capturing groups`. `(?:regex)`. Regex reference: http://www.regular-expressions.info/refadv.html – BrunoLM Jun 16 '11 at 17:19
  • 1
    Correct BrunoLM :: If you need the logical group but don't want it to be captured. – zellio Jun 16 '11 at 17:20
4

Groups[0] is the entire match (not the entire input string).

Groups[1] is the group captured by the parentheses (.*?). You can configure Regex to capture explicit (named) groups only by passing RegexOptions.ExplicitCapture when you create the regex, or use (?:.*?) to create a non-capturing group.
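A short sketch comparing the options for the question's input. The ExplicitCaptureDemo class, the GroupCount helper and the group name "login" are hypothetical and chosen only for this example; RegexOptions.ExplicitCapture and the (?:...) and (?<name>...) syntax are standard .NET regex features.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExplicitCaptureDemo
{
    // Hypothetical helper: number of entries in Groups for the first match.
    public static int GroupCount(string input, string pattern, RegexOptions options)
    {
        return Regex.Match(input, pattern, options).Groups.Count;
    }

    public static void Main()
    {
        const string user = "Josh Smith [jsmith]";

        // Plain parentheses capture: entire match + 1 group.
        Console.WriteLine(GroupCount(user, @"\[(.*?)\]", RegexOptions.None));            // 2

        // Non-capturing group: only the entire match remains.
        Console.WriteLine(GroupCount(user, @"\[(?:.*?)\]", RegexOptions.None));          // 1

        // ExplicitCapture: unnamed parentheses no longer capture.
        Console.WriteLine(GroupCount(user, @"\[(.*?)\]", RegexOptions.ExplicitCapture)); // 1

        // Named groups still capture under ExplicitCapture.
        Match named = Regex.Match(user, @"\[(?<login>.*?)\]", RegexOptions.ExplicitCapture);
        Console.WriteLine(named.Groups["login"].Value);                                  // jsmith
    }
}
```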

Jules
THX-1138
2

The parentheses identify a group as well, so Groups[0] is the entire match and Groups[1] is the contents of what was found between the square brackets.

The Evil Greebo
2

How? The answer is here:

(.*?)

That is a subgroup (a capturing group) of @"\[(.*?)\]".

Gregory A Beamer