
I apologize in advance if this is a duplicate, but I could not find existing answers that cover my questions.

Could you please help and explain:

  1. Where is the match for name alone held? The initial part of the pattern, [A-Za-z0-9_\-\.]+, is not in parentheses, so I understand it is not a group. How, then, is name captured and held as a component of Match 0?

  2. If I change the string t2 to name@domain.com alt@yahoo.net and the pattern to ^([A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+)+$

    • I would expect 2 matches, one for each full email address. The output shows only 1 match holding both addresses separated by a space. Why?
    • How should the pattern read to get 2 matches, or would the string need to be different for this pattern?
    • The Group output looks inconsistent to me: there is no Group holding capture 0 = com and capture 1 = net, the way Group 2 holds the captures domain. and yahoo. Why?
    • Group 3's captures seem to hold the contents of Group 2's captures 0 and 1. Is that how hierarchies work? Are there captures of captures of groups?

Code

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string t2 = "name@domain.com";
        string p2 = @"^[A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$";

        MatchCollection matches = Regex.Matches(t2, p2);
        int matchIndex = 0;

        foreach (Match nextMatch in matches)
        {
            Console.WriteLine("Match {0} holds: {1}", matchIndex, nextMatch.Value);
            matchIndex++;

            // Group 0 is the whole match; the other groups are the parenthesized sub-patterns.
            int groupIndex = 0;
            foreach (Group g in nextMatch.Groups)
            {
                // g.Value is the text of the group's last capture.
                Console.WriteLine("Group {0} holding: {1}", groupIndex, g.Value);
                groupIndex++;

                // A quantified group records one capture per repetition.
                int captureIndex = 0;
                foreach (Capture capture in g.Captures)
                {
                    Console.WriteLine("\tCapture {0} holds {1}", captureIndex, capture.Value);
                    captureIndex++;
                }
            }
        }
    }
}

Output for the above code:

Match 0 holds: name@domain.com
Group 0 holding: name@domain.com
	Capture 0 holds name@domain.com
Group 1 holding: domain.
	Capture 0 holds domain.
Group 2 holding: n
	Capture 0 holds d
	Capture 1 holds o
	Capture 2 holds m
	Capture 3 holds a
	Capture 4 holds i
	Capture 5 holds n
Group 3 holding: m
	Capture 0 holds c
	Capture 1 holds o
	Capture 2 holds m
Press any key to continue . . .

Output if string t2 = "name@domain.com alt@yahoo.net"; and string p2 = @"^([A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+)+$";

Match 0 holds: name@domain.com alt@yahoo.net
Group 0 holding: name@domain.com alt@yahoo.net
	Capture 0 holds name@domain.com alt@yahoo.net
Group 1 holding:  alt@yahoo.net
	Capture 0 holds name@domain.com
	Capture 1 holds  alt@yahoo.net
Group 2 holding: yahoo.
	Capture 0 holds domain.
	Capture 1 holds yahoo.
Group 3 holding: o
	Capture 0 holds d
	Capture 1 holds o
	Capture 2 holds m
	Capture 3 holds a
	Capture 4 holds i
	Capture 5 holds n
	Capture 6 holds y
	Capture 7 holds a
	Capture 8 holds h
	Capture 9 holds o
	Capture 10 holds o
Group 4 holding: t
	Capture 0 holds c
	Capture 1 holds o
	Capture 2 holds m
	Capture 3 holds n
	Capture 4 holds e
	Capture 5 holds t
Press any key to continue . . .
Sergio Solorzano
  • *"how then is "name" captured and held as a component of Match 0?"*: Match 0 is the entire string matched by the regular expression. `^[A-Za-z0-9_\-\.\ ]+` is part of the regular expression, as you can observe by looking. Therefore, it is matched. – 15ee8f99-57ff-4f92-890c-b56153 Apr 25 '18 at 14:09
  • *"**I would expect** 2 matches, one for each full email address."* -- You expect that because you didn't read the documentation and find out what `^` means. Go read it now. I'm voting to close because "Please read the docs to me out loud" isn't a programming question. – 15ee8f99-57ff-4f92-890c-b56153 Apr 25 '18 at 14:10
  • @EdPlunkett, thanks for your first comment. I can see it matched in Match 0 in the output I've pasted, but I thought it would also be held somewhere as a subcomponent of Match 0, similar to groups/captures. From debugging I infer that if this part is wrapped in () it shows up in its own group, and otherwise it only shows in Match 0. On your second comment: my notes say ^ means the beginning of the input text, but if I remove "^" from @"^([A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+)+$" I get the same output. Where should I be looking? – Sergio Solorzano Apr 25 '18 at 14:59
  • Hi @zᴉɹɥƆ the change is revealing, thanks a lot for your help! – Sergio Solorzano Apr 25 '18 at 14:59

1 Answer

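A Match covers the text matched by the entire regex in one successful application of the pattern to the given string. Text matched outside any parentheses (such as name in your first pattern) is part of the Match value only and is not stored separately.

Groups are the parenthesized parts of that Match. If a group is quantified, as in (someRegex)+, it can match several times within one Match: each repetition is recorded as a Capture of that Group, and the Group's own value is the last Capture. Try changing ([A-Za-z\-])+ to ([A-Za-z\-]+) and see the difference!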
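To make the suggested change concrete, here is a minimal sketch that runs the question's first pattern both ways and lists the captures of group 3. The class name `QuantifierPlacementDemo` and the helper `CapturesOf` are made up for this illustration; only `Regex.Match`, `Group.Captures` and LINQ's `Cast`/`Select` are real API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class QuantifierPlacementDemo
{
    // Hypothetical helper: returns every capture of one group as a string array.
    public static string[] CapturesOf(string input, string pattern, int groupNumber)
    {
        Match m = Regex.Match(input, pattern);
        return m.Groups[groupNumber].Captures
                .Cast<Capture>()
                .Select(c => c.Value)
                .ToArray();
    }

    static void Main()
    {
        string input = "name@domain.com";

        // Quantifier outside the group: group 3 matches one character per repetition.
        string outside = @"^[A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$";

        // Quantifier inside the group: group 3 matches the whole run in one go.
        string inside = @"^[A-Za-z0-9_\-\.\ ]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-]+)$";

        Console.WriteLine(string.Join(", ", CapturesOf(input, outside, 3))); // c, o, m
        Console.WriteLine(string.Join(", ", CapturesOf(input, inside, 3)));  // com
    }
}
```

With the quantifier inside the parentheses the group has a single capture, so the group value and its only capture are the same text.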

Examples:

\w*(123)\w* on "asdsa123asdf"

  1. Match -> asdsa123asdf
  2. Group -> 123 (== last capture)
  3. Captures -> 123

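\w*?([123])+\w* on "asdsa123asdf" (the leading \w*? is lazy on purpose: a greedy \w* would consume "asdsa12" itself and leave only "3" for the group)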

  1. Match -> asdsa123asdf
  2. Group -> 3 (== last capture)
  3. Captures -> 1, 2, 3
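The Match / Group / Captures relationship in these examples can be checked with a short program. This is only a sketch: `GroupVersusCaptures` and `Describe` are names invented here. Note that the second call uses a lazy `\w*?` in front of the group, because a greedy `\w*` takes "12" for itself and the group then records a single capture, which the third call shows.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class GroupVersusCaptures
{
    // Hypothetical helper: summarizes the match, group 1, and all captures of group 1.
    public static string Describe(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        Group g = m.Groups[1];
        var captures = g.Captures.Cast<Capture>().Select(c => c.Value);
        return "Match=" + m.Value + "; Group=" + g.Value
             + "; Captures=" + string.Join(",", captures);
    }

    static void Main()
    {
        string input = "asdsa123asdf";

        // Unquantified group: exactly one capture, equal to the group value.
        Console.WriteLine(Describe(input, @"\w*(123)\w*"));
        // Match=asdsa123asdf; Group=123; Captures=123

        // Quantified group after a lazy prefix: one capture per repetition.
        Console.WriteLine(Describe(input, @"\w*?([123])+\w*"));
        // Match=asdsa123asdf; Group=3; Captures=1,2,3

        // Same group after a greedy prefix: the prefix eats "12" first.
        Console.WriteLine(Describe(input, @"\w*([123])+\w*"));
        // Match=asdsa123asdf; Group=3; Captures=3
    }
}
```

In all three cases the group value is the last capture; only the number of recorded captures differs.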

There are multiple sites where you can test your regex and inspect its details, e.g. https://regexr.com or https://regex101.com
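On the question of getting 2 matches from "name@domain.com alt@yahoo.net": in the second pattern, ^ and $ tie the match to the whole input, and the space inside the first character class lets the second repetition of the outer group swallow " alt" (which is why that capture starts with a space). One possible variant, not the only one, drops the anchors, the outer (...)+ and the space from the class, so that `Regex.Matches` can find each address separately. The names `TwoAddresses` and `FindAll` are invented for this sketch.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TwoAddresses
{
    // Assumed variant of the question's pattern: no ^/$ anchors, no outer (...)+,
    // and no space in the first character class.
    const string Pattern = @"[A-Za-z0-9_\-\.]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+";

    // Hypothetical helper: returns the text of every match in the input.
    public static string[] FindAll(string input)
    {
        return Regex.Matches(input, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        foreach (string address in FindAll("name@domain.com alt@yahoo.net"))
        {
            Console.WriteLine(address);
        }
        // name@domain.com
        // alt@yahoo.net
    }
}
```

Each Match then has its own Groups and Captures, so com and net end up in group 3 of two different matches rather than as two captures of one group.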

Chrᴉz remembers Monica