I'm having an issue with a Regex match that does not contain the full text I expected. It contains only the last letter of the month name, followed by the day and year. I expected the full month name, followed by the day and year, since that is what my regular expression describes, but that is not what I get.

Here is my example that replicates my issue: https://ideone.com/wJPj1d

using System;
using System.Text;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        string text = "<strong>Date of Hire: </strong>November 2, 2015<br />";
        string foundMatch = "No match found";
        Regex dateFormat = new Regex("[January|February|March|April|May|June|July|August|September|October|November|December] [0-9]{1,2}, [0-9]{4}");
        MatchCollection matches = dateFormat.Matches(text);
        if(matches.Count > 0)
        {
            foundMatch = matches[0].ToString();
        }
        Console.WriteLine(foundMatch);
    }
}

What I get for output is: r 2, 2015

What I would expect it to be: November 2, 2015

Zack
  • Why do you use regex? (http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). Have you tried HtmlAgilityPack? – Eser Apr 25 '16 at 20:03
  • @Eser I'm not parsing the HTML, I am just matching a date format within an HTML formatted string. – Zack Apr 25 '16 at 20:05
  • Zack, what is different... You parse the html to get a specific value.... (how do you call this process?) – Eser Apr 25 '16 at 20:07
  • @Zack I think the point Eser is making is that 99% of the time you shouldn't use regular expressions with html. Your case could be considered to be in the 1%. – juharr Apr 25 '16 at 20:07
  • Just FYI to prevent future regex problems in case you didn't figure it out from the answer or the link in the answer: Your code was matching *any* character inside the brackets `[]`, including the character `|`, which is not treated as an OR inside of `[]`. So it will match, for example, `| 3, 2222` or `m 9, 0000`, because `|` and `m` are in the list of characters to match. You can test [here](https://regex101.com/) – Quantic Apr 25 '16 at 20:11
  • @Quantic Makes sense. Thanks! – Zack Apr 25 '16 at 20:16

1 Answer

Use a group (...), not a character class [...]:

Regex dateFormat = new Regex("(January|February|March|April|May|June|July|August|September|October|November|December) [0-9]{1,2}, [0-9]{4}");
                              ^                                                                                     ^

See this IDEONE demo

If you do not need to access the captured month name, use a non-capturing group (?:...).
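
To illustrate the difference between the two kinds of group, here is a minimal sketch. The class and field names (`DateMatchSketch`, `Months`, `Capturing`, `NonCapturing`) are made up for this example and are not part of the original code; the input string is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateMatchSketch
{
    // Month alternation from the answer; the constant name is hypothetical.
    private const string Months =
        "January|February|March|April|May|June|July|August|September|October|November|December";

    // Capturing group: the month name is available as Groups[1].
    public static readonly Regex Capturing =
        new Regex("(" + Months + ") [0-9]{1,2}, [0-9]{4}");

    // Non-capturing group: same overall match, but no separate month group.
    public static readonly Regex NonCapturing =
        new Regex("(?:" + Months + ") [0-9]{1,2}, [0-9]{4}");

    public static void Main()
    {
        string text = "<strong>Date of Hire: </strong>November 2, 2015<br />";

        Match m = Capturing.Match(text);
        Console.WriteLine(m.Value);            // November 2, 2015
        Console.WriteLine(m.Groups[1].Value);  // November

        Match n = NonCapturing.Match(text);
        Console.WriteLine(n.Value);            // November 2, 2015
        Console.WriteLine(n.Groups.Count);     // 1 (only group 0, the whole match)
    }
}
```

Both patterns produce the same overall match; the only difference is whether the month name is kept as a separate group.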

Wiktor Stribiżew
  • @Zack: You used the wrong shape of brackets - square which should have been rounded. – Bob Salmon Apr 25 '16 at 20:03
  • No idea who down voted this answer but it was exactly what I needed. – Zack Apr 25 '16 at 20:03
  • @Zack: It is SO, people downvote whatever *they think* is wrong, not what *is* wrong. – Wiktor Stribiżew Apr 25 '16 at 20:04
  • Just to explain: a character class matches 1 symbol from the defined set of characters, and `[June|July]` matches 1 letter, either `J` or `u` or `l` or `n` or `y` or `|`. When you use parentheses, `(June|July)`, you match either `June` (a sequence of characters) or `July`, and `|` means OR. – Wiktor Stribiżew Apr 25 '16 at 20:15
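
The character-class behaviour described in the comments can be checked directly. This is only a sketch: `CharClassSketch` and `FirstMatch` are hypothetical names introduced here, and the patterns are shortened to two months to keep the example small.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassSketch
{
    // Hypothetical helper: returns the first match of pattern in input, or null if none.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Character class: exactly ONE character from the set J, u, n, e, |, l, y.
        string charClass = "[June|July] [0-9]{1,2}, [0-9]{4}";
        Console.WriteLine(FirstMatch(charClass, "| 3, 2222"));     // | 3, 2222
        Console.WriteLine(FirstMatch(charClass, "June 2, 2015"));  // e 2, 2015

        // Group with alternation: the whole word "June" or "July".
        string group = "(June|July) [0-9]{1,2}, [0-9]{4}";
        Console.WriteLine(FirstMatch(group, "June 2, 2015"));      // June 2, 2015
        Console.WriteLine(FirstMatch(group, "| 3, 2222") == null); // True
    }
}
```

The second line of output mirrors the `r 2, 2015` result from the question: only the single character directly before the space is consumed by the character class.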