(Note: This is not a question about the best way to do date validation in code. It is a question about learning more about regular expressions through trial and error and other people's insight.)

I've been doing a lot of work with regular expressions lately (quite frankly, I suck at them). I'm learning a lot, though, and I'm seeking expert opinions on a particular regular expression.

Right now I'm working on migrating a fairly large project to .NET 4.0. It has a lot of parsing and data manipulation methods across many classes and namespaces. However, the majority, if not ALL, of the parsing and validation has been done with large, clunky for loops and a lot of IndexOf() calls.

I've been quite successfully using a combination of regular expressions, LINQ, and extension methods to greatly simplify and clarify the parsing and validation methods.

Trial and error and RegexBuddy have helped tremendously with the learning curve.

Now on to my actual question.

I was working on updating a simple date validation, though it is a very, VERY loose validation:

private static bool isLikeVCardDate(string value_Renamed)
{
  if (value_Renamed == null)
  {
    return true;
  }
  // Not really sure this is true but matches practice
  // Match YYYYMMDD
  if (isStringOfDigits(value_Renamed, 8))
  {
    return true;
  }
  // or YYYY-MM-DD
  return value_Renamed.Length == 10 && value_Renamed[4] == '-' && value_Renamed[7] == '-' && isSubstringOfDigits(value_Renamed, 0, 4) && isSubstringOfDigits(value_Renamed, 5, 2) && isSubstringOfDigits(value_Renamed, 8, 2);
}

If I want to match that functionality, a simple regex like

private static bool isLikeVCardDate(string value_Renamed)
{
  return Regex.IsMatch(value_Renamed, @"\d{4}-?\d{2}-?\d{2}");
}

would meet the requirements.
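For comparison, here is a sketch of a pattern that tracks the original loop-based method more closely. It assumes the intent is to accept exactly YYYYMMDD or YYYY-MM-DD and nothing else. The class name, the anchors, the `[0-9]` classes, and the explicit null check are my additions, not part of the code above. Two things motivate them: without anchors, `Regex.IsMatch` succeeds if the pattern is found anywhere in the string, and passing null to it throws an `ArgumentNullException`, whereas the original method returns true for null.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VCardDateSketch
{
    // ^ and \z force the whole string to match (\z, unlike $, does not
    // tolerate a trailing newline). [0-9] is used because \d in .NET also
    // matches non-ASCII Unicode digits. The two alternatives keep the
    // hyphens all-or-nothing, as in the original method.
    private static readonly Regex LooseDate = new Regex(
        @"^(?:[0-9]{8}|[0-9]{4}-[0-9]{2}-[0-9]{2})\z");

    public static bool IsLikeVCardDate(string value)
    {
        // The original method treats null as valid.
        return value == null || LooseDate.IsMatch(value);
    }

    public static void Main()
    {
        Console.WriteLine(IsLikeVCardDate("20120607"));    // True
        Console.WriteLine(IsLikeVCardDate("2012-06-07"));  // True
        Console.WriteLine(IsLikeVCardDate("2012-0607"));   // False (mixed form)
        Console.WriteLine(IsLikeVCardDate("x20120607"));   // False (extra text)
    }
}
```

Like the original, this only checks the shape of the string; a value such as 2012-99-99 still passes.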

But it got me thinking: how would I go about validating that the date is a completely valid date (leap years, days of the month, the whole nine yards)?

I know there are other posts about date validation with regex. I'm not interested in someone outright giving me an answer; I've got it working. I'm wondering if there is any knowledge anyone can impart on how to do it better or improve on it.

Mind you, I know this is probably not the best example of a practical application of a regex.

Here is the regex I came up with.

A few notes: I'm pasting it in a "tabbed" view just for ease of reading; the actual regex has no spaces or new lines.

Additionally, everything that is not a named capture group is a non-capturing group (I left that syntax out to save space, since I just want people's analysis of the regex).

(
 (?<YEAR>((([0][48])|([13579][26])|([2468][048]))00)|(\d{2}(([0][48])|([13579][26])|([2468][048]))))
 -?
 (
  (
   (?<MONTH>(0[13578])|(1[02]))
   -?
   (?<DAY>(0[1-9])|([12][0-9])|(3[01]))
  )
  |
  (
   (?<MONTH>(0[469])|11)
   -?
   (?<DAY>(0[1-9])|([12][0-9])|30)
  )
  |
  (
   (?<MONTH>02)
   -?
   (?<DAY>(0[1-9])|([12][0-9]))
  )
 )
)
|
(
 (?<YEAR>\d{4})
 -?
 (
  (
   (?<MONTH>(0[13578])|(1[02]))
   -?
   (?<DAY>(0[1-9])|([12][0-9])|(3[01]))
  )
  |
  (
   (?<MONTH>(0[469])|11)
   -?
   (?<DAY>(0[1-9])|([12][0-9])|30)
  )
  |
  (
   (?<MONTH>02)
   -?
   (?<DAY>(0[1-9])|(1[0-9])|(2[0-8]))
  ) 
 )
)
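As a side note on the layout, .NET can consume the "tabbed" view more or less directly. The sketch below is my own arrangement of the regex above, not the author's production pattern: `RegexOptions.IgnorePatternWhitespace` lets the pattern keep its spaces, newlines, and `#` comments, and `RegexOptions.ExplicitCapture` makes every unnamed group non-capturing, so the `(?:` prefixes can be left out for real. The outer `^( ... )\z` wrapper, the class name, and the null check are assumptions I added so the whole string has to match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrictVCardDateSketch
{
    private static readonly Regex StrictDate = new Regex(@"
        ^(
          (                          # leap years: February may have 29 days
            (?<YEAR>((0[48]|[13579][26]|[2468][048])00)
                   |(\d{2}(0[48]|[13579][26]|[2468][048])))
            -?
            (
              ((?<MONTH>0[13578]|1[02]) -? (?<DAY>0[1-9]|[12][0-9]|3[01]))
             |((?<MONTH>0[469]|11)      -? (?<DAY>0[1-9]|[12][0-9]|30))
             |((?<MONTH>02)             -? (?<DAY>0[1-9]|[12][0-9]))
            )
          )
         |
          (                          # any year: February capped at 28 days
            (?<YEAR>\d{4})
            -?
            (
              ((?<MONTH>0[13578]|1[02]) -? (?<DAY>0[1-9]|[12][0-9]|3[01]))
             |((?<MONTH>0[469]|11)      -? (?<DAY>0[1-9]|[12][0-9]|30))
             |((?<MONTH>02)             -? (?<DAY>0[1-9]|1[0-9]|2[0-8]))
            )
          )
        )\z",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

    public static bool IsStrictVCardDate(string value)
    {
        // Regex.IsMatch throws on null, so reject it up front.
        return value != null && StrictDate.IsMatch(value);
    }

    public static void Main()
    {
        Console.WriteLine(IsStrictVCardDate("2012-02-29")); // True
        Console.WriteLine(IsStrictVCardDate("20000229"));   // True
        Console.WriteLine(IsStrictVCardDate("1900-02-29")); // False
        Console.WriteLine(IsStrictVCardDate("2011-04-31")); // False
    }
}
```

.NET allows the same group name to appear in several alternatives, so `YEAR`, `MONTH`, and `DAY` can be read from whichever branch matched. As in the pattern above, each hyphen is optional on its own, so a mixed form such as 2012-0229 is still accepted.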

Here is my thought process

  1. Days depend on the month: 4, 6, 9, and 11 have 30 days; 1, 3, 5, 7, 8, 10, and 12 have 31; and 2 has 28 or 29.

  2. Leap years are divisible by four, unless the year is divisible by 100, in which case it must also be divisible by 400.

    1. This builds on the fact that ANY number is divisible by 4 if its last 2 digits, taken as a number, are divisible by 4.

    2. Writing out the multiples of 4 from 4 to 96, I used the repeating pattern of 0(4,8), {even > 0}(0,4,8), and {odd}(2,6).

    3. Since the test for 400-year leap years applies to the first 2 digits of the year, we can use the same pattern from #2 above.

  3. Because of the leap year requirement, the regex needs 2 separate captures: one for dates in a leap year and one for dates not in a leap year.

Now, all of my assumptions could be wrong or just plain out there, but it's what I could come up with given how much I understand regex so far.
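One way to check the leap year reasoning above is to compare the year sub-pattern against the framework's own rule. The sketch below is an illustration with names I made up (`LeapYearPatternCheck`, `FirstMismatch`); it anchors the YEAR pattern from the leap year branch, feeds it every year from 0001 to 9999 as a four-digit string, and reports the first year where it disagrees with `DateTime.IsLeapYear`.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LeapYearPatternCheck
{
    // Century leap years (first two digits divisible by 4, then 00),
    // or any century followed by a non-zero multiple of 4 (04..96).
    private static readonly Regex LeapYear = new Regex(
        @"^(?:(?:0[48]|[13579][26]|[2468][048])00" +
        @"|[0-9]{2}(?:0[48]|[13579][26]|[2468][048]))\z");

    // Returns the first year where the pattern and DateTime.IsLeapYear
    // disagree, or 0 if they agree for every year DateTime supports.
    public static int FirstMismatch()
    {
        for (int year = 1; year <= 9999; year++)
        {
            string text = year.ToString("D4", CultureInfo.InvariantCulture);
            if (LeapYear.IsMatch(text) != DateTime.IsLeapYear(year))
            {
                return year;
            }
        }
        return 0;
    }

    public static void Main()
    {
        Console.WriteLine(FirstMismatch()); // prints 0: no disagreement
    }
}
```

The pattern deliberately does not match 0000, which is outside the range `DateTime` supports anyway.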

halfer
TofuBug
  • Why not just use `DateTime.TryParse`? Especially since it looks like you have specific date formats in mind. Not everything needs to be done via Regex ;) – rikitikitik Jun 07 '12 at 06:40
  • Don't reinvent the wheel. Use your language's date/time library. – Li-aung Yip Jun 07 '12 at 06:44
  • The first "regex" was just to illustrate that it would suffice to match the existing "validation"; the original method does little to validate the actual value of the date. All it checks is that there are 4 digits, followed by an optional -, 2 digits, another optional -, and 2 more digits. – TofuBug Jun 07 '12 at 06:50
  • @rikitikitik & Li-aung Yip I understand I can use that this wasn't a question about "BEST" use this is a LEARNING question about understanding regex better. – TofuBug Jun 07 '12 at 06:53
  • @TofuBug: one of the first things to learn is to use the right tool for the job. You might come up with a very convoluted regex that would not even cover all possible cases. And what would you have learned in that process? – David Brabant Jun 07 '12 at 06:57
  • @DavidBrabant This is NOT about using the right tool for the job. I KNOW the right tool to use; this is a purely theoretical question to help me gain a better overall understanding of regular expressions. I learn better by getting my hands dirty and mucking around, but sometimes I like to look to other people with more experience to get some feedback. Nine times out of ten I go through many contrived examples that would easily be done with another tool, but it gives me something I can relate to while trying to understand the mechanics. – TofuBug Jun 07 '12 at 07:01
  • @rikitikitik I just tested with the TryParse method 2 problems First I don't ever need this as a DateTime it stays a string, second with a "YYYY-MM-DD" it works fine but for YYYYMMDD it returns false and that is another valid date format – TofuBug Jun 07 '12 at 07:15
  • @TofuBug Use `DateTime.TryParseExact` then if you want to have multiple datetime formats valid. And you don't need to keep the value as a date, you can just keep the original input string once you know that it's valid. – rikitikitik Jun 07 '12 at 07:20
  • Not sure how I feel about this one. Regex isn't the best fit for this problem, and this problem isn't a good fit for learning regex either. The expression, while large, is actually relatively simple in terms of regex syntax. I'd look elsewhere to get familiar with regex. – Kenneth Ito Jun 07 '12 at 07:32
  • @rikitikitik that still doesn't address the needless datetime variable i have to create to satisfy the out parameter for the TryParse and TryParseExact – TofuBug Jun 07 '12 at 13:54
  • @TofuBug As opposed to a huge Regex that you have to create plus the overhead that comes with it plus the awful maintainability? I'll stick with the extra DateTime variable. – rikitikitik Jun 08 '12 at 00:30
  • @rikitikitik Of course I would never use this method in actual production code but if I follow where you are going with this what kind of maintainability are you talking about? It's not like the number of days in the months or years is going to change any time soon. It would be an easy thing if clarification is needed to run through assert tests from Jan 1st 0001 through Dec 31st 9999. at that point i see nothing to be maintained. we've got about 7 thousand nine hundred years before we have to panic about the Y10K bug. – TofuBug Jun 08 '12 at 03:30
  • @TofuBug Your manager decides to change the allowed date formats. And you're not around anymore. What then? Anyway, this is already beyond the scope of your question as you have stated early on. Have fun learning Regex! – rikitikitik Jun 08 '12 at 05:31
  • @rikitikitik true it's beyond the scope but I find any debate worthwhile to learn something. In this case i'm dealing with an RFC standard for how the date is defined in a particular vcard version. as long as i'm parsing that version of vcards it cannot change. Thanks again for the back and forth. I honestly appreciate the thoughtful banter. – TofuBug Jun 08 '12 at 11:54
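For readers following the comment thread, the `DateTime.TryParseExact` suggestion could look roughly like the sketch below. The class and method names are hypothetical, and the two format strings are my reading of the YYYYMMDD and YYYY-MM-DD forms discussed in the question. It still needs the throwaway `DateTime` variable that was objected to, but the caller only ever sees the bool and keeps the original string.

```csharp
using System;
using System.Globalization;

public static class TryParseExactSketch
{
    // Custom format strings for the two accepted shapes.
    private static readonly string[] Formats = { "yyyyMMdd", "yyyy-MM-dd" };

    public static bool IsValidVCardDate(string value)
    {
        // Required by the out parameter; never used afterwards.
        DateTime ignored;
        return DateTime.TryParseExact(
            value,
            Formats,
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out ignored);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidVCardDate("20120229"));   // True
        Console.WriteLine(IsValidVCardDate("2012-02-29")); // True
        Console.WriteLine(IsValidVCardDate("2011-02-29")); // False
    }
}
```

Unlike the original `isLikeVCardDate`, this returns false for null, so a caller that wants null treated as valid would need to check for it separately.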

1 Answer

I understand that you are doing this as an exercise to learn about regular expressions, so you might enjoy working out how the examples in the answers to these other questions work:

Of course, one of the most important lessons to learn about regular expressions is when NOT to use them. As a result, I think you may struggle to get detailed feedback on the example you posted. The take-home lesson here is that while some people enjoy writing complex regular expressions, very few enjoy reading (or extending or fixing) them.

Community
Ergwun
  • I very much understand people's mentality. Some things I just took to right away when I first started programming, way back when I was in elementary school (BASICA anyone? :-P). However, sometimes concepts trip me up and I struggle a little to understand them. Back in the day it was how to properly manage memory in C and C++. Now I can code circles around an Expression tree, work easily with advanced concurrent data structures, and have even got a handle on closures and functional programming. But Regex just confuses the hell out of me. For that reason I need the VERY contrived examples to help me. – TofuBug Jun 07 '12 at 13:47
  • Plus I love being part of this community because it is such a great source of knowledge to draw from – TofuBug Jun 07 '12 at 13:47