I'm trying to validate a date format input. The input is not an actual date but a D/M/Y format string, and I don't want to validate an actual date, just the formatting.

I want to accept any input made up of a double D, a double M, and a double or quadruple Y, optionally separated by - or _ dividers.

My current RegEx looks the following: ^(?=.*[mM]{2})(?=.*[dD]{2})(?=.*[yY]{2,4})(?=.*[-_]{0,2}).*$

However, this evaluates to true even if more than the expected characters are found. The quantifiers {2} seem to have no effect.

For example, mmddyyyymmmmmm evaluates to true even though there are multiple m blocks in it, which I don't understand.

The expected result is that only combinations such as the following can test true:

dd-mm-yy
MM-DD_YYYY
yyyy_dd-MM
mmddyy
YYYYddMM

and not something like:

ddyyyyymmmmmmmmm
mmddyymm

Please help me to correct my RegEx.

secondplace
  • That is rather a hard job for a regex, but `^(?!.*[mM]{2}.*[mM])(?!.*[Dd]{2}.*[Dd])(?!.*[yY]{4}.*[Yy])(?!.*(?:^|[^yY\n])[yY]{2}[^yY\n].*[yY])(?:(?:[mM]{2}|[dD]{2}|[yY]{2}(?:[yY]{2})?)(?:[_-](?!$))?){3}$` [looks to be doing](https://regex101.com/r/bJu2Lm/1) what you need. The `\n`s are there only since the test is performed against a multiline string. – Wiktor Stribiżew Sep 16 '19 at 06:48
  • This was definitely over my head to do myself. All I noticed is that it also evaluates `ddyyyy`, `yyyymm`, `yyyydd` as true. I only need to evaluate single inputs, so it's OK to do with regex; if I had to evaluate 1000s of values it would be overkill. – secondplace Sep 16 '19 at 07:45

1 Answer

Usually, it makes sense to use a regex only to check that the string consists of allowed blocks, and then use ordinary code to do the rest of the "counting" work (you just check how many mm, dd, or yyyy / yy blocks there are).
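A minimal sketch of that "match, then count" idea, assuming Python (the question does not name a language). The names BLOCK, SHAPE and is_valid_format are hypothetical and only serve the illustration:

```python
import re

# One allowed block: dd, mm, yyyy or yy (any letter case).
# The four-letter year is listed before the two-letter one so it is tried first.
BLOCK = r"[dD]{2}|[mM]{2}|[yY]{4}|[yY]{2}"

# Three captured blocks with an optional - or _ between them.
SHAPE = re.compile(rf"({BLOCK})[_-]?({BLOCK})[_-]?({BLOCK})")


def is_valid_format(text: str) -> bool:
    """Return True if text is made of exactly one D, one M and one Y block."""
    match = SHAPE.fullmatch(text)
    if match is None:
        return False
    # The "counting" step in plain code: each letter must occur in one block.
    kinds = sorted(block[0].lower() for block in match.groups())
    return kinds == ["d", "m", "y"]


for sample in ["dd-mm-yy", "YYYYddMM", "ddyyyy", "mmddyymm"]:
    print(sample, is_valid_format(sample))
```

The regex stays small because it only checks the overall shape; the uniqueness rule lives in the last two lines of the function.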

If you have to use a regex, there are two approaches.

Solution #1: Enumerating all alternatives

This is the least comfortable solution, and it is neither dynamic nor scalable: you just collect all possible patterns inside a single group:

^(?:
  [dD]{2}[_-]?[mM]{2}[_-]?[yY]{2}(?:[yY]{2})? |
  [mM]{2}[_-]?[dD]{2}[_-]?[yY]{2}(?:[yY]{2})? |
  [mM]{2}[_-]?[yY]{2}(?:[yY]{2})?[_-]?[dD]{2} |
  [dD]{2}[_-]?[yY]{2}(?:[yY]{2})?[_-]?[mM]{2} |
  [yY]{2}(?:[yY]{2})?[_-]?[dD]{2}[_-]?[mM]{2} |
  [yY]{2}(?:[yY]{2})?[_-]?[mM]{2}[_-]?[dD]{2}
)$

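See the regex demo. Here, ^ asserts the position at the start of the string, (?:...|...) is a non-capturing group holding the alternatives, and $ asserts the end of the string. The pattern is laid out in free-spacing style, so it needs the x (extended/verbose) flag for the whitespace and line breaks to be ignored.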
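If typing out the six orderings by hand feels error-prone, the same enumeration can be generated. This is only a sketch, assuming Python; BLOCKS, SEP, ENUMERATED and matches_enumerated are hypothetical names, and fullmatch is used in place of the ^ and $ anchors:

```python
import re
from itertools import permutations

# The three blocks used in the enumerated pattern above.
BLOCKS = ["[dD]{2}", "[mM]{2}", "[yY]{2}(?:[yY]{2})?"]
SEP = "[_-]?"  # optional divider between two blocks

# One alternative per ordering of the three blocks: 3! = 6 in total.
alternatives = [SEP.join(order) for order in permutations(BLOCKS)]
ENUMERATED = re.compile("(?:" + "|".join(alternatives) + ")")


def matches_enumerated(text: str) -> bool:
    """Return True if the whole string is one of the six allowed orderings."""
    return ENUMERATED.fullmatch(text) is not None


for sample in ["MM-DD_YYYY", "yyyy_dd-MM", "ddyyyy", "mmddyymm"]:
    print(sample, matches_enumerated(sample))
```

The resulting pattern is still an enumeration, so it grows factorially with the number of blocks; generating it only saves the typing.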

Solution #2: Dynamic approach

This approach means matching a string that consists only of three D, M, or Y blocks and restricting the pattern with positive lookaheads that require the string to contain exactly one occurrence of each block. The difficulty is that the blocks are multi-character strings, so you need to use a tempered greedy token (or unroll it, making the regex even more monstrous):

^
  (?=(?:(?![mM]{2}).)*[mM]{2}(?:(?![mM]{2}).)*$)
  (?=(?:(?![dD]{2}).)*[dD]{2}(?:(?![dD]{2}).)*$)
  (?=(?:(?![yY]{2}(?:[yY]{2})?).)*[yY]{2}(?:[yY]{2})?(?:(?![yY]{2}(?:[yY]{2})?).)*$)
  (?:
    (?:[mM]{2}|[dD]{2}|[yY]{2}(?:[yY]{2})?)
    (?:[_-](?!$))?
  ){3}
$

See the regex demo.

So, here, the (?:[mM]{2}|[dD]{2}|[yY]{2}(?:[yY]{2})?)(?:[_-](?!$))? part repeats 3 times from start to end, so on its own it would accept any three d, y, or m blocks, even if they are the same (mmmmmm would match, too). The (?!$) after the divider prevents a trailing - or _. The lookaheads are what rule out the repeats. They all have the form (?=(?:(?!BLOCK).)*BLOCK(?:(?!BLOCK).)*$): each one matches only if there is any text that does not contain BLOCK, then a single BLOCK, and then any text that does not contain BLOCK up to the end of the string.
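To make the lookahead template concrete, here is a sketch that builds the pattern from its parts, assuming Python. The helper only_once and the constants M, D, Y, BODY, BODY_ONLY and DYNAMIC are hypothetical names, not part of any library:

```python
import re


def only_once(block: str) -> str:
    """Build (?=(?:(?!BLOCK).)*BLOCK(?:(?!BLOCK).)*$) for the given block."""
    tempered = f"(?:(?!{block}).)*"  # any text in which BLOCK never starts
    return f"(?={tempered}{block}{tempered}$)"


M = "[mM]{2}"
D = "[dD]{2}"
Y = "[yY]{2}(?:[yY]{2})?"

# Three blocks, each optionally followed by a divider that is not string-final.
BODY = f"(?:(?:{M}|{D}|{Y})(?:[_-](?!$))?)" + "{3}"

# Without the lookaheads, repeated blocks are accepted.
BODY_ONLY = re.compile("^" + BODY + "$")

# With the lookaheads, each block must occur exactly once.
# Note: in Python, $ also matches just before a trailing newline.
DYNAMIC = re.compile("^" + only_once(M) + only_once(D) + only_once(Y) + BODY + "$")

for sample in ["mmmmmm", "yyyy_dd-MM", "ddyyyy"]:
    print(sample, bool(BODY_ONLY.match(sample)), bool(DYNAMIC.match(sample)))
```

Adding another block type would mean one more only_once(...) call and one more alternative in BODY, which is the sense in which this approach is dynamic.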

Wiktor Stribiżew