
I was looking through CakePHP's library and found a regular expression for dates in its Validation.php file. I tested it against various date values and found that it also matches some invalid ones.

For example, it correctly matches the following dates:

20/01/2011
19/09/2017
20/01/1601

But when I use an invalid value with 29 or 30 as the day, it surprisingly matches too (which it should not):

30/,/1601
29/,/2017

https://regex101.com/r/8Q96bd/1/

Interestingly, with any day other than 29 or 30, the same kind of invalid value is rejected:

28/,/1600

https://regex101.com/r/UKuPWU/1/

So why does CakePHP's date regex match an invalid date when the day is 29 or 30?

Here is the expression:

^(?:(?:(?:31(\\\/|-|\\.|\\x20))(?:0?[13578]|1[02]))\1|(?:(?:29|30)([-\/])(?:0?[1,3-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29([-\/])0?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])([-\/])(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$

You can find this expression in the CakePHP lib directory (I am using CakePHP 2.x):

\lib\Cake\Utility\Validation.php  (see its date function)

My questions are:

1. Why does it allow a comma in place of the month only for 29 and 30?

2. Why does the regular expression use x20? What is it needed for in a date expression?

3. Is there any date standard, rule or specification I am missing that allows 29 and 30 to be used without a month?

Could anyone please help me understand the logic behind this?

Sumit Parakh
  • Don't you see the comma in `[1,3-9]`? That is why the comma is matched. – Wiktor Stribiżew Jun 27 '17 at 06:28
  • I understood that, but I was trying to understand WHY CakePHP would use a comma in the first place if others have to remove it later. – Sumit Parakh Jun 27 '17 at 06:48
  • It is a mistake. There are a lot of libraries with regexes, and I have seen such (and other) inaccuracies in them. It is common. There are typos in documentation too (yesterday, there was some Django question). There is some site saying you may use `[A-z]` to match all ASCII letters, but in fact, [it is wrong](http://stackoverflow.com/a/29771926/3832970). – Wiktor Stribiżew Jun 27 '17 at 06:51
  • Ok. Thanks for the beautiful explanation and demo. I shall remember it. – Sumit Parakh Jun 27 '17 at 06:55

1 Answer

The comma inside a character class is meaningful to the regex engine: it is a literal character, not a separator. `[1,3-9]` matches 1, a comma (!), and any digit from 3 to 9.

You need to remove that comma.
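The effect can be checked in isolation on the month sub-pattern alone. This is a minimal sketch using Python's `re` module purely for illustration (the original runs under PHP's PCRE, where a comma inside a character class behaves the same way); the names `buggy_month`, `fixed_month` and `matches` are hypothetical helpers, not part of CakePHP.

```python
import re

# Month sub-pattern from the 29/30 branch, with and without the stray comma.
buggy_month = re.compile(r"0?[1,3-9]|1[0-2]")
fixed_month = re.compile(r"0?[13-9]|1[0-2]")


def matches(pattern, text):
    """Return True if the whole text is matched by the pattern."""
    return pattern.fullmatch(text) is not None


for candidate in [",", "2", "09", "12"]:
    print(repr(candidate), matches(buggy_month, candidate), matches(fixed_month, candidate))
```

Only the comma is treated differently by the two versions; February ("2") is rejected by both because this branch handles months that have a 29th and 30th in every year.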

Besides, the pattern contains many redundant groups that only make it harder to debug.

Here is a cleaner version of the regex:

^(?:31([-\/.\x20])(?:0?[13578]|1[02])\1|(?:29|30)([-\/])(?:0?[13-9]|1[0-2])\2)(?:1[6-9]|[2-9]\d)?\d{2}$|^29([-\/])0?2\3(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:16|[2468][048]|[3579][26])00)$|^(?:0?[1-9]|1\d|2[0-8])([-\/])(?:0?[1-9]|1[0-2])\4(?:1[6-9]|[2-9]\d)?\d{2}$
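As a rough sanity check, the cleaned pattern can be run against the dates from the question. The sketch below uses Python's `re` module as a stand-in for PHP's PCRE, which is an assumption made only for illustration. `BUGGY` is not the original CakePHP string: it is the cleaned pattern with the comma put back, used to reproduce the reported behaviour. `is_valid` is a hypothetical helper.

```python
import re

# Cleaned-up pattern from this answer, split across lines for readability.
CLEANED = (
    r"^(?:31([-\/.\x20])(?:0?[13578]|1[02])\1"
    r"|(?:29|30)([-\/])(?:0?[13-9]|1[0-2])\2)(?:1[6-9]|[2-9]\d)?\d{2}$"
    r"|^29([-\/])0?2\3(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])"
    r"|(?:16|[2468][048]|[3579][26])00)$"
    r"|^(?:0?[1-9]|1\d|2[0-8])([-\/])(?:0?[1-9]|1[0-2])\4(?:1[6-9]|[2-9]\d)?\d{2}$"
)

# Reintroduce the stray comma to mimic the behaviour described in the question.
BUGGY = CLEANED.replace("[13-9]", "[1,3-9]")


def is_valid(pattern, text):
    """Return True if the date string is accepted by the pattern."""
    return re.search(pattern, text) is not None


samples = ["20/01/2011", "19/09/2017", "20/01/1601",
           "30/,/1601", "29/,/2017", "28/,/1600",
           "29/02/2016", "29/02/2017", "31/04/2017"]

for s in samples:
    print(f"{s:12} buggy={is_valid(BUGGY, s)!s:5} cleaned={is_valid(CLEANED, s)}")
```

The two patterns should disagree only on `30/,/1601` and `29/,/2017`, which go through the 29/30 branch where the comma sits.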

See the regex demo

Note that \x20 matches a space, the character with decimal code 32. It is written as \x20 rather than as a literal space so that the pattern can still be debugged with the x (free-spacing) modifier, which ignores literal whitespace in the pattern and lets you add comments and break the pattern into separate lines.
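To illustrate why the escape matters, here is a small sketch using Python's `re.VERBOSE` flag, which is assumed here to be a fair analogue of the PCRE x modifier for whitespace outside character classes. The patterns are simplified fragments written for this example, not taken from CakePHP.

```python
import re

# Separator written as \x20: the space survives free-spacing mode.
with_hex = re.compile(r"""
    31              # day
    (/|-|\.|\x20)   # separator: slash, hyphen, dot or space
    12              # month
""", re.VERBOSE)

# Separator written as a bare space: free-spacing mode discards it,
# leaving an empty alternative instead of a space.
with_literal = re.compile(r"""
    31
    (/|-|\.| )
    12
""", re.VERBOSE)

print(with_hex.fullmatch("31 12") is not None)      # space accepted
print(with_literal.fullmatch("31 12") is not None)  # space no longer accepted
print(with_literal.fullmatch("3112") is not None)   # empty separator accepted instead
```

With the bare space, the last alternative silently becomes empty, so the pattern changes meaning as soon as the modifier is switched on.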

Wiktor Stribiżew