
I am trying to use a regex to match dates (from 2000 to 2099). The following regex handles valid and invalid dates correctly.

((((^20[02468][048])|(^20[13579][26]))-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[0-1]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-9]$)))|((^20\d{2})-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[01]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-8]$)))))|0000-00-00){1}

Note: it matches a date in a leap year (days in month: 31, 30, 29), a date in a normal year (days in month: 31, 30, 28), or the default value (0000-00-00).

However, it matches the empty string too. I searched for a solution and found one like this one, but my regex is far more complex. I tried adding {1,} as suggested in the link.

But it doesn't work.

I also don't understand why it matches the empty string in the first place. Could anyone please explain that too?

Hon Lun Chan
  • Are you implementing this in a particular language? It is highly likely that you can get around this with the string length functionality that most languages provide. – npinti May 29 '15 at 14:06
  • I am writing this for the `pattern` attribute of an HTML `input`; the regex is stored in a PHP variable and then echoed into the HTML – Hon Lun Chan May 29 '15 at 14:09
  • The outer `(…){1}` is pointless; it means the same as `…`. I hope that is part of a desperation attempt to get by the 'it matches the empty string' problem. – Jonathan Leffler May 29 '15 at 14:09
  • ah haha, I tried to "emphasize" it with `{1}`, but it seems it doesn't work the way I imagined – Hon Lun Chan May 29 '15 at 14:13
  • [I cannot repro the empty string matching.](https://regex101.com/r/fX9bZ1/1) – Wiktor Stribiżew May 29 '15 at 14:13

4 Answers


Add the required attribute to the input:

<input pattern="(((^20[02468][048])|(^20[13579][26]))-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[0-1]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-9]$)))|((^20\d{2})-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[01]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-8]$)))))|0000-00-00"
type="text"
required/>

The browser won't try to validate an empty input that doesn't have a required attribute.

http://jsfiddle.net/kyaLhqpu/ vs http://jsfiddle.net/kyaLhqpu/1/
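
To illustrate why the empty value slips through, here is a minimal JavaScript sketch that models how a browser combines `pattern` and `required`. The helper name `passesConstraints` and the shortened `datePattern` are hypothetical, and the model is simplified (for example, real browsers compile the pattern with a Unicode flag), so treat it as an illustration rather than the browser's actual logic.

```javascript
// Simplified model of HTML constraint validation for a text input.
// passesConstraints is a hypothetical helper, not a browser API.
function passesConstraints(value, { pattern, required = false } = {}) {
  if (value === "") {
    // An empty value is never checked against the pattern;
    // only the required attribute can reject it.
    return !required;
  }
  if (pattern === undefined) {
    return true;
  }
  // The pattern must match the whole value, as if wrapped in ^(?: ... )$.
  return new RegExp(`^(?:${pattern})$`).test(value);
}

// Shortened example pattern (assumption: it does not check days per month).
const datePattern = "20\\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])";

console.log(passesConstraints("", { pattern: datePattern }));                           // true
console.log(passesConstraints("", { pattern: datePattern, required: true }));           // false
console.log(passesConstraints("2015-05-29", { pattern: datePattern, required: true })); // true
console.log(passesConstraints("2015-13-01", { pattern: datePattern, required: true })); // false
```

In this model the regex itself never sees the empty string, which is consistent with the regex testers in the comments reporting no match for it.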

dtanders

It might make more sense to take the regular expression out of the input tag and write a JavaScript function that validates the input instead.

The validation will be twofold:

  1. First, you check that the string provided is non-empty.
  2. Then you validate the string against the expression you are currently using.

Note, though, that JavaScript has the added advantage of letting you use comparison operators such as <, > and = to perform numeric range validation, as opposed to what you are doing now. The end result should be easier to understand, and easier to change should the need arise in the future.
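
The two steps and the numeric comparisons could be sketched as follows. The function name `isValidDate` is hypothetical; the 2000 to 2099 range and the `0000-00-00` default are taken from the question, and the leap-year shortcut is only valid because of that restricted range.

```javascript
// isValidDate is a hypothetical helper, sketched for the question's rules:
// YYYY-MM-DD with years 2000-2099, or the default value 0000-00-00.
function isValidDate(text) {
  if (text === "") return false;           // step 1: reject empty input
  if (text === "0000-00-00") return true;  // default placeholder value

  // Step 2: a small regex checks only the shape; numbers do the rest.
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (m === null) return false;

  const [year, month, day] = m.slice(1).map(Number);
  if (year < 2000 || year > 2099) return false;
  if (month < 1 || month > 12) return false;

  // Within 2000-2099 every year divisible by 4 is a leap year
  // (2000 is divisible by 400; 2100 is outside the range).
  const isLeap = year % 4 === 0;
  const daysInMonth = [31, isLeap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  return day >= 1 && day <= daysInMonth[month - 1];
}

console.log(isValidDate("2000-02-29")); // true
console.log(isValidDate("2001-02-29")); // false
console.log(isValidDate(""));           // false
```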

npinti
  • Actually, what I am doing is this: I build a dynamic table whose data is fetched from the db, and each record has 2 date entries that need validating. There is a submit button for each row of data. I started with JavaScript, but maybe because I am new to JS, I could only get the first row (form) of the table, and I couldn't find a solution for that either. So I thought of using a regex in the HTML instead... – Hon Lun Chan May 29 '15 at 14:25
  • @HonLunChan: You might need to iterate over the rows of data. That being said, *dtanders* does seem to provide a valid solution. – npinti May 29 '15 at 14:28
  • ah, but why do you say that is not a valid solution? I don't understand. I think both ways will do? – Hon Lun Chan May 29 '15 at 14:35
  • @HonLunChan: I do not understand your comment. – npinti May 29 '15 at 14:39
  • @HonLunChan: If *dtanders*'s answer solved your problem, please make sure to mark it. – npinti May 29 '15 at 14:49
  • oh, I didn't know about marking an answer, sorry, I am new here too. Marked! – Hon Lun Chan May 29 '15 at 18:03
  • @HonLunChan: No worries. – npinti May 30 '15 at 17:11

I don't have a direct answer to the problem with the empty string being accepted. I don't think the trouble is in the regex — neither the original nor the revised version. I think the suggestion by dtanders is probably on track; your comments support that.

However, I think there is room to simplify and improve your regex.

There are a lot of unnecessary parentheses in the regex, and the logic seems a bit convoluted. You check for 0000-00-00 (actually the last option, but by far the shortest to describe), or for 'any valid date in any leap year' or for 'any valid date in a non-leap year'. That leads to a major repeated chunk of regex for validating all the invariant months.

There'd be less repetition if you restructured your code to test for 0000-00-00 or any valid day in any month or any valid leap day.

In Perl, you can write extended regular expressions where spaces aren't significant, and the regex can be spread over many lines to make it easier to understand. This leads to a test script like:

#!/usr/bin/env perl
use strict;
use warnings;

my $rx = qr/
    ^(
        20\d{2} -
            (   (0[13578] | 1[02]) - (0[1-9] | [12]\d | 3[01])
            |   ((0[469] | 11) - (0[1-9] | [12]\d | 30))
            |   (02 - (0[1-9] | 1\d | 2[0-8]))
            )
    |   (20[02468][048] | 20[13579][26]) - 02 - 29 # Leap day
    |   0000-00-00
    )$
    /x;

while (<>)
{
    chomp;
    printf "%s: %s\n", (m/$rx/ ? "PASS" : "FAIL"), $_;
}

If the regex is flattened onto a single line (and the comment removed), then you get:

^(20\d{2}-((0[13578]|1[02])-(0[1-9]|[12]\d|3[01])|((0[469]|11)-(0[1-9]|[12]\d|30))|(02-(0[1-9]|1\d|2[0-8])))|(20[02468][048]|20[13579][26])-02-29|0000-00-00)$

The original regex occupies 276 characters. The revision occupies 158 when flattened.
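
Since the question targets HTML and JavaScript, which has no equivalent of Perl's `/x` flag, a similar readability gain can be sketched by assembling the same flattened regex from named pieces. The constant names (`day31`, `day30`, `feb28`, `leapDay`, `dateRx`) are hypothetical and chosen only for this illustration.

```javascript
// Named fragments of the flattened regex; all names are hypothetical.
const day31 = "(0[13578]|1[02])-(0[1-9]|[12]\\d|3[01])";   // months with 31 days
const day30 = "(0[469]|11)-(0[1-9]|[12]\\d|30)";           // months with 30 days
const feb28 = "02-(0[1-9]|1\\d|2[0-8])";                   // February, days 1-28
const leapDay = "(20[02468][048]|20[13579][26])-02-29";    // 29 February in leap years

// Any valid day in any year 2000-2099, or a leap day, or the default value.
const dateRx = new RegExp(
  `^(20\\d{2}-(${day31}|${day30}|${feb28})|${leapDay}|0000-00-00)$`
);

for (const s of ["0000-00-00", "2000-02-29", "2001-02-29", "2004-04-31", "2099-12-31", ""]) {
  console.log(`${dateRx.test(s) ? "PASS" : "FAIL"}: ${s}`);
}
```

The fragments carry a few more parentheses than strictly necessary so that each one can be dropped into an alternation safely.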

I called the script regex-hell and created a file various-dates with various sample dates in it. The output was:

PASS: 0000-00-00
FAIL: 0001-00-00
FAIL: 0000-01-00
FAIL: 0000-00-01
FAIL: 2000-00-00
FAIL: 2000-01-00
FAIL: 2000-00-01
PASS: 2000-01-01
PASS: 2000-02-28
PASS: 2000-02-29
PASS: 2001-02-28
FAIL: 2001-02-29
PASS: 2003-03-31
FAIL: 2003-03-32
PASS: 2004-04-30
FAIL: 2004-04-31
PASS: 2005-05-31
FAIL: 2005-05-32
FAIL: 2005-05-00
PASS: 2005-05-01
PASS: 2006-06-30
FAIL: 2006-06-31
PASS: 2007-07-31
FAIL: 2007-07-32
PASS: 2008-08-31
FAIL: 2008-08-32
PASS: 2009-09-30
FAIL: 2009-09-31
FAIL: 2009-09-32
PASS: 2010-10-30
PASS: 2010-10-31
FAIL: 2010-10-32
PASS: 2011-11-30
FAIL: 2011-11-31
PASS: 2012-12-31
FAIL: 2012-12-32
PASS: 2099-01-01
PASS: 2099-12-31
FAIL: 
Jonathan Leffler
  • Do you mean that I should check in the sequence `0000-00-00`, then normal years, then leap years, from the smallest to the largest, and delete some redundant parentheses? I think adding `()` makes it easier to read, but I know the regex remembers whatever is matched inside `()`, so will this be slower? – Hon Lun Chan May 29 '15 at 18:02
  • There's no virtue to putting 0000-00-00 first; in fact, my revised RE left it at the end. It was simpler for explaining to list the simple entry at the start, that's all. I found the extra parentheses confusing rather than helpful (they weren't all where I expected them to be), but your RE worked correctly when used (and your problem was that it simply wasn't applied when there was no data in the field), and my shorter version also works -- and because it is simpler, it might be marginally but probably unmeasurably faster. Mostly, though, it is easier to understand. – Jonathan Leffler May 29 '15 at 18:32
  • I have read your regex completely. I really appreciate your regex!! Thank you very much for a better idea! And I think I understand what you mean now! :) – Hon Lun Chan May 30 '15 at 13:26

I don't think it matches the empty string, and neither does Rubular. Whatever the case, you can add an anchored negative look-ahead for end of input, ^(?!$), to the start of your regex to prevent a blank from matching:

^(?!$)((((^20[02468][048])|(^20[13579][26]))-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[0-1]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-9]$)))|((^20\d{2})-(((0[13578]|1[02])-(0[1-9]|[12]\d|3[01]$))|((0[469]|11)-(0[1-9]|[12]\d|30$))|(02-(0[1-9]|1\d|2[0-8]$)))))|0000-00-00){1}
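
As a small illustration of what the look-ahead does, here is a JavaScript sketch using a deliberately simple pattern that can match the empty string. The names `optionalDigits` and `guardedDigits` are hypothetical, and the guard only helps where the regex is actually applied to the empty value.

```javascript
// \d* allows zero digits, so the unguarded regex accepts "".
const optionalDigits = /^\d*$/;
// ^(?!$) fails immediately when the end of input follows the start.
const guardedDigits = /^(?!$)\d*$/;

console.log(optionalDigits.test(""));    // true
console.log(guardedDigits.test(""));     // false
console.log(guardedDigits.test("2015")); // true
```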
Bohemian