You are mixing Python syntax (the or boolean and | bitwise OR operators) with regex syntax.
While regular expressions do use | to separate alternate patterns, the syntax used in regular expressions is distinct and separate from Python operators. You can't arbitrarily combine the two. Regular expression syntax is passed to the re module functions via strings, not as Python expressions.
This works:
either = r"({}|{})\.(\d+)".format(date28, date29)
res = re.search(either, date)
because the regular expression pattern is combined into a single string using regular expression syntax first.
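To make the difference concrete, here is a minimal, self-contained sketch. The definitions of date28 and date29 below are assumptions made for illustration (the real ones are in the question); the point is only that Python's or picks one of the two strings, while the | inside the combined string is interpreted by the regex engine.

```python
import re

# Hypothetical component patterns, assumed for illustration only.
date28 = r"02\.(?:0[1-9]|[12][0-8])"
date29 = r"02\.(?:0[1-9]|[12][0-9])"

# Python's `or` is evaluated before re ever sees anything: it simply
# returns the first truthy operand, so date29 is silently dropped.
python_or = date28 or date29

# Here the alternation lives inside the string, as regex syntax.
either = r"({}|{})\.(\d+)".format(date28, date29)

match = re.search(either, "02.19.2020")
print(match.group(1), match.group(2))
```

With these assumed components, the combined pattern matches 02.19.2020 only through the date29 branch, which also illustrates the gap in date28 discussed next.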
Note that there is no point in using date28 here, because everything that date28 can match can also be matched by date29. Moreover, date28 won't match 02.19., a valid date in February.
If you want to construct a regex from 'labelled' components, I recommend you use the re.VERBOSE flag, which causes whitespace in a regex (including newlines) to be ignored, and adds support for using # ... comments. To match whitespace, use explicit classes such as [ ], [\n], \s, etc. I often combine this with explicit group names too.
E.g. your expression could be written out as:
february_date = re.compile(
    r"""
    (
        02\.                # month, always February
        (                   # Leap year
            0[1-9]          # first 9 days
          |
            [12][0-9]       # remainder from 10 to 29
        )
      |
        02\.
        (                   # regular year
            0[1-9]          # first 9 days
          |
            [12][0-8]       # remainder 10-18, 20-28
        )
    )
    \.(\d+)                 # The year
    """, flags=re.VERBOSE)
res = february_date.search(date)
This format also makes it much easier to see that you are matching 02\. at the start in either pattern, which is rather redundant, and the above pattern of course still has the issue with [12][0-8] both being redundant against [12][0-9] and not actually matching the 19th of February.
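As a sketch of how those two issues could be cleaned up, the version below writes the 02\. prefix once, drops the redundant 1-28 branch, and uses named groups. The group names month, day and year are my own illustrative choices, not something taken from the question.

```python
import re

# Sketch only: one month prefix, one day alternation, named groups.
february_date = re.compile(
    r"""
    (?P<month>02)\.         # month, always February
    (?P<day>
        0[1-9]              # first 9 days
      |
        [12][0-9]           # 10 through 29
    )
    \.(?P<year>\d+)         # the year
    """,
    flags=re.VERBOSE,
)

match = february_date.search("02.19.2020")
print(match.group("month"), match.group("day"), match.group("year"))
```

This still accepts the 29th for every year, so it does not replace real date validation.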
Personally, I'd just use \d{2}\.\d{2}\.\d{4} and then use datetime.strptime() to validate that the matched text is actually a valid date. Building a regex to validate dates is a mammoth task, and simply not worth the effort.
For example, the pattern you tried to construct doesn't tell you that 2001 was not a leap year, so 02.29.2001 is not a valid date. But trying to parse it using datetime.strptime() throws an exception, telling you this isn't a valid date:
>>> from datetime import datetime
>>> date = '02.29.2001'
>>> datetime.strptime(date, "%m.%d.%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/_strptime.py", line 458, in _strptime
    datetime_date(year, 1, 1).toordinal() + 1
ValueError: day is out of range for month
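The regex-then-strptime approach described above could be wrapped up roughly like this. It is a minimal sketch: the names DATE_SHAPE and find_valid_date are hypothetical, and it assumes you only care about the first date-shaped substring in the text.

```python
import re
from datetime import datetime

# Loose shape check only; the calendar rules are left to strptime().
DATE_SHAPE = re.compile(r"\d{2}\.\d{2}\.\d{4}")


def find_valid_date(text):
    """Parse the first mm.dd.yyyy-shaped substring of text.

    Returns a datetime, or None if nothing has that shape or the
    first such substring is not a real calendar date.
    """
    match = DATE_SHAPE.search(text)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(), "%m.%d.%Y")
    except ValueError:
        # Right shape, but not a valid date (e.g. 02.29.2001).
        return None


print(find_valid_date("02.29.2000"))
print(find_valid_date("02.29.2001"))
```

Catching ValueError keeps the leap-year logic entirely inside the datetime module, so the regex never has to know which years have a 29th of February.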