
I want to use the or operator (|) within a regular expression match, but it's not working. I get an "unsupported operand" error. Can someone please take a look?

I tried both | and "or", but neither worked.

import re

date = "02.29.2001"
date29 = "((02)\.(0[1-9]|[12][0-9]))"
date28 = "((02)\.(0[1-9]|[12][0-8]))"
res = re.search((date28 | date29)+("\.(\d+)"),date)

If I use only date29, it matches, but when I change it to the code above, it doesn't.

Martijn Pieters
s.patra
  • Did you try `date28+"|"+date29`? – Scott Hunter Sep 26 '19 at 15:34
  • What you can do is combine `date28 + "|" + date29` to build your search regex. What part are you trying to search with this `+("\.(\d+)")`? I also recommend using an online sandbox such as [this one](https://regex101.com/) to build your regex beforehand. – MikeMajara Sep 26 '19 at 15:40

1 Answer

You are mixing Python syntax (the or boolean and | bitwise OR operators) with regex syntax.

While regular expressions do use | to separate alternate patterns, the syntax used in regular expressions is distinct and separate from Python operators. You can't arbitrarily combine the two. Regular expression syntax is passed to the re module functions via strings, not as Python expressions.

This works:

either = r"({}|{})\.(\d+)".format(date28, date29)
res = re.search(either, date)

because the regular expression pattern is combined into a single string using regular expression syntax first.
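As a minimal sketch of the difference, the snippet below reuses the `date28` and `date29` strings from the question (written as raw strings here, which is my own choice) and captures the error in a variable I named `error_message`. It shows that Python rejects `|` between two `str` objects before `re` is ever involved, while the pattern built with `str.format()` matches:

```python
import re

date = "02.29.2001"
date29 = r"((02)\.(0[1-9]|[12][0-9]))"
date28 = r"((02)\.(0[1-9]|[12][0-8]))"

# Python evaluates `date28 | date29` itself; str does not support the
# bitwise OR operator, so this fails before re.search() is called.
try:
    date28 | date29
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
# unsupported operand type(s) for |: 'str' and 'str'

# Putting the | inside the pattern string makes it regex alternation.
either = r"({}|{})\.(\d+)".format(date28, date29)
res = re.search(either, date)
print(res.group(0))
# 02.29.2001
```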

Note that there is no point in using date28 here, because everything that date28 can match, can also be matched by date29. Moreover, date28 won't match 02.19., a valid date in February.
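A quick way to check this claim is to run both patterns over every day from 01 to 29. The sketch below uses `re.fullmatch()` and helper names (`candidates`, `only_28`, `missed_by_28`) that are mine, not from the question:

```python
import re

date29 = r"((02)\.(0[1-9]|[12][0-9]))"
date28 = r"((02)\.(0[1-9]|[12][0-8]))"

# Every February day from 02.01 to 02.29, zero-padded.
candidates = ["02.{:02d}".format(day) for day in range(1, 30)]

# Dates matched by date28 but not by date29 (expected: none).
only_28 = [c for c in candidates
           if re.fullmatch(date28, c) and not re.fullmatch(date29, c)]

# Dates matched by date29 but not by date28.
missed_by_28 = [c for c in candidates
                if re.fullmatch(date29, c) and not re.fullmatch(date28, c)]

print(only_28)       # []
print(missed_by_28)  # ['02.19', '02.29']
```

The empty first list confirms that date28 adds nothing to the alternation, and the second list shows that date28 on its own skips the 19th as well as the 29th.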

If you want to construct a regex from 'labelled' components, I recommend you use the re.VERBOSE flag, which causes whitespace in a regex (including newlines) to be ignored, and adds support for using # ... comments. To match whitespace, use explicit classes such as [ ], [\n], \s, etc. I often combine this with explicit group names too.

E.g. your expression could be written out as:

february_date = re.compile(
    r"""
    (
        02\.     # month, always February
        (        # Leap year
            0[1-9]      # first 9 days
            |
            [12][0-9]   # remainder from 10 to 29
        )
        |
        02\.
        (        # regular year
            0[1-9]      # first 9 days
            |
            [12][0-8]   # remainder 10-18, 20-28
        )
    )
    \.(\d+)   # The year
    """, flags=re.VERBOSE)
res = february_date.search(date)

This format also makes it much easier to see that you are matching 02\. at the start in either pattern, which is rather redundant, and the above pattern of course still has the issue with [12][0-8] both being redundant against [12][0-9] and not actually matching the 19th of February.
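With the redundancy removed, the pattern could shrink to something like the sketch below. The group names `month`, `day` and `year` are my own choice, and this version still accepts the 29th in any year, so it is not a leap-year check:

```python
import re

february_date = re.compile(
    r"""
    (?P<month>02)\.      # month, always February
    (?P<day>
        0[1-9]           # days 01-09
        |
        [12][0-9]        # days 10-29
    )
    \.
    (?P<year>\d+)        # the year
    """,
    flags=re.VERBOSE,
)

res = february_date.search("02.29.2001")
print(res.group("month"), res.group("day"), res.group("year"))
# 02 29 2001
```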

Personally, I'd just use \d{2}\.\d{2}\.\d{4} and then use datetime.strptime() to validate that the matched text is actually a valid date. Building a regex to validate dates is a mammoth task, and simply not worth the effort.

For example, the pattern you tried to construct can't tell you that 2001 was not a leap year, so it accepts 02.29.2001 even though that is not a valid date. Parsing the same string with datetime.strptime() raises an exception, telling you this isn't a valid date:

>>> from datetime import datetime
>>> date = '02.29.2001'
>>> datetime.strptime(date, "%m.%d.%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/_strptime.py", line 458, in _strptime
    datetime_date(year, 1, 1).toordinal() + 1
ValueError: day is out of range for month
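Putting the two steps together, a minimal sketch could look like this. The names `DATE_SHAPE` and `parse_date` are hypothetical, and returning `None` for invalid input is just one possible design; you could let the `ValueError` propagate instead:

```python
import re
from datetime import datetime

# Loose shape check only: two digits, two digits, four digits.
DATE_SHAPE = re.compile(r"\d{2}\.\d{2}\.\d{4}")


def parse_date(text):
    """Return a datetime for the first MM.DD.YYYY-shaped substring, or None."""
    match = DATE_SHAPE.search(text)
    if match is None:
        return None
    try:
        # strptime() does the real validation, including leap years.
        return datetime.strptime(match.group(0), "%m.%d.%Y")
    except ValueError:
        return None


print(parse_date("02.29.2001"))  # None
print(parse_date("02.29.2004"))  # 2004-02-29 00:00:00
```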
Martijn Pieters