
This is my first question here, so please forgive (and point out) any formatting errors. I am new to Python and working through Automate the Boring Stuff. I decided to extend the date-detection project by reading from the clipboard and formatting the output. My problem is with any operation on the year part of the regex. I have commented out my last attempt to validate the year; I gave up on it, changed the regex to match only years from 1000 to 2999, and skipped validating the year in code. Now I need to validate leap years, so I'm back to working with the year variable, and once again no operation on it has any effect. In short: I can extract the year value and display it, but I cannot modify it or run checks against it.

#! python3
#! detect dates in a block of text

import pyperclip
import re

#!import numpy as np


text = str(pyperclip.paste())


def datedetection(text):
    dateRegex = re.compile(
        r"""(
        (\d|\d\d)               #! match day
        (/{1})                  #! match /
        (\d|\d\d)               #! match month
        (/{1})                  #! match /
        ([1|2][0-9][0-9][0-9])  #! match year
        )""",
        re.VERBOSE,
    )
    matches = []
    for groups in dateRegex.findall(text):
        day = str(groups[1])
        slash1 = str(groups[2])
        month = str(groups[3])
        slash2 = str(groups[4])
        year = str(groups[5])
        month_range_30 = ["04", "06", "09", "11"]
        month_range_31 = ["01", "03", "05", "07", "08", "10", "12"]
        month_range_Feb = ["02"]
        #!year_range = np.arange(1000, 3000, 1).tolist()

        if len(day) == 1:
            day = "0" + day
        else:
            day = day
        if len(month) == 1:
            month = "0" + month
        else:
            month = month

        if month in month_range_31:
            if int(day) > 31:
                day = "Too many days in a month with only 31 days."
                slash1 = month = slash2 = year = ""

        elif month in month_range_30:
            if int(day) > 30:
                day = "Too many days in a month with only 30 days."
                slash1 = month = slash2 = year = ""
        elif month in month_range_Feb:
            if int(day) > 29:
                day = "Too many days in February."
                slash1 = month = slash2 = year = ""
        elif int(month) > 12:
            day = "Found an invalid month."
            slash1 = month = slash2 = year = ""
        elif month in month_range_Feb:
            if (
                int(day) == 29
                and (int(year) % 4 == 0)
                and (int(year) % 400 == 0)
                and (int(year) % 100 == 0)
            ):
                day = day
        elif month in month_range_Feb:
            if (
                int(day) == 29
                and (int(year) % 4 == 0)
                and (int(year) % 100 != 0)
            ):
                day = "Found an invalid leap year."
                slash1 = month = slash2 = year = ""
        #!elif year not in year_range:
        #!day = "Year is out of range."
        #!slash1 = month = slash2 = year = ""

        dates = "".join([day, slash1, month, slash2, year])
        matches.append(dates)

    if len(matches) > 0:
        pyperclip.copy("\n".join(matches))
        print("Copied to clipboard:")
        print("\n".join(matches))
    else:
        print("No dates found.")


datedetection(text)
  • Regex is not the right tool to validate dates (even if it _can_ do the job). You may use regex to find/extract strings that _look like_ dates but don't use regex to _validate_ them. I don't know much python, but you should find a proper solution in [this question](https://stackoverflow.com/q/16870663/8967612). – 41686d6564 stands w. Palestine Sep 23 '20 at 02:54
  • Side notes regarding your pattern: 1) `\d|\d\d` can be `\d\d?`. 2) Never use `{1}`; it's redundant. 3) `[1|2]` should be `[12]` (or `(?:1|2)` if you want to use `|` but that's not needed here). 4) `[0-9]` repeated 3 times can be written as `[0-9]{3}`. Good luck! – 41686d6564 stands w. Palestine Sep 23 '20 at 02:57
  • `strptime` is your friend! https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior – anon01 Sep 23 '20 at 02:58
  • `20([0248][048]|[1379][26])` matches leap years in the 21st century. Adapt as required for other centuries. – TonyR Sep 23 '20 at 06:14
  • With regard to the practicability of regex for validating dates: regex would probably be more compact than the above code. With appropriate comments, it would also be understandable. – TonyR Sep 23 '20 at 06:23
  • Try `(?x)\b (?!31-(0?[2469]|11))#months w/o 31 days (?!30-0?2)#month w/o 30 days (?!29-0?2-\d{2}([0248][1235679]|[1379][01345789]))#Feb. w/o 29 days ((0?[1-9]|[1-2]\d|3[01]))#dd -((0?[1-9]|1[0-2]))#mm -\d{4}#yyyy \b` for years 2000-2099. Adapt as required for other centuries (leap year) and, if necessary, for Python syntax. – TonyR Sep 23 '20 at 06:54
  • Thanks for all of this. It's a lot to unpack and learn, but on a first pass through the comments: the regex isn't actually validating the year, just extracting years that fall in the range 1000 to 2999. I then wanted to write code against `year = str(groups[5])` to validate the years against days and months. `day = str(groups[1])` works fine for days, as does group 2 for months. It's just the years that I can't get to work. I know I'm not very Pythonic; this is just an exercise to learn. Thanks, all. – Robert Kraft Sep 23 '20 at 23:47
  • 41686d6564 - Changed all the regex to your suggestions. It works and looks a lot more efficient. – Robert Kraft Sep 24 '20 at 02:22
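The first comments amount to a split of responsibilities: a regex to find strings that look like dates, then `datetime.strptime` (from the linked docs) to decide whether they are real dates. Below is a minimal sketch of that split, assuming d/m/yyyy input as in the question. `find_valid_dates` is a hypothetical helper name, and the pattern is the question's with the side-note simplifications applied. Because the outer group and the slash groups are gone, `findall` on this pattern would return `(day, month, year)` tuples rather than the six groups the question indexes.

```python
import re
from datetime import datetime

# Extraction only: the question's pattern with the comment's side notes applied
# (\d\d? for one or two digits, no {1}, [12] instead of [1|2], [0-9]{3}).
DATE_RE = re.compile(r"(\d\d?)/(\d\d?)/([12][0-9]{3})")


def find_valid_dates(text):
    """Return zero-padded dd/mm/yyyy strings for the candidates that are real dates."""
    valid = []
    for match in DATE_RE.finditer(text):
        try:
            # strptime checks the actual calendar, including leap years.
            parsed = datetime.strptime(match.group(0), "%d/%m/%Y")
        except ValueError:
            # Looks like a date but is not one, e.g. 31/4/2021 or 29/2/2023.
            continue
        valid.append(parsed.strftime("%d/%m/%Y"))
    return valid


text = "Due 29/2/2024, not 29/2/2023 or 31/4/2021; also 1/12/1999 and 7/13/2020."
print(find_valid_dates(text))
# ['29/02/2024', '01/12/1999']
```

Formatting with `strftime` also zero-pads day and month, which replaces the manual `"0" + day` step. One more reason for `[12]`: inside a character class `|` is a literal character, so `[1|2]` would also accept a "year" such as `|999`.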
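TonyR's patterns can be checked mechanically against the standard library. The sketch below compares the leap-year pattern with `calendar.isleap` for every year from 2000 to 2099. It also lays the full date pattern out over several lines, since in `re.VERBOSE` mode a `#` starts a comment that runs to the end of the line (on a single line, everything after the first `#` would be ignored). The comparison shows that the posted tens-digit classes `[0248]` and `[1379]` leave out 6 and 5, so the sketch uses `[02468]` and `[13579]` in both patterns. That amendment, keeping the comment's `-` separator, and the names `disagreements`, `LEAP_AS_POSTED`, `LEAP_AMENDED` and `DATE_2000_2099` are my own assumptions, not part of the comment.

```python
import calendar
import re

# Leap-year pattern as posted in the comment (years 2000 to 2099 only).
LEAP_AS_POSTED = re.compile(r"20([0248][048]|[1379][26])")
# Assumption: the same pattern with 6 and 5 added to the tens-digit classes.
LEAP_AMENDED = re.compile(r"20([02468][048]|[13579][26])")


def disagreements(pattern):
    """Years 2000..2099 where the pattern and calendar.isleap disagree."""
    return [
        year
        for year in range(2000, 2100)
        if (pattern.fullmatch(str(year)) is not None) != calendar.isleap(year)
    ]


# The full dd-mm-yyyy pattern, one piece per line so each "#" comment
# ends at its own line break. Tens-digit classes amended as above.
DATE_2000_2099 = re.compile(
    r"""
    \b
    (?!31-(0?[2469]|11))                                  # months w/o 31 days
    (?!30-0?2)                                            # month w/o 30 days
    (?!29-0?2-\d{2}([02468][1235679]|[13579][01345789]))  # Feb. w/o 29 days
    (0?[1-9]|[1-2]\d|3[01])                               # dd
    -(0?[1-9]|1[0-2])                                     # mm
    -\d{4}                                                # yyyy
    \b
    """,
    re.VERBOSE,
)

print(disagreements(LEAP_AS_POSTED))  # [2052, 2056, 2060, 2064, 2068]
print(disagreements(LEAP_AMENDED))    # []
for sample in ["29-02-2024", "29-02-2023", "29-02-2061", "31-04-2021", "31-12-2021"]:
    print(sample, DATE_2000_2099.fullmatch(sample) is not None)
```

For the question's format, `-` would become `/`. Years outside 2000 to 2099 still need the century adjustment the comment mentions; 1900, for example, is not a leap year.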

2 Answers


The problem was the operations that came before the year operations: the day and month operations were overwriting the year values. I'm not entirely sure how or why yet, but moving the year code above the day and month code has started to fix the issue.
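One explanation consistent with the question's code: in an `if`/`elif` chain, Python runs only the first branch whose condition is true. The chain tests `month in month_range_Feb` three times, and the first of those (the one holding the `> 29` check) already catches every February date. The two later branches, which are the only ones that read `year`, can therefore never run. Below is a minimal sketch of that shape, plus one way to fold the February checks into a single branch; `which_branch` and `february_is_valid` are hypothetical names for illustration.

```python
def which_branch(month):
    # Same shape as the question's chain: "02" is tested by more than one elif.
    if month == "01":
        return "31-day check"
    elif month == "02":
        return "February day-count check"  # every February date stops here
    elif month == "02":
        return "leap-year check"  # unreachable: same condition as the branch above
    return "other"


def february_is_valid(day, year):
    """One February branch that consults the year directly (Gregorian rules)."""
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    return day <= (29 if leap else 28)


print(which_branch("02"))           # February day-count check
print(february_is_valid(29, 2024))  # True
print(february_is_valid(29, 2023))  # False
print(february_is_valid(29, 1900))  # False
```

Moving the year checks ahead of the other February branch, as this answer did, works for the same reason: the first matching branch is the one that runs.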


In my approach to this project, I constrained the day, month, and year ranges in the regular expression itself. I then defined functions to check for leap years and to validate the number of days in each month.

I found this simpler and easier to follow:

import re

# dd/mm/yyyy with a coarse range check on each part
dateRegex = re.compile(r'([0-3][0-9])/([0-1][0-9])/([1-2][0-9]{3})')


def is_leap_year(year):
    # Divisible by 4, except century years not divisible by 400
    year = int(year)
    if year % 4 == 0:
        if year % 100 == 0:
            return year % 400 == 0
        else:
            return True
    else:
        return False


def is_valid_date(day, month, year):
    # day, month and year are the zero-padded strings captured by dateRegex
    if month == '02':
        if is_leap_year(year):
            return int(day) <= 29
        else:
            return int(day) <= 28
    elif month in ['04', '06', '09', '11']:
        return int(day) <= 30
    else:
        return int(day) <= 31

You can find the rest of my code below.

https://gist.github.com/alialbusaidi/f56e4c9342f622434f8bff0549f94884
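As an alternative to hand-written month lists, the standard library's `calendar.monthrange(year, month)` returns the weekday of the first day and the number of days in that month, with leap years already accounted for. Below is a minimal sketch that reuses this answer's pattern; `valid_dates` is a hypothetical helper, not the code in the linked gist. The classes `[0-3][0-9]` and `[0-1][0-9]` still admit day 00, month 00 and months 13 to 19, so the sketch adds an explicit range check. As written, `is_valid_date` would accept a month such as `13`, because it falls through to the 31-day branch.

```python
import calendar
import re

# Same pattern as the answer: dd/mm/yyyy with a year from 1000 to 2999.
DATE_RE = re.compile(r"([0-3][0-9])/([0-1][0-9])/([1-2][0-9]{3})")


def valid_dates(text):
    """Yield the dd/mm/yyyy strings in text that are real calendar dates."""
    for day, month, year in DATE_RE.findall(text):
        d, m, y = int(day), int(month), int(year)
        # The pattern alone admits month 00, months 13-19 and day 00.
        if not 1 <= m <= 12 or d < 1:
            continue
        # monthrange returns (weekday of the 1st, number of days in the month)
        # and already accounts for leap years.
        if d <= calendar.monthrange(y, m)[1]:
            yield f"{day}/{month}/{year}"


sample = "Due 29/02/2024, not 29/02/2023; 31/04/2021 and 15/13/2021 are bogus; 01/12/1999 is fine."
print(list(valid_dates(sample)))
# ['29/02/2024', '01/12/1999']
```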

Edited by Flair; answered by Ali