I want to parse dates in many different formats, and dateparser seems to be the best option for handling most of the odd cases. However, I'm having a problem with dates that have no day, e.g. "04/2022". I'd like such a string to be extracted as month=4, year=2022, and day=None or day=1. Unfortunately, parsing "04/2022" results in month=5, year=2022. Is there a way to force dateparser to treat one of the two detected numbers as the month? The dateutil parser seems to work fine in this case, but it fails on strings such as "polish_month_name Year" (e.g. "luty 2020"). Is there a way to make the following function work the way I want?

import dateparser
from dateparser.search import search_dates


def extract_dates(line: str):
    """
    Extracts a list of dates detected in line
    :param line: string to search for dates
    :return: list of (month, year) tuples
    >>> line = "09/01/2019 oraz 4/2020, 09/2018, luty 2020"
    >>> extract_dates(line)
    [(1, 2019), (4, 2020), (9, 2018), (2, 2020)]

    """
    extracted_dates = []
    dates = search_dates(line, languages=['pl'], settings={'DATE_ORDER': 'DMY'})
    if dates is not None:
        for d in dates:
            try:
                parse_res = dateparser.parse(d[0], languages=['pl'])
                extracted_dates.append((parse_res.month, parse_res.year))
            except AttributeError:
                # dateparser.parse returned None, so this match is skipped
                continue
    else:
        extracted_dates.append('None')
    return extracted_dates
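
For reference, the output I'm after in the docstring can be produced without dateparser at all, by matching the month/year tokens with a regular expression. This is only a rough sketch of the desired behaviour, not a dateparser feature: the names `POLISH_MONTHS`, `DATE_RE` and `extract_month_year` are made up for illustration, the month table is an assumption and not exhaustive, and only the `D/M/YYYY`, `M/YYYY` and `month_name YYYY` shapes are recognised.

```python
import re

# Polish month names (nominative and genitive), mapped to month numbers.
# Illustrative assumption: this table is hand-written and not exhaustive.
POLISH_MONTHS = {
    "styczeń": 1, "stycznia": 1,
    "luty": 2, "lutego": 2,
    "marzec": 3, "marca": 3,
    "kwiecień": 4, "kwietnia": 4,
    "maj": 5, "maja": 5,
    "czerwiec": 6, "czerwca": 6,
    "lipiec": 7, "lipca": 7,
    "sierpień": 8, "sierpnia": 8,
    "wrzesień": 9, "września": 9,
    "październik": 10, "października": 10,
    "listopad": 11, "listopada": 11,
    "grudzień": 12, "grudnia": 12,
}

# Longest names first, so "maja" is tried before "maj".
MONTH_NAMES = "|".join(sorted(POLISH_MONTHS, key=len, reverse=True))

DATE_RE = re.compile(
    rf"""
    (?<![\w/])                      # not in the middle of a longer token
    (?:
        (?:(?P<day>\d{{1,2}})/)?    # optional day, as in 09/01/2019
        (?P<month>\d{{1,2}})/(?P<year>\d{{4}})
      |
        (?P<name>{MONTH_NAMES})\s+(?P<name_year>\d{{4}})
    )
    (?![\w/])                       # token must end here
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_month_year(line: str):
    """Return (month, year) tuples for every date-like token in line."""
    results = []
    for m in DATE_RE.finditer(line):
        if m.group("name"):
            month = POLISH_MONTHS.get(m.group("name").lower())
            year = int(m.group("name_year"))
        else:
            month = int(m.group("month"))
            year = int(m.group("year"))
        # Skip unknown names and impossible month numbers such as 13/2020.
        if month is not None and 1 <= month <= 12:
            results.append((month, year))
    return results


print(extract_month_year("09/01/2019 oraz 4/2020, 09/2018, luty 2020"))
```

The numeric branch always reads the number directly before the four-digit year as the month, which is the day-first (DMY) interpretation I want. What I'd still prefer is a way to get the same result from dateparser itself, so that the other formats it understands keep working.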
Malgorzata

0 Answers