I want to parse dates in different formats, and dateparser seems to be the best option for handling most of the weird cases. However, I'm having a problem with dates that have no day, e.g. "04/2022". I'd like such a string to be extracted as month=4, year=2022, and day=None or day=1. Unfortunately, parsing "04/2022" results in month=5, year=2022. Is there a way to force dateparser to treat one of the two detected numbers as the month? dateutil's parser seems to work fine in this case, but it fails on strings such as "polish_month_name Year". Is there a way to make the following function work the way I want?
import dateparser
from dateparser.search import search_dates


def extract_dates(line: str):
    """
    Extracts the list of dates detected in line.

    :param line: string to search for dates
    :return: list of (month, year) tuples

    >>> line = "09/01/2019 oraz 4/2020, 09/2018, luty 2020"
    >>> extract_dates(line)
    [(1, 2019), (4, 2020), (9, 2018), (2, 2020)]
    """
    extracted_dates = []
    dates = search_dates(line, languages=['pl'], settings={'DATE_ORDER': 'DMY'})
    if dates is not None:
        for d in dates:
            # d is a (matched_text, datetime) tuple; re-parse the matched text
            parse_res = dateparser.parse(d[0], languages=['pl'])
            # dateparser.parse returns None when it cannot parse the string
            if parse_res is not None:
                extracted_dates.append((parse_res.month, parse_res.year))
    else:
        # search_dates found nothing in the line
        extracted_dates.append('None')
    return extracted_dates
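
To make the expected behaviour concrete, here is a minimal standard-library sketch of the result I'm after. It does not use dateparser at all. The `POLISH_MONTHS` table, the `DATE_RE` pattern and the `extract_month_year` name are my own assumptions for illustration, and the table only covers nominative month names (so "luty" but not "lutego"). Numeric dates are assumed to be day-first, matching `DATE_ORDER: 'DMY'` above.

```python
import re

# Hypothetical lookup table (not part of dateparser or dateutil):
# Polish month names in the nominative case -> month number.
POLISH_MONTHS = {
    "styczeń": 1, "luty": 2, "marzec": 3, "kwiecień": 4,
    "maj": 5, "czerwiec": 6, "lipiec": 7, "sierpień": 8,
    "wrzesień": 9, "październik": 10, "listopad": 11, "grudzień": 12,
}

# Alternatives are tried left to right, so the full DD/MM/YYYY form is
# attempted before the shorter MM/YYYY form at the same position.
DATE_RE = re.compile(
    r"\b(?:\d{1,2}/(?P<dmy_month>\d{1,2})/(?P<dmy_year>\d{4})"
    r"|(?P<my_month>\d{1,2})/(?P<my_year>\d{4})"
    r"|(?P<name>[^\W\d_]+)\s+(?P<name_year>\d{4}))\b"
)


def extract_month_year(line):
    """Return a list of (month, year) tuples found in line."""
    results = []
    for m in DATE_RE.finditer(line):
        if m.group("dmy_month"):
            # DD/MM/YYYY: the day is matched but deliberately ignored.
            month, year = int(m.group("dmy_month")), int(m.group("dmy_year"))
        elif m.group("my_month"):
            # MM/YYYY: the first number is always treated as the month.
            month, year = int(m.group("my_month")), int(m.group("my_year"))
        else:
            # "<word> YYYY": keep it only if the word is a known month name.
            month = POLISH_MONTHS.get(m.group("name").lower())
            year = int(m.group("name_year"))
            if month is None:
                continue
        if 1 <= month <= 12:
            results.append((month, year))
    return results
```

With this sketch, `extract_month_year("04/2022")` treats 04 as the month, which is the behaviour I'd like to get from dateparser itself without maintaining my own patterns and month table.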