I have a series of text blocks that contain a date written as "The first Wednesday of September, 2021" or "The third Monday in July, 2022", etc. I am not sure of the best way to extract the text and reformat it as a standard 'Month Day, Year' format. I have tried using the datefinder library with fuzzy matching on, but 'first Tuesday' and others have failed, I believe because it isn't a normal date format. Any ideas would be greatly appreciated, thanks all!
You will first have to parse the input yourself to split out the ordinal, weekday, month and year. After that you can use, for example, datetime to create date objects for further use. If all input dates follow the format you describe here, parsing them should be pretty trivial. – binaryescape Aug 01 '23 at 19:04
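A minimal sketch of what this comment suggests, assuming every phrase has the shape "The <ordinal> <weekday> of/in <Month>, <Year>". The names ORDINALS, WEEKDAYS, MONTHS, nth_weekday and parse_phrase are made up for illustration; only re and datetime from the standard library are used.

```python
import re
from datetime import date, timedelta

# Hypothetical lookup tables for this sketch.
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def nth_weekday(year, month, weekday, n):
    """Return the n-th given weekday (0 = Monday) of a month, or None."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first requested weekday.
    offset = (weekday - first.weekday()) % 7
    result = first + timedelta(days=offset + 7 * (n - 1))
    # A "fifth Monday" may spill into the next month; treat that as missing.
    return result if result.month == month else None


def parse_phrase(phrase):
    """Split the phrase into its parts and build a datetime.date from them."""
    m = re.fullmatch(r"The (\w+) (\w+) (?:of|in) (\w+), (\d{4})",
                     phrase.strip(), flags=re.IGNORECASE)
    if not m:
        return None
    ordinal, weekday, month, year = (g.lower() for g in m.groups())
    if ordinal not in ORDINALS or weekday not in WEEKDAYS or month not in MONTHS:
        return None
    return nth_weekday(int(year), MONTHS.index(month) + 1,
                       WEEKDAYS.index(weekday), ORDINALS[ordinal])


print(parse_phrase("The first Wednesday of September, 2021"))  # 2021-09-01
print(parse_phrase("The third Monday in July, 2022"))          # 2022-07-18
```

Returning a date object rather than a string leaves the output format open; the caller can format it however the rest of the program needs.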
1 Answer
Assuming all dates in the text follow the format "The <ordinal> <day_of_week> of <Month>, <Year>" (you have to replace "in" with "of" in the second date):
import calendar
import re

text = [
    "The first Wednesday of September, 2021",
    "The third Monday of July, 2022",
    # more dates
]

# Captures: ordinal word, weekday name, month name, four-digit year.
pattern = r"The (\w+) (\w+) of (\w+), (\d{4})"

# Maps the ordinal word to the occurrence number within the month.
cardinal = {
    "first": 1,
    "second": 2,
    "third": 3,
    "fourth": 4,
    "fifth": 5
}


def find_nth_day_of_week(year_str, month_name, day_of_week, n_str):
    year = int(year_str)
    try:
        # list.index raises ValueError for an unknown month or weekday name.
        month = list(calendar.month_name).index(month_name.capitalize())
        day_index = list(calendar.day_name).index(day_of_week.capitalize())
    except ValueError:
        return None
    # calendar.month_name[0] is an empty string, so 0 is not a real month.
    if month == 0:
        return None
    n = cardinal.get(n_str.lower())
    if n is None:
        return None
    # Each week is a list of 7 day numbers (Monday first by default);
    # 0 marks days that belong to a neighbouring month.
    cal = calendar.monthcalendar(year, month)
    occurrences = [week[day_index] for week in cal if week[day_index] != 0]
    if n > len(occurrences):
        return None
    day = occurrences[n - 1]
    # month_abbr gives e.g. "Sep"; use calendar.month_name for the full name.
    return f"{calendar.month_abbr[month]} {day}, {year}"


def parse_text(text):
    match = re.match(pattern, text)
    if match:
        # Named "ordinal" so it does not shadow the module-level cardinal dict.
        ordinal, day_of_week, month, year = match.groups()
        return find_nth_day_of_week(year, month, day_of_week, ordinal)
    return None


dates = [parse_text(block) for block in text]
for i, date in enumerate(dates):
    print(f"Date {i + 1}: {date}")
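
If the phrases sit inside longer text blocks and may use either "of" or "in", something along these lines might work. This is only a sketch under those assumptions: PHRASE, ORDINALS and extract_dates are hypothetical names, the month and weekday names assume the default English locale, and the output uses the full month name to match the 'Month Day, Year' format from the question.

```python
import calendar
import re

# Hypothetical names for this sketch.
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}
PHRASE = re.compile(
    r"\b(first|second|third|fourth|fifth)\s+(\w+)\s+(?:of|in)\s+(\w+),\s*(\d{4})",
    re.IGNORECASE,
)


def extract_dates(block):
    """Return every recognizable phrase in a text block as 'Month Day, Year'."""
    months = [m.lower() for m in calendar.month_name]  # index 0 is ""
    weekdays = [d.lower() for d in calendar.day_name]  # Monday first
    found = []
    for ordinal, weekday, month, year in PHRASE.findall(block):
        weekday, month = weekday.lower(), month.lower()
        if weekday not in weekdays or month not in months:
            continue  # looks like the pattern but is not a real weekday/month
        month_number = months.index(month)
        column = weekdays.index(weekday)
        # Day numbers of that weekday within the month (0 = outside the month).
        days = [week[column]
                for week in calendar.monthcalendar(int(year), month_number)
                if week[column]]
        n = ORDINALS[ordinal.lower()]
        if n <= len(days):
            found.append(f"{calendar.month_name[month_number]} {days[n - 1]}, {year}")
    return found


sample = ("We meet on the first Wednesday of September, 2021 "
          "and again on the third Monday in July, 2022.")
print(extract_dates(sample))  # ['September 1, 2021', 'July 18, 2022']
```

Using findall instead of re.match means the phrase does not have to start the block, and several dates in one block are all returned.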

Byte Ninja
Although I have not checked any details in the code, this seems an essentially sound approach. – Bill Bell Aug 04 '23 at 18:51