I have the following string:
'none; but currently has appt with new HJH PCP Rachel Salas, MD on October. 11, 2013 Other Agency Involvement: No\n'
and I am trying to get "October. 11, 2013" out of it.
The comma after the 11 needs to be optional.
The code I am using is:
re.findall(r'(\S+)\s*\d+,*\s([2][0]\d\d|[1][9]\d\d)', raw_data[i])
and the output I am getting is skipping the 11:
[('October.', '2013')]
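For reference, this behaviour can be reproduced in isolation. When a pattern contains capture groups, re.findall returns only the text of those groups, not the whole match. The \d+ for the day sits outside any group, so it is matched but not reported. The sketch below assumes the string is stored in a hypothetical variable text, standing in for raw_data[i].

```python
import re

# Hypothetical stand-in for raw_data[i]
text = ("none; but currently has appt with new HJH PCP Rachel Salas, MD "
        "on October. 11, 2013 Other Agency Involvement: No\n")

pattern = r'(\S+)\s*\d+,*\s([2][0]\d\d|[1][9]\d\d)'

# With capture groups present, findall returns one tuple of groups per match,
# so the ungrouped day (\d+) is consumed by the match but not returned.
print(re.findall(pattern, text))  # [('October.', '2013')]

# The whole match (group 0) still contains the day.
print(re.search(pattern, text).group(0))  # October. 11, 2013
```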
Explanation of my logic:
(\S+)\s*                    # a single word before the number (max. 1 occurrence, so the whole sentence is not captured), then optional whitespace between the word and the number
\d+,*                       # the number between the month and the year, with an optional comma
\s([2][0]\d\d|[1][9]\d\d)   # the year (19xx or 20xx) after the space
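One possible adjustment, sketched under the assumption that the string is stored in a hypothetical variable text (standing in for raw_data[i]) and checked only against this sample string: put the day inside its own group (or wrap the whole date in a single group) so findall reports it, use ,? for zero or one comma (,* would also accept several commas), and make the year alternation non-capturing with (?:...) so it does not add an extra group.

```python
import re

# Hypothetical stand-in for raw_data[i]
text = ("none; but currently has appt with new HJH PCP Rachel Salas, MD "
        "on October. 11, 2013 Other Agency Involvement: No\n")

# One group per part: the word before the day, the day, and the year.
# ,? makes the comma optional; (?:20|19) groups the alternation without capturing.
parts = re.findall(r'(\S+)\s*(\d+),?\s((?:20|19)\d\d)', text)
print(parts)  # [('October.', '11', '2013')]

# A single group around the whole date makes findall return plain strings.
date_pattern = r'(\S+\s*\d+,?\s(?:20|19)\d\d)'
dates = re.findall(date_pattern, text)
print(dates)  # ['October. 11, 2013']

# Because the comma is optional, a date without it is matched too.
print(re.findall(date_pattern, 'seen on October. 11 2013'))  # ['October. 11 2013']
```

If only the full date text is needed, iterating with re.finditer and reading m.group(0) is another option, since group 0 is always the entire match regardless of how many groups the pattern defines.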
I am very grateful for your help.