I have the following string:

 'none; but currently has appt with new HJH PCP Rachel Salas, MD on October. 11, 2013 Other Agency Involvement: No\n'

and I am trying to get "October. 11, 2013" out of it.

The comma after the 11 needs to be optional.

The code I am using is:

re.findall(r'(\S+)\s*\d+,*\s([2][0]\d\d|[1][9]\d\d)', raw_data[i])

and the output I am getting is skipping the 11:

[('October.', '2013')]

Explanation of my logic:

(\S+)\s* # limit the word before the number to at most one occurrence, and avoid matching the full sentence; optional space between the word and the number

\d+,* # catch the number between the month and the year, with an optional comma

\s([2][0]\d\d|[1][9]\d\d) # catch the year after the space

I am very grateful for your help.

ZakS
  • Remove the first capturing parentheses group and make the second one non-capturing. Or, use `re.findall(r'\S+\s*\d+,*\s(?:20|19)\d\d', s)`. Or use your regex with `re.finditer` and grab `x.group()` in a list comprehension. – Wiktor Stribiżew Jul 19 '18 at 12:30
  • Dear @WiktorStribiżew - many thanks, first option works. – ZakS Jul 19 '18 at 14:22
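
For reference, here is a minimal sketch of why the 11 goes missing and of two of the fixes suggested in the comment. When a pattern contains capturing groups, `re.findall` returns only the groups, not the whole match, so the day (which sits outside both groups) is dropped. The variable `s` is a cleaned-up copy of the sample string from the question, used here only for illustration.

```python
import re

# Sample text from the question (assumed here to be a plain string).
s = ("none; but currently has appt with new HJH PCP Rachel Salas, MD "
     "on October. 11, 2013 Other Agency Involvement: No\n")

# Original pattern: two capturing groups, so findall returns a list of
# (group 1, group 2) tuples and the day outside the groups is lost.
original = re.findall(r'(\S+)\s*\d+,*\s([2][0]\d\d|[1][9]\d\d)', s)

# Fix 1: no capturing groups ((?:...) is non-capturing), so findall
# returns the whole matched text.
whole = re.findall(r'\S+\s*\d+,*\s(?:20|19)\d\d', s)

# Fix 2: keep the original pattern, iterate over match objects, and take
# group() (the full match) from each one.
pattern = re.compile(r'(\S+)\s*\d+,*\s([2][0]\d\d|[1][9]\d\d)')
via_finditer = [m.group() for m in pattern.finditer(s)]

print(original)      # [('October.', '2013')]
print(whole)         # ['October. 11, 2013']
print(via_finditer)  # ['October. 11, 2013']
```

Note that `,*` matches zero or more commas; if at most one comma should be allowed, `,?` states that more precisely.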
