I am trying to match dates in strings, but for some reason re.findall returns only the month (mmm) and not the day (dd):

import re

strings = ["Get it by may 1 - june 15", "Arrives between may 1 and june 15"]
months = ['january', 'jan', 'february', 'feb', 'march', 'mar', 'april', 'apr', 'may', 'june', 'jun', 'july', 'jul', 'august', 'aug', 'september', 'sept', 'sep', 'october', 'oct', 'november', 'nov', 'december', 'dec']

pattern = r'\b({})\s\d{{1,2}}'.format('|'.join(months))

for string in strings:
    matches = re.findall(pattern, string)
    print(f"String: {string}")
    print(f"Matches: {matches}")

The output:

String: Get it by may 1 - june 15
Matches: ['may', 'june'] # should be ['may 1', 'june 15']
String: Arrives between may 1 and june 15
Matches: ['may', 'june'] # should be ['may 1', 'june 15']

What am I doing wrong?

2 Answers

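When a pattern contains exactly one capture group, re.findall returns only the text matched by that group, not the whole match. Make the group non-capturing by replacing ({}) with (?:{}):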

pattern = r'\b(?:{})\s\d{{1,2}}'.format('|'.join(months))

Output:

String: Get it by may 1 - june 15
Matches: ['may 1', 'june 15']
String: Arrives between may 1 and june 15
Matches: ['may 1', 'june 15']
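
To see the difference side by side, here is a minimal sketch. The shortened month list and the variable names are only for illustration; it also shows re.finditer as an alternative that leaves the capture group in place.

```python
import re

months = ['may', 'june']  # shortened list, for illustration only
text = "Get it by may 1 - june 15"

# One capture group: findall returns only what the group matched.
capturing = r'\b({})\s\d{{1,2}}'.format('|'.join(months))
# Non-capturing group: findall returns the whole match.
non_capturing = r'\b(?:{})\s\d{{1,2}}'.format('|'.join(months))

group_only = re.findall(capturing, text)
whole_match = re.findall(non_capturing, text)

# finditer yields match objects, so group(0) is the full match
# even when the pattern still contains a capture group.
via_finditer = [m.group(0) for m in re.finditer(capturing, text)]

print(group_only)    # ['may', 'june']
print(whole_match)   # ['may 1', 'june 15']
print(via_finditer)  # ['may 1', 'june 15']
```

If you need the month name separately later on, keeping the capture group and using re.finditer may be the more convenient of the two options.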
Corralien

Would this help? It uses strings from the question, allows longer month names and extra whitespace, and also matches day-first dates such as "1 may":

reg = r'\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\s+\d{1,2}\b|\b\d{1,2}(?:\s+(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\b|\s+\d{1,2})?\b'

for string in strings:
    matches = re.findall(reg, string)
    print(f"String: {string}")
    print(f"Matches: {matches}")
Gedas Miksenas