
I'm using re.findall to parse the year and month from a string; however, it only returns matches from half the string. Why is this?

import re

date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'

find_year_and_month = re.findall('[1-2][0-9][0-9][0-9]-[1-12]', date_string)

print(find_year_and_month)

and my output is this:

['2011-1', '2012-1']

That is the output I get for those dates, but why am I only getting matches for half the string?

2 Answers


Adjust your regex pattern as shown below:

import re

date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'
# Four-digit year starting with 1 or 2, then a month: 10-12 is tried before 1-9
find_year_and_month = re.findall(r'([1-2][0-9]{3}-(?:1[0-2]|[1-9]))', date_string)

print(find_year_and_month)

The output:

['2011-1', '2012-1', '2015-3', '2015-3']
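If you need the year and month as numbers rather than one combined string, a small variation (not part of the answer above; the helper name `year_months` is hypothetical) puts each part in its own capture group. When a pattern has more than one group, `re.findall` returns a tuple of the groups for each match.

```python
import re

# Two capture groups: the year, then the month (10-12 tried before 1-9).
YEAR_MONTH = re.compile(r"([1-2][0-9]{3})-(1[0-2]|[1-9])")

def year_months(text):
    """Return (year, month) integer pairs found in text (hypothetical helper)."""
    return [(int(year), int(month)) for year, month in YEAR_MONTH.findall(text)]

date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'
print(year_months(date_string))  # [(2011, 1), (2012, 1), (2015, 3), (2015, 3)]
print(year_months('2011-12-1'))  # [(2011, 12)]
```

Like the answer's pattern, this one is unanchored, so it will still pull `2011-1` out of an invalid string such as `2011-13`.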
RomanPerekhrest
  • Interesting thing: You should replace `(?:[1-9]|1[0-2]))` with `(?:1[0-2]|[1-9]))` because otherwise (according to my testing on regex101.com) a string like `2011-12-1` will only match the `2011-1` part and not `2011-12`. It seems the first alternative of the `or` (represented by the pipe `|`) is tried first, and once it matches and the rest of the pattern succeeds, the second alternative is never evaluated. (Actually not surprising, if you think about it: alternation is tried in order, which is a different thing from regexes being _greedy_.) – Michael H. Mar 08 '18 at 19:21
  • @Michael, see my update (for that case) – RomanPerekhrest Mar 08 '18 at 19:27
  • Yep. Though I like the `[2-9]|1[0-2]?` given in the link in @sam's answer best. (I did not think of that possibility myself.) – Michael H. Mar 08 '18 at 19:30
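The ordering issue raised in these comments can be reproduced with a short sketch (variable names are illustrative). Python's `re` tries the alternatives of `|` from left to right and keeps the first one that lets the rest of the pattern succeed; it does not search for the longest alternative.

```python
import re

text = '2011-12-1'
year = r"[1-2][0-9]{3}-"  # four-digit year followed by a hyphen

# [1-9] is tried first and succeeds on "1", so the match stops at "2011-1".
short_first = re.findall(year + r"(?:[1-9]|1[0-2])", text)

# Trying 1[0-2] first lets the two-digit month win.
long_first = re.findall(year + r"(?:1[0-2]|[1-9])", text)

# The variant Michael H. prefers: the greedy `?` takes the second digit when present.
optional_digit = re.findall(year + r"(?:[2-9]|1[0-2]?)", text)

print(short_first)     # ['2011-1']
print(long_first)      # ['2011-12']
print(optional_digit)  # ['2011-12']
```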

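[1-12] doesn't do what you think it does. It is a character class, not a numeric range: it matches a single character that is either in the range 1 to 1 (so just 1) or a 2. A month like 3 therefore never matches, which is why only the 2011 and 2012 dates show up in your output.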
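One way to see this is to test each digit against the character class on its own, then rerun the question's pattern on the question's string. `re.fullmatch` is used so that the whole one-character string must match.

```python
import re

# `[1-12]` is the range 1-1 plus the literal 2: one character, never "12".
matches = [d for d in "0123456789" if re.fullmatch(r"[1-12]", d)]
print(matches)  # ['1', '2']

# As a result, the question's pattern skips every month other than 1 or 2.
date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'
question_result = re.findall(r"[1-2][0-9][0-9][0-9]-[1-12]", date_string)
print(question_result)  # ['2011-1', '2012-1']
```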

See this question for some replacement regex options, like `([1-9]|1[0-2])`: How to represent regex number ranges (e.g. 1 to 12)?
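Before relying on one of those range patterns, it can help to check it against every candidate value. This is an illustrative sketch, not taken from the linked question; `accepted_values` is a hypothetical helper. With `re.fullmatch` the pattern is anchored to the whole string, so the engine backtracks into the next alternative whenever the first one leaves a digit unconsumed.

```python
import re

def accepted_values(pattern, upper=100):
    """Return the integers in [0, upper) whose decimal string fully matches pattern."""
    # fullmatch anchors the pattern at both ends of the string.
    return [n for n in range(upper) if re.fullmatch(pattern, str(n))]

# The option quoted above and the reordered version from the comments.
for pattern in (r"([1-9]|1[0-2])", r"(1[0-2]|[1-9])"):
    print(pattern, accepted_values(pattern))
```

Both orderings accept exactly 1 through 12 here; the order of the alternatives only matters for unanchored searches such as `re.findall` on the longer date string. Neither pattern accepts a zero-padded month such as `01`.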

If you want an interactive tool for experimenting with regexes, I personally recommend Regexr.

sam