
I'm trying to write a regular expression in Python to recognize dates in mm/dd format:

import re
text = '4/13'
reg = "(((1[0-2])|(0?[1-9]))/((1[0-9])|(2[0-9])|(3[0-1])|(0?[0-9])))"
match = re.findall(reg, text, re.IGNORECASE)
print(match)

For text = '4/13' it gives me [('4/13', '4', '', '4', '13', '13', '', '', '')]

I only need the first element. How do I get rid of the other, inexact matches?

Thanks,

Cheng

Daniel Roseman
cheng
  • possible duplicate of [python regular expression date formate](http://stackoverflow.com/questions/10484300/python-regular-expression-date-formate) – the wolf May 07 '12 at 15:26
  • @carrot-top it's the same guy asking a follow-up ;-) – snies May 07 '12 at 15:29
  • @cheng look at your original question; the second answer, which uses match, should be a better solution. – snies May 07 '12 at 15:32

2 Answers


You're getting all those matches because each set of parentheses in your regular expression creates a capturing group, and when a pattern contains more than one group, re.findall returns a tuple of every group's text for each match instead of just the matched string. If you don't want the groups in your results, use non-capturing groups, (?:...). Alternatively, simply take the first item of each tuple and ignore the rest.
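To make the rule concrete, here is a small sketch of how the number of capturing groups changes what re.findall returns. It uses a deliberately simplified \d+/\d+ pattern rather than the full date expression, so the shapes of the results are easy to see.

```python
import re

text = "4/13"

# No capturing groups: findall returns the full text of each match.
whole = re.findall(r"\d+/\d+", text)

# Exactly one capturing group: findall returns just that group's text.
one_group = re.findall(r"(\d+)/\d+", text)

# Two or more capturing groups: findall returns one tuple per match.
tuples = re.findall(r"(\d+)/(\d+)", text)

# Non-capturing groups (?:...) are not counted, so this acts like the first case.
non_capturing = re.findall(r"(?:\d+)/(?:\d+)", text)

print(whole)          # ['4/13']
print(one_group)      # ['4']
print(tuples)         # [('4', '13')]
print(non_capturing)  # ['4/13']
```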

This would make your expression look like:

reg = "((?:(?:1[0-2])|(?:0?[1-9]))/(?:(?:1[0-9])|(?:2[0-9])|(?:3[0-1])|(?:0?[0-9])))"

See the re documentation for more information.

Here's a complete example:

>>> import re
>>> text='4/13'
>>> reg = "((?:(?:1[0-2])|(?:0?[1-9]))/(?:(?:1[0-9])|(?:2[0-9])|(?:3[0-1])|(?:0?[0-9])))"
>>> re.findall(reg, text, re.IGNORECASE)
['4/13']
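A variant worth considering is to drop the outer capturing group as well. With zero groups, findall returns the whole match directly. This sketch also collapses the nested (?:(?:...)|(?:...)) wrappers into plain alternations, which match the same strings, and omits re.IGNORECASE since digits and "/" have no case. The name DATE_RE is just an illustrative choice.

```python
import re

# Same alternatives as the answer's pattern, but with no capturing group at all,
# so re.findall returns the full "mm/dd" text of each match.
DATE_RE = re.compile(r"(?:1[0-2]|0?[1-9])/(?:1[0-9]|2[0-9]|3[0-1]|0?[0-9])")

dates = DATE_RE.findall("4/13")
print(dates)  # ['4/13']
```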
larsks

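They're not "inexact matches". The first item in each tuple is the outermost group, which wraps the whole matched string; the other items correspond to the remaining parenthesized groups in your regular expression, and groups inside an alternative that didn't match come back as empty strings.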

If there are multiple dates in the string, you want this:

import re
# Flags go to compile(); a compiled pattern's findall() takes (string, pos, endpos).
reg = re.compile(..., re.IGNORECASE)
dates = [match[0] for match in reg.findall(text)]
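Here is a runnable sketch of the same idea using the question's original nine-group pattern. The sample text is made up for illustration. It also shows re.finditer as an alternative: each match object's group(0) is always the entire match, however many groups the pattern has.

```python
import re

# The question's original pattern; group 1 wraps the whole date,
# so element 0 of each findall tuple is the full "mm/dd" text.
reg = re.compile(r"(((1[0-2])|(0?[1-9]))/((1[0-9])|(2[0-9])|(3[0-1])|(0?[0-9])))")

text = "Due 4/13, shipped 12/25"  # hypothetical sample input

# findall returns one 9-tuple per date; keep only the first element.
dates = [match[0] for match in reg.findall(text)]
print(dates)  # ['4/13', '12/25']

# Alternative: finditer yields match objects, and group(0) is the whole match.
dates_iter = [m.group(0) for m in reg.finditer(text)]
print(dates_iter)  # ['4/13', '12/25']
```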
tangentstorm