
Hi, first-time regex user here. I'm trying to figure out a regex and need some help.

I have a text file with the following items:

10:67 12:12 01:50 23:60 23:50

I'm trying to get a list of the valid times, so the output should be:

['12:12', '01:50', '23:50']

Here is my code:

import re
with open("text.txt") as inFile:
    text = inFile.read()
pattern = re.findall(r'([01]\d|2[0-3]):[0-5]\d', text)
print(pattern)

My output is:

['12', '01', '23']

Any help figuring out what's wrong? Thanks!

michael

1 Answer


re.findall() apparently returns only the captured group (that's ([01]\d|2[0-3]) in your case) instead of the whole match. If you turn it into a non-capturing group, (?: ... ), you should see the desired result:

import re
text = '10:67 12:12 01:50 23:60 23:50'
pattern = re.findall(r'(?:[01]\d|2[0-3]):[0-5]\d', text)
print(pattern)

displays:

['12:12', '01:50', '23:50']

More info on (non-) capturing groups: http://www.regular-expressions.info/brackets.html
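
If you would rather keep the capturing group (say, because you also want the hour on its own), one alternative is re.finditer(), which yields match objects; group(0) on a match object is always the full match, regardless of how many groups the pattern has. This is only a sketch using the sample string from the question, and the variable name times is my own choice:

```python
import re

text = '10:67 12:12 01:50 23:60 23:50'

# The hour is still a capturing group here; group(0) returns the
# whole match, while group(1) would return just the hour.
times = [m.group(0) for m in re.finditer(r'([01]\d|2[0-3]):[0-5]\d', text)]
print(times)
```

For the sample input this should print the same three valid times as the non-capturing version.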

Bart Kiers
    Actually, findall() returns all groups or, if there are no groups, the entire match (to see this, add a second group around the minutes to the original expression and Python will return a tuple). Making the group non-capturing is the correct answer though. – Blair May 10 '11 at 08:03
    @Blair, I was already looking for an explanation in the Python docs to find out the exact behavior. Thanks! – Bart Kiers May 10 '11 at 08:04
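
To see the behavior described in the comments side by side, here is a small sketch that runs the same input through three variants of the pattern. The variable names (one_group, two_groups, no_groups) are just for illustration:

```python
import re

text = '10:67 12:12 01:50 23:60 23:50'

# One capturing group: findall returns that group's text only.
one_group = re.findall(r'([01]\d|2[0-3]):[0-5]\d', text)

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r'([01]\d|2[0-3]):([0-5]\d)', text)

# No capturing groups: findall returns the entire match.
no_groups = re.findall(r'(?:[01]\d|2[0-3]):[0-5]\d', text)

print(one_group)   # ['12', '01', '23']
print(two_groups)  # [('12', '12'), ('01', '50'), ('23', '50')]
print(no_groups)   # ['12:12', '01:50', '23:50']
```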