3

I'm trying to match pairs of digits in a string and capture them in groups; however, I seem to be able to capture only the last group.

Regex:
(\d\d){1,3}

Input String: 123456 789101

Match 1: 123456
Group 1: 56

Match 2: 789101
Group 1: 01

What I want is to capture all the groups like this:

Match 1: 123456
Group 1: 12
Group 2: 34
Group 3: 56

* Update
It looks like Python does not let you retrieve every capture of a repeated group; in .NET, for example, you could get all of them in a single pass. So re.findall(r'\d\d', '123456') does the job.

newbie
  • possible duplicate of [Python regex multiple groups](http://stackoverflow.com/questions/4963691/), [Regular expression group capture with multiple matches](http://stackoverflow.com/questions/5598340/), [Python regexes: How to access multiple matches of a group?](http://stackoverflow.com/questions/5060659/) – outis Dec 28 '11 at 02:54

4 Answers

6

You cannot do that with a single regular expression match. It is a special case of counting, which a regex pattern alone cannot do. Applying \d\d at every starting position in the string would get you overlapping pairs:

Group1: 12 Group2: 23 Group3: 34 ...

Python's re library comes with a routine that returns non-overlapping matches, namely re.findall(), which does the trick, as in:

     re.findall(r'\d\d', '123456')

will return ['12', '34', '56']
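If you also want the pairs grouped per match, as in the question's desired output, one option is two passes: find each run first, then split it. This is only a sketch; the helper name pairs_per_match is made up for illustration, and the outer pattern is the question's (\d\d){1,3} with a non-capturing group.

```python
import re

def pairs_per_match(text):
    """Return a (match, pairs) tuple for each run of one to three digit pairs."""
    results = []
    # Outer pass: locate each run, as (\d\d){1,3} does in the question.
    for m in re.finditer(r'(?:\d\d){1,3}', text):
        # Inner pass: split the matched run into non-overlapping pairs.
        results.append((m.group(), re.findall(r'\d\d', m.group())))
    return results

print(pairs_per_match('123456 789101'))
# [('123456', ['12', '34', '56']), ('789101', ['78', '91', '01'])]
```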

Ahmed Masud
2
(\d{2})+(\d)?

I'm not sure how Python handles its matching, but this is how I would do it.
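For reference, here is a quick check of how Python's re module treats this pattern. It is a sketch against the question's sample input: the whole string matches, but the repeated group reports only its last repetition.

```python
import re

m = re.match(r'(\d{2})+(\d)?', '123456')

print(m.group(0))  # 123456 (the whole match)
print(m.groups())  # ('56', None): group 1 keeps only the final repetition
```

So in Python the intermediate pairs 12 and 34 are not retrievable from this match object.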

AlanFoster
2

Try this:

import re
re.findall(r'\d\d','123456')
Óscar López
1

Is this what you want?

import re

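# Match one to three digit pairs delimited by a space, a line break,
# or either end of the string; absent pairs come back as ''.
regx = re.compile(r'(?:(?<= )|(?<=\A)|(?<=\r)|(?<=\n))'
                  r'(\d\d)(\d\d)?(\d\d)?'
                  r'(?= |\Z|\r|\n)')

for s in ('   112233  58975  6677  981  897899\r',
          '\n123456 4433 789101 41586 56 21365899 362547\n',
          '0101 456899 1 7895'):
    print(repr(s), regx.findall(s), sep='\n', end='\n\n')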

Result:

'   112233  58975  6677  981  897899\r' 
[('11', '22', '33'), ('66', '77', ''), ('89', '78', '99')] 

'\n123456 4433 789101 41586 56 21365899 362547\n' 
[('12', '34', '56'), ('44', '33', ''), ('78', '91', '01'), ('56', '', ''), ('36', '25', '47')] 

'0101 456899 1 7895' 
[('01', '01', ''), ('45', '68', '99'), ('78', '95', '')] 
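A roughly equivalent sketch without lookarounds, in case it is easier to read: split on whitespace, keep the tokens made of one to three digit pairs, and slice each into pairs. The helper name split_into_pairs is made up for illustration. Note two differences from the regex above: str.split() treats any whitespace (tabs included) as a delimiter, and the result is a list of variable-length lists rather than tuples padded with ''.

```python
import re

def split_into_pairs(text):
    """Return the digit pairs of each whitespace-delimited token
    that consists of exactly one, two, or three digit pairs."""
    result = []
    for token in text.split():
        # Keep only tokens that are 2, 4, or 6 digits long.
        if re.fullmatch(r'(?:\d\d){1,3}', token):
            # Slice the token into consecutive two-character chunks.
            result.append([token[i:i + 2] for i in range(0, len(token), 2)])
    return result

print(split_into_pairs('0101 456899 1 7895'))
# [['01', '01'], ['45', '68', '99'], ['78', '95']]
```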
eyquem