RegEx with multiple groups?

Question

I'm getting confused returning multiple groups in Python. My RegEx is this:

lun_q = 'Lun:\s*(\d+\s?)*'

And my string is

s = '''Lun:                     0 1 2 3 295 296 297 298'''

I get back a match object and then want to look at the groups, but all it shows is the last number (298):

r.groups()  
(u'298',)

Why isn't it returning groups of 0,1,2,3,4 etc.?

I think what you directly refer to is called [Capturing a Repeated Group](http://www.regular-expressions.info/captureall.html) - or along the lines 'accessing every match in a quantified / repeated capture group'. see [this similar answer](http://stackoverflow.com/a/3537914/611007) for javascript. don't know for sure but ***they seem to be unsupported in python's regex flavor***. see [related python enhancement request](http://bugs.python.org/issue7132) and [related question](http://stackoverflow.com/q/15908085/611007) — n611x007, Apr 15 '14 at 12:53

Ben Blank · Accepted Answer · 2011-02-10T23:36:19.813

Your regex only contains a single pair of parentheses (one capturing group), so you only get one group in your match. If you use a repetition operator on a capturing group (+ or *), the group gets "overwritten" each time the group is repeated, meaning that only the last match is captured.
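To see the overwriting in action, here is a small sketch that runs the pattern and string from the question (the variable name `m` is just illustrative):

```python
import re

s = 'Lun:                     0 1 2 3 295 296 297 298'

# The question's pattern: a single capturing group, repeated by *.
m = re.search(r'Lun:\s*(\d+\s?)*', s)

# The overall match covers the whole run of numbers...
print(m.group(0))

# ...but the one group only remembers its final repetition.
print(m.groups())  # ('298',)
```

The number of groups is fixed by the number of parentheses in the pattern, not by how many times a group repeats.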

In your example here, you're probably better off using .split(), in combination with a regex:

import re

lun_q = r'Lun:\s*(\d+(?:\s+\d+)*)'
s = '''Lun: 0 1 2 3 295 296 297 298'''

r = re.search(lun_q, s)

if r:
    luns = r.group(1).split()

    # optionally, also convert luns from strings to integers
    luns = [int(lun) for lun in luns]

Picking `re.match()` vs `re.split()` is a non-trivial decision — smci, Jun 21 '13 at 22:08

score 8 · Answer 2 · edited Apr 04 '22 at 15:28

If you are looking for an output such as 0,1,2,3,4 etc.:

print(re.findall(r'\d+', s))  # \d+ keeps multi-digit numbers such as 295 whole
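The quantifier matters here: `\d` matches one digit at a time, while `\d+` matches each whole number. A quick comparison on the question's string (variable names are illustrative, and note that `findall` over the full string does not check for the `Lun:` prefix):

```python
import re

s = 'Lun: 0 1 2 3 295 296 297 298'

single_digits = re.findall(r'\d', s)    # one match per digit character
whole_numbers = re.findall(r'\d+', s)   # one match per run of digits

print(single_digits)
print(whole_numbers)  # ['0', '1', '2', '3', '295', '296', '297', '298']
```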

Rakesh kumar · answered Sep 05 '17 at 14:23 · edited Apr 04 '22 at 15:28 by vvvvv

score 8 · Answer 3 · answered Feb 10 '11 at 23:24

Another approach would be to use the regex you have to validate your data and then use a more specific regex that targets each item you wish to extract using a match iterator.

import re
s = '''Lun: 0 1 2 3 295 296 297 298'''
lun_validate_regex = re.compile(r'Lun:\s*((\d+)(\s\d+)*)')
match = lun_validate_regex.match(s)
if match:
    token_regex = re.compile(r"\d{1,3}")
    match_iterator = token_regex.finditer(match.group(1))
    for token_match in match_iterator:
        # do something brilliant with each number
        print(token_match.group())

print re.findall('\d',s) — Rakesh kumar, Sep 05 '17 at 14:21

score 7 · Answer 4 · answered Feb 11 '11 at 00:07

Sometimes it's easier without regex.

>>> s = '''Lun: 0 1 2 3 295 296 297 298'''
>>> if "Lun: " in s:
...     items = s.replace("Lun: ","").split()
...     for n in items:
...         if n.isdigit():
...             print(n)
...
0
1
2
3
295
296
297
298
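The same idea can be wrapped in a small helper. This is only a sketch: `parse_luns` is a hypothetical name, and it uses `str.partition` in place of `replace` so that only the text after the prefix is considered. It assumes the numbers are plain ASCII digits.

```python
def parse_luns(line, prefix="Lun:"):
    """Return the integers that follow `prefix`, or [] if the prefix is absent."""
    _head, sep, tail = line.partition(prefix)
    if not sep:
        # The prefix was not found in the line.
        return []
    # Keep only all-digit tokens (assumed ASCII), mirroring the isdigit() check.
    return [int(tok) for tok in tail.split() if tok.isdigit()]


print(parse_luns("Lun: 0 1 2 3 295 296 297 298"))
# [0, 1, 2, 3, 295, 296, 297, 298]
```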

kurumi · answered Feb 11 '11 at 00:07
