Why does this regular expression not capture requested groups?

Question

I'm trying to parse the readelf output:

import re
o = '      EXIDX          0x000590 0x002c0590 0x002c0590 0x00008 0x00008 R   0x4'
re.findall(r'^ \s+ (\w+) \s+ (?:(0x [\da-f]+ )\s+)+', o, re.VERBOSE) # (1)
# [('EXIDX', '0x00008')]

Why does only one hexadecimal number get captured? I expected

re.findall(r'^ \s+ (\w+) \s+ (?:(0x [\da-f]+ )\s+)+', o, re.VERBOSE) 
# [('EXIDX', '0x000590', '0x002c0590', '0x002c0590', '0x00008', '0x00008')]

When I try this RE instead, it gives an at least understandable result, matching only the first number:

re.findall(r'^ \s+ (\w+) \s+ (0x [\da-f]+ )\s+', o, re.VERBOSE)
# [('EXIDX', '0x000590')]

I don't understand why I get only the last (?) number with RE (1).

Martijn Pieters · Answer 1 · 2013-08-12T09:53:40.413

Capturing groups do not multiply when the pattern around them repeats. Each group holds a single value: the text from the last repetition that matched, which in this case is the final hexadecimal number before the R.
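A minimal sketch of that behaviour on a shorter input, assuming a made-up string of digits rather than readelf output. The names `m`, `last_only` and `whole` are illustrative only:

```python
import re

# The inner group (\d) is re-filled on every repetition of the outer (?:...)+,
# so after the match it holds only the text from the final repetition.
m = re.match(r'(?:(\d)\s*)+', '1 2 3')

last_only = m.groups()   # ('3',)
whole = m.group(0)       # '1 2 3', the overall match still spans every digit
```

The overall match covers all three digits, but the group itself was overwritten twice before the match finished.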

Capture all hexadecimal numbers, then split the result:

o = '      EXIDX          0x000590 0x002c0590 0x002c0590 0x00008 0x00008 R   0x4'
[[r[0]] + r[1].split() for r in re.findall(r'^ \s+ (\w+) \s+ ((?:0x [\da-f]+ \s+)*)', o, re.VERBOSE)]

outputs

[['EXIDX', '0x000590', '0x002c0590', '0x002c0590', '0x00008', '0x00008']]

The alternative would be to define 6 groups: one for the leading EXIDX name and one for each of the 5 hexadecimal values. That would lock your pattern to exactly 5 hexadecimal values instead of a variable number.
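For comparison, a rough sketch of what that fixed-count alternative could look like. `HEX_FIELDS`, `pattern`, `fixed` and `short` are names invented for this illustration, and the second input line is a made-up example with fewer values:

```python
import re

o = '      EXIDX          0x000590 0x002c0590 0x002c0590 0x00008 0x00008 R   0x4'

# One group for the name, then the hex group repeated literally five times.
# HEX_FIELDS is an assumption about the input, not something readelf guarantees.
HEX_FIELDS = 5
pattern = r'^ \s+ (\w+)' + r' \s+ (0x [\da-f]+)' * HEX_FIELDS

fixed = re.findall(pattern, o, re.VERBOSE)
# [('EXIDX', '0x000590', '0x002c0590', '0x002c0590', '0x00008', '0x00008')]

# A line with fewer hexadecimal values no longer matches at all.
short = re.findall(pattern, '  X 0x1 0x2 R', re.VERBOSE)
# []
```

Every value gets its own group this way, at the cost of rejecting any line that does not have exactly five of them.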

Your RE only captures the last number with a space – Michael Pankov, Aug 12 '13 at 09:41

score 0 · Accepted Answer · edited May 23 '17 at 11:56

Okay, in the end I discovered that I need to grab all the numbers into one group and then split it (thanks to this question and Martijn Pieters).

The correct code is

r = re.findall(r'^ \s+ (\w+) \s+ ((?:0x [\da-f]+ \s+)*)', o, re.VERBOSE)
numbers = r[0][1].split()
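Indexing `r[0]` raises an IndexError when a line does not match, so one possible way to package the same pattern is a small helper that returns None instead. This is only a sketch; `LINE_RE` and `parse_line` are hypothetical names, not part of the original code:

```python
import re

# Same pattern as above, compiled once. LINE_RE and parse_line are
# illustrative names chosen for this sketch.
LINE_RE = re.compile(r'^ \s+ (\w+) \s+ ((?:0x [\da-f]+ \s+)*)', re.VERBOSE)

def parse_line(line):
    """Return (name, list of hex strings), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Group 2 holds all the numbers as one whitespace-separated string.
    return m.group(1), m.group(2).split()

o = '      EXIDX          0x000590 0x002c0590 0x002c0590 0x00008 0x00008 R   0x4'
name, numbers = parse_line(o)
# name == 'EXIDX'
# numbers == ['0x000590', '0x002c0590', '0x002c0590', '0x00008', '0x00008']
```

Note that the trailing 0x4 after the R flag is not included, because the repetition stops at the first token that is not a hexadecimal number.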

answered Aug 12 '13 at 09:45 · Michael Pankov
