
I'm trying to parse the readelf output:

import re
o = '      EXIDX          0x000590 0x002c0590 0x002c0590 0x00008 0x00008 R   0x4'
re.findall(r'^ \s+ (\w+) \s+ (?:(0x [\da-f]+ )\s+)+', o, re.VERBOSE) # (1)
# [('EXIDX', '0x00008')]

Why does only one hexadecimal number get captured? I expected

re.findall(r'^ \s+ (\w+) \s+ (?:(0x [\da-f]+ )\s+)+', o, re.VERBOSE) 
# [('EXIDX', '0x000590', '0x002c0590', '0x002c0590', '0x00008', '0x00008')]

When I try this RE instead, I get an understandable result that matches only the first number:

re.findall(r'^ \s+ (\w+) \s+ (0x [\da-f]+ )\s+', o, re.VERBOSE)
# [('EXIDX', '0x000590')]

I don't understand why I get only the last (?) number with RE (1).

Michael Pankov

2 Answers


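Capturing groups do not multiply when the part of the pattern that contains them is repeated. Each group holds a single value, and every repetition overwrites the previous one, so in this case you only get the last number that matched.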
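To see this in isolation, here is a minimal sketch with a made-up input string and pattern (neither comes from the question): the group sits inside a repeated non-capturing group, just like in RE (1).

```python
import re

# The inner group (\d) is matched three times, once per repetition,
# but the match object keeps only the value from the final repetition.
m = re.match(r'(?:(\d)\s*)+', '1 2 3')

print(m.group(0))  # '1 2 3' -> the whole match covers all three digits
print(m.group(1))  # '3'     -> only the last repetition is kept
print(m.groups())  # ('3',)
```

The whole match spans every repetition, which is why wrapping the repetition itself in a group recovers all of the text.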

Capture all hexadecimal numbers, then split the result:

import re
o = '      EXIDX          0x000590 0x002c0590 0x002c0590 0x00008 0x00008 R   0x4'
[[r[0]] + r[1].split() for r in re.findall(r'^ \s+ (\w+) \s+ ((?:0x [\da-f]+ \s+)*)', o, re.VERBOSE)]

outputs

[['EXIDX', '0x000590', '0x002c0590', '0x002c0590', '0x00008', '0x00008']]

The alternative would be to define 6 groups, one for the leading EXIDX pattern, and 1 each for the 5 hexadecimal patterns, but that would lock your pattern to 5 hexadecimal values instead of a variable number.
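For comparison, a rough sketch of that fixed-group alternative might look like the following. The names `HEX`, `fixed` and `short` are only illustrative, and the three-number line is a hypothetical input invented to show the limitation.

```python
import re

o = '      EXIDX          0x000590 0x002c0590 0x002c0590 0x00008 0x00008 R   0x4'

# One group for the name, followed by exactly five hexadecimal groups.
HEX = r'(0x[\da-f]+)'
fixed = re.compile(r'^\s+(\w+)' + (r'\s+' + HEX) * 5)

print(fixed.findall(o))
# [('EXIDX', '0x000590', '0x002c0590', '0x002c0590', '0x00008', '0x00008')]

# Hypothetical line with only three numbers: the pattern no longer matches.
short = '      DEMO           0x000590 0x002c0590 0x00008 R   0x4'
print(fixed.findall(short))
# []
```

This gives one tuple element per number without a split, at the cost of matching only lines with exactly five values.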

Martijn Pieters

Okay, in the end I found that I need to capture all the numbers in one group and then split it (thanks to this question and Martijn Pieters).

The correct code is

r = re.findall(r'^ \s+ (\w+) \s+ ((?:0x [\da-f]+ \s+)*)', o, re.VERBOSE)
numbers = r[0][1].split()
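If the same parsing is needed for many lines, the idea could be wrapped in a small helper. This is only a sketch: `LINE_RE` and `parse_line` are hypothetical names, and the pattern is the one used above.

```python
import re

# Same pattern as above, compiled once; re.VERBOSE ignores the spaces in it.
LINE_RE = re.compile(r'^ \s+ (\w+) \s+ ((?:0x [\da-f]+ \s+)*)', re.VERBOSE)

def parse_line(line):
    """Return (name, [hex strings]) for a matching line, or None otherwise."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Group 2 holds all the numbers as one string; split() separates them.
    return m.group(1), m.group(2).split()

o = '      EXIDX          0x000590 0x002c0590 0x002c0590 0x00008 0x00008 R   0x4'
name, numbers = parse_line(o)
print(name)     # EXIDX
print(numbers)  # ['0x000590', '0x002c0590', '0x002c0590', '0x00008', '0x00008']
```

Returning None for a non-matching line avoids the IndexError that `r[0]` would raise on an empty `findall` result.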
Michael Pankov