1

I'm new to Python, coming from a basic knowledge of Perl. I'm trying to capture a substring with a regex.

>>> a='Question 73 of 2943'
>>> import re
>>> re.match(r"Question.*(\d+)\s+of", a).group(0)
'Question 73 of'
>>> re.match(r"Question.*(\d+)\s+of", a).group(1)
'3'

What I wanted to do was capture 73 in the group. I assumed that the parentheses would do that.

Joel G Mathew
    Operator `*` is _greedy_. Use `*?` instead. Or, better yet, insert a `\s` in the regex before the number. – DYZ Apr 16 '18 at 05:09

3 Answers

1

.* is greedy. This means it matches as many characters as possible (any character except line terminators) and then backtracks only as far as needed for the rest of the pattern to match. As a result, .* swallows the 7 and leaves only the last digit, 3, for the (\d+) capture group. You can make the .* part lazy by adding a ?, so your regex would look like this:

re.match(r"Question.*?(\d+)\s+of", a)

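The difference between lazy and greedy is this: a greedy quantifier matches as much as it can and gives characters back only when the rest of the pattern fails, while a lazy quantifier matches as little as it can and grows only when it has to.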
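To see the two behaviours side by side, here is a small sketch using the string from the question. The variable names `greedy` and `lazy` are only for illustration.

```python
import re

a = 'Question 73 of 2943'

# Greedy: .* first consumes the rest of the string, then backtracks only
# until \d+\s+of can match, which leaves a single digit for the group.
greedy = re.match(r"Question.*(\d+)\s+of", a)
print(greedy.group(1))  # 3

# Lazy: .*? grows one character at a time, so \d+ starts at the first digit.
lazy = re.match(r"Question.*?(\d+)\s+of", a)
print(lazy.group(1))  # 73
```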

rsiemens
0

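If you only want to capture 73, you can use re.search(r'\d+', a).group(). re.search scans the string and returns the first match it finds, so it stops at 73 and never reaches 2943. Note that this ignores the surrounding "Question ... of" text.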
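A minimal sketch of that approach, assuming the same input string as in the question. The `None` check is an addition here, since re.search returns None when there are no digits at all; the re.findall call is included only for comparison.

```python
import re

a = 'Question 73 of 2943'

first = re.search(r'\d+', a)
# re.search returns None when nothing matches, so guard before calling .group()
number = first.group() if first else None
print(number)  # 73

# For comparison, re.findall collects every run of digits in the string
all_numbers = re.findall(r'\d+', a)
print(all_numbers)  # ['73', '2943']
```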

0

Your .* part matches any character, including digits. It is better to use a negated character class that excludes digits:

Question[^\d]*(\d+)\s+of

That should give you 73.
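As a quick check, here is that pattern applied to the string from the question. The second pattern is an assumed equivalent, using \D as shorthand for [^\d].

```python
import re

a = 'Question 73 of 2943'

# [^\d]* can only consume non-digits, so it stops right before the 7
m = re.match(r"Question[^\d]*(\d+)\s+of", a)
print(m.group(1))  # 73

# \D is shorthand for [^\d], so this pattern behaves the same way
m_short = re.match(r"Question\D*(\d+)\s+of", a)
print(m_short.group(1))  # 73
```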

digitake