I don't understand why this regex only returns the last match:

import re
text = """
Chicken chicken chicken chicken chicken chicken.

#=================
# @title   Chicken
# @author  Me
#=================
Chicken chicken chicken.
"""

rx = r"#=+\n(?:#\s*@(\w+)\s+(.*)\n)+#=+"
for match in re.finditer( rx, text ):
    print(match.groups())

# Output:
# ('author', 'Me')

I would expect this regex to return [ ('title', 'Chicken'), ('author', 'Me') ], but it seems to only return the last match. This does not change if I set the flag re.M (multiline), and the flag re.DOTALL is not what I intend here.

For clarity, visualising the regex suggests that it does what I intended, namely:

  • From the first comment line #===...
  • Find and capture the next lines with the format # @(word) (anything)
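
The behaviour described above can be reduced to a much smaller case. This is only a sketch, assuming Python 3 and the standard `re` module; the pattern and the input string `"a-b-c-"` are made up for illustration and are not part of the original code. It shows that a capturing group inside a repeated group is overwritten on each repetition, while matching the repeated unit on its own yields one result per occurrence.

```python
import re

# Hypothetical reduced example: the capturing group (\w) sits inside a
# repeated non-capturing group, just like (\w+) and (.*) in the question.
m = re.match(r"(?:(\w)-)+", "a-b-c-")

# Only the last repetition survives in the match object.
last_only = m.groups()
print(last_only)   # ('c',)

# Matching the repeated unit by itself returns every occurrence.
every_one = re.findall(r"(\w)-", "a-b-c-")
print(every_one)   # ['a', 'b', 'c']
```
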
Jonathan H
  • The regex does not return overlapping matches. – Dietrich Epp Sep 17 '17 at 21:35
  • @DietrichEpp But the non-capturing group (`(?: ...)`) is not overlapping here, is it? – Jonathan H Sep 17 '17 at 21:36
  • check https://stackoverflow.com/questions/5060659/python-regexes-how-to-access-multiple-matches-of-a-group – Ben Sep 17 '17 at 21:38
  • Try: `rx = r"(?:#\s*@(\w+)\s+(.*?)\n)"` – cs95 Sep 17 '17 at 21:39
  • The grouping doesn't matter; what matters is the whole regex. Because the regex starts with `#=+\n` and ends with `#=+`, it can only match once between the two `#==` lines. – Dietrich Epp Sep 17 '17 at 21:42
  • @DietrichEpp Well, I understand the problem, but does that mean I cannot do this with regexes? The suggestion of @coldspeed does not achieve the same thing; is there a way to enforce the constraint that matching lines should be just below the line `#===`? – Jonathan H Sep 17 '17 at 21:54
  • Right, apparently this is a [known issue](https://bugs.python.org/issue7132) for which a fix has been deemed unnecessary. – Jonathan H Sep 17 '17 at 21:59
  • I managed to do it with two regexes instead. The first one `rx1 = r"#=+(.*?)#=+"` used with the flag `re.DOTALL` extracts the blocks of comments fenced by `#==` lines. The second one `rx2 = r"(?:#\s*@(\w+)\s+(.+))"` matches individual lines within each block. – Jonathan H Sep 17 '17 at 22:15
  • Actually, I think the comment by @n611x007 in [this post](https://stackoverflow.com/q/4963691/472610) is a "better" duplicate. – Jonathan H Sep 17 '17 at 22:18
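
The two-regex approach mentioned in the comments could look roughly like the sketch below. It assumes Python 3 and reuses the two patterns quoted in that comment (`rx1` with `re.DOTALL`, and `rx2`); the helper name `extract_fields` is hypothetical and only there to make the example self-contained.

```python
import re

text = """
Chicken chicken chicken chicken chicken chicken.

#=================
# @title   Chicken
# @author  Me
#=================
Chicken chicken chicken.
"""

# Pass 1: extract each block of comments fenced by "#===" lines.
# re.DOTALL lets "." cross newlines so the lazy group spans the block.
rx1 = re.compile(r"#=+(.*?)#=+", re.DOTALL)

# Pass 2: match individual "# @key value" lines inside one block.
# Without re.DOTALL, (.+) stops at the end of the line.
rx2 = re.compile(r"(?:#\s*@(\w+)\s+(.+))")

def extract_fields(source):
    """Return one list of (key, value) pairs per fenced block (hypothetical helper)."""
    return [rx2.findall(block) for block in rx1.findall(source)]

print(extract_fields(text))
# [[('title', 'Chicken'), ('author', 'Me')]]
```

Because the second pattern only runs on text already extracted from between two `#===` lines, the constraint that the `@key` lines sit inside a fenced block is kept, and each line gets its own match instead of overwriting the previous one.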
