How can you get overlapping matches in regex?

Question

If I run

it returns:

[('groupone|grouptwo', 'groupone', '|', 'grouptwo'), ('groupthree|groupfour', 'groupthree', '|', 'groupfour')]

This is not my desired result. I would also like grouptwo and groupthree to be matched, like this:

What do I need to correct about my regex to make this possible?

With normal `re` by capturing inside a lookahead, eg: [`(?<![^|])(?=(([^\W_]+)([&|])([^\W_]+)))`](https://regex101.com/r/BdTHba/1) — bobble bubble, Jul 09 '22 at 13:40
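For reference, here is a minimal sketch of the lookahead approach from the comment above, using only the standard library `re`. The pattern is copied from the comment. The variable names (`text`, `pattern`, `matches`) are just for illustration.

```python
import re

text = "groupone|grouptwo|groupthree|groupfour"

# The lookahead (?=...) consumes no characters, so the next search can
# start inside text that an earlier match already captured.
# (?<![^|]) only lets a match start at the beginning of the string or
# directly after a "|".
pattern = re.compile(r"(?<![^|])(?=(([^\W_]+)([&|])([^\W_]+)))")

# findall returns one tuple per match: (whole pair, left, separator, right)
matches = pattern.findall(text)
for match in matches:
    print(match)
```

Because the overall match is empty, everything of interest lives in the capture groups, which is why `findall` still returns the familiar 4-tuples. Note that the lookbehind as written only anchors on `|`, so a pair that follows an `&` would not be picked up as a new starting point.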

score 1 · Accepted Answer · answered Jul 09 '22 at 12:54

You could use the third-party regex module for this. Unlike the standard library re, it supports overlapping matches: pass overlapped=True to findall (or finditer).

import regex

regex.findall(r"(\b([a-zA-Z]+\b)(&|\|)(\b[a-zA-Z]+)\b)", "groupone|grouptwo|groupthree|groupfour", overlapped=True)

[('groupone|grouptwo', 'groupone', '|', 'grouptwo'),
 ('grouptwo|groupthree', 'grouptwo', '|', 'groupthree'),
 ('groupthree|groupfour', 'groupthree', '|', 'groupfour')]

N.B. Note the added word boundaries (\b) in the pattern. An overlapped search resumes one character after the start of the previous match, so if you were to keep your original pattern, you would also get a bunch of unwanted matches that start in the middle of a word.
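To see why the word boundaries matter, here is a rough stand-in that uses the standard library `re` rather than the regex module: it tries a pattern at every position through a lookahead, which approximates an overlapped search. The `overlapping` helper is hypothetical, and the boundary-free pattern is only an assumption about what the original looked like, since it is not shown in the question.

```python
import re

text = "groupone|grouptwo|groupthree|groupfour"


def overlapping(pattern, string):
    """Hypothetical helper: try `pattern` at every position via a lookahead."""
    return [m.group(1) for m in re.finditer(f"(?=({pattern}))", string)]


# Assumed boundary-free pattern: a match may begin in the middle of a word.
without_boundaries = overlapping(r"[a-zA-Z]+(?:&|\|)[a-zA-Z]+", text)

# With \b, a match can only begin and end at the edge of a word.
with_boundaries = overlapping(r"\b[a-zA-Z]+\b(?:&|\|)\b[a-zA-Z]+\b", text)

print(len(without_boundaries), without_boundaries[:3])
print(len(with_boundaries), with_boundaries)
```

Without the boundaries, every suffix of a word (`roupone|grouptwo`, `oupone|grouptwo`, and so on) counts as its own match. With them, only the three whole-word pairs remain.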

Was working on the same, came up with `regex.findall(r'((\b[a-zA-Z]+\b)([&|])((?2)))'` — JvdV, Jul 09 '22 at 13:08
