Python: Overlapping regex search

Question

If I write a Python (3.7) program like this:

import re
regx = re.compile("test")
print(regx.findall("testest"))

and run it, I get:

['test']

Even though "test" occurs twice in the string, only one match is returned. I think this is because the last letter of the first "test" is also the first letter of the second "test". How can I get ["test", "test"] as the result instead?

Yeah, that question helped me a lot, thanks! – user8969265 Nov 15 '18 at 20:18

score 5 · Answer 1 · answered Nov 14 '18 at 23:37

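Use a capturing group inside a lookahead, (?=(regex_here)). The lookahead matches without consuming any characters, so the next match can start inside the text of the previous one, and the capturing group is what findall returns: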

import re
regx = re.compile("(?=(test))")
print(regx.findall("testest"))

>>> ['test', 'test']
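
The same idea can be wrapped in a small helper. This is only a sketch: the name find_overlapping is hypothetical (it is not part of the re module), and it assumes the pattern passed in is a valid regex that is not meant to match the empty string.

```python
import re

def find_overlapping(pattern, text):
    """Return every match of pattern in text, including overlapping ones."""
    # Wrap the pattern in a capturing group inside a lookahead.
    # The lookahead itself matches zero characters, so the scan advances
    # one position at a time and group 1 holds the text seen at each position.
    regx = re.compile("(?=(" + pattern + "))")
    return [m.group(1) for m in regx.finditer(text)]

print(find_overlapping("test", "testest"))  # ['test', 'test']
print(find_overlapping("aa", "aaaa"))       # ['aa', 'aa', 'aa']
```

Using finditer with m.group(1) instead of findall keeps the result predictable even when the pattern contains capturing groups of its own.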

Spencer Wieczorek

score -1 · Answer 2 · answered Nov 14 '18 at 23:56

Regex matching consumes the string as it goes. After a match is found, the scan resumes at the end of that match, and the characters that were part of it are not examined again, so overlapping occurrences are not found.

To find them, use a feature of Python regular expressions called a lookahead assertion. Look for each t that is followed by est. The lookahead does not consume any of the string, so only the t is part of the match and the est stays available for the next match.

    import re

    regx = re.compile('t(?=est)')

    print([m.start() for m in regx.finditer('testest')])

[0, 3]
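
If the full span of each occurrence is needed rather than only the start index, one alternative that avoids lookahead is to restart the search one character after the start of each match. This is a sketch and not part of the answer above: overlapping_spans is a hypothetical helper, and it relies on the optional pos argument of a compiled pattern's search method.

```python
import re

def overlapping_spans(pattern, text):
    """Return (start, end) for every match, including overlapping ones."""
    regx = re.compile(pattern)
    spans = []
    pos = 0
    while True:
        # Search from pos onward; returns None when nothing is left to find.
        m = regx.search(text, pos)
        if m is None:
            break
        spans.append(m.span())
        # Resume one character after this match's start, not after its end,
        # so a later match may overlap this one.
        pos = m.start() + 1
    return spans

print(overlapping_spans("test", "testest"))  # [(0, 4), (3, 7)]
```

Each span can be turned back into text with text[start:end] if the matched strings are wanted.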

More details on this page: https://docs.python.org/3/howto/regex.html
