Regex/Python missed encompassed pattern

Question

I have tried to research answers to this question online, but nothing seems to describe the problem I have here. If I missed something, please close the question and redirect it to where it has already been answered.

That being said, my Python regex does not recognize a pattern if it overlaps another pattern that has already been captured. I ran the following code and got these results:

>>> import re
>>> string = 'NNTSY'
>>> m = re.findall('N[^P][ST][^P]',string)
>>> m
['NNTS']

I don't understand why it didn't yield this output:

>>> m
['NNTS', 'NTSY']

Thanks!

score 2 · Accepted Answer · answered Feb 19 '17 at 20:17

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

https://docs.python.org/3/library/re.html#re.findall
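
To make "non-overlapping" concrete, here is a minimal sketch using the question's own pattern and string. It uses `re.finditer` to show where the single match starts and ends; the variable names are only for illustration.

```python
import re

pattern = re.compile(r'N[^P][ST][^P]')
text = 'NNTSY'

# Each match consumes the characters it covers, so the scan resumes
# at the end of the previous match rather than one character later.
spans = [(m.group(), m.span()) for m in pattern.finditer(text)]
print(spans)  # [('NNTS', (0, 4))]

# After index 4 only 'Y' is left, so the candidate starting at index 1
# is never examined by findall/finditer.
print(text[4:])  # Y

# Starting a search explicitly at index 1 shows the pattern does match there.
second = pattern.search(text, 1)
print(second.group(), second.span())  # NTSY (1, 5)
```

So the second candidate is a valid match on its own; it is skipped only because its first three characters were already consumed by the first match.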

If you're not just trying to understand why, but actually need to get overlapping matches, you can use lookahead with a capturing group as described in this question's answers.
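
For comparison, a different technique from the lookahead mentioned above is to drive the scan by hand. The sketch below is not taken from the linked answers; `find_overlapping` is a hypothetical helper name, and it assumes that restarting the search one character after each match start is the behaviour you want.

```python
import re

def find_overlapping(pattern, text):
    """Hypothetical helper: collect matches and allow them to overlap."""
    compiled = re.compile(pattern)
    results = []
    pos = 0
    while True:
        m = compiled.search(text, pos)
        if m is None:
            break
        results.append(m.group())
        # Resume one character after the match start, not at its end.
        pos = m.start() + 1
    return results

print(find_overlapping(r'N[^P][ST][^P]', 'NNTSY'))  # ['NNTS', 'NTSY']
```

This is more verbose than a lookahead, but it leaves the pattern itself unchanged.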

Thank you for pointing me in the right direction. I ended up using the answer in the redirected question and it worked beautifully! — Robert Link, Feb 19 '17 at 20:33

Thierry Lathuille · Answer 2 · 2017-02-19T20:35:49.623

score 1

This is in fact possible, using a lookahead assertion.

(?=pattern)

will match at any position that is immediately followed by pattern, without consuming any characters, and

(?=(pattern))

will capture the text that pattern matched, which is what findall then returns.

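import re

string = 'NNTSY'
# The lookahead is zero-width, so a match can start at every position;
# the inner group captures the (possibly overlapping) text.
m = re.findall(r'(?=(N[^P][ST][^P]))', string)
print(m)
# ['NNTS', 'NTSY']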
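
If the positions of the overlapping matches are also needed, the same lookahead can be used with `re.finditer`. This is a small sketch built on the code above; `hits` is just an illustrative name.

```python
import re

text = 'NNTSY'
pattern = re.compile(r'(?=(N[^P][ST][^P]))')

# The overall match is empty (zero-width), so read the text from group 1
# and take the position from m.start().
hits = [(m.start(), m.group(1)) for m in pattern.finditer(text)]
print(hits)  # [(0, 'NNTS'), (1, 'NTSY')]
```

Note that `m.group(0)` is an empty string here, because the lookahead consumes nothing.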

edited Feb 19 '17 at 20:35

answered Feb 19 '17 at 20:30

