
I have tried to research answers to this question online, but nothing seems to describe the problem I have here. If I missed something, please close the question and redirect it to where it has already been answered.

That being said, my Python regex doesn't seem to recognize a pattern if it overlaps with a match that has already been captured. I ran the code and here are the results:

>>> import re
>>> string = 'NNTSY'
>>> m = re.findall('N[^P][ST][^P]',string)
>>> m
['NNTS']

I don't understand why it didn't yield this output:

>>> m
['NNTS', 'NTSY']

Thanks!

Robert Link

2 Answers


re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

https://docs.python.org/3/library/re.html#re.findall
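To see the non-overlapping scan in action, here is a small sketch that uses the standard library's re.finditer to list each match with its start and end index. The match 'NNTS' ends at index 4 and the scan resumes there, so 'NTSY', which would have to start at index 1, is never attempted.

```python
import re

# Same pattern and input as in the question.
pattern = re.compile(r'N[^P][ST][^P]')
text = 'NNTSY'

# finditer exposes where each match starts and ends; the search for the
# next match resumes at the end of the previous one.
spans = [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]
print(spans)  # [('NNTS', 0, 4)]
```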

If you're not just trying to understand why, but actually need overlapping matches, you can use a lookahead with a capturing group, as described in this question's answers.
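If you'd rather avoid lookahead, a plain loop that tries the pattern at every start index gives the same overlapping result. This is a minimal sketch: overlapping_matches is a hypothetical helper name, and it relies on the optional pos argument of a compiled pattern's match() method. Like the lookahead trick, it reports at most one match per starting position.

```python
import re

def overlapping_matches(pattern, text):
    """Hypothetical helper: try the pattern at each start index of text.

    Because every index is tried independently, matches may overlap.
    """
    regex = re.compile(pattern)
    found = []
    for start in range(len(text)):
        m = regex.match(text, start)  # match() anchored at position `start`
        if m:
            found.append(m.group())
    return found

result = overlapping_matches(r'N[^P][ST][^P]', 'NNTSY')
print(result)  # ['NNTS', 'NTSY']
```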

Amber
  • Thank you for pointing me in the right direction. I ended up using the answer in the redirected question and it worked beautifully! – Robert Link Feb 19 '17 at 20:33

This is in fact possible, using a lookahead assertion.

(?=pattern)

will match at any position that is directly followed by pattern, without consuming any of the string, and

(?=(pattern))

will also capture the text matched by pattern in a group, which is what findall returns.
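Why the group is needed: without it, each lookahead match is zero-width, so findall can only return empty strings, one per position where the pattern fits. A quick check on the question's input:

```python
import re

text = 'NNTSY'

# Without a capturing group the lookahead consumes nothing, so each
# match is an empty string (here at positions 0 and 1).
bare = re.findall(r'(?=N[^P][ST][^P])', text)
print(bare)  # ['', '']
```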

import re
string = 'NNTSY'
m = re.findall(r'(?=(N[^P][ST][^P]))',string)
print(m)
#['NNTS', 'NTSY']
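To also see where each overlapping match begins, re.finditer works with the same pattern. In this short sketch, every match is zero-width, so m.start() is the position where the lookahead succeeded, and the captured text is in m.group(1).

```python
import re

text = 'NNTSY'
pattern = re.compile(r'(?=(N[^P][ST][^P]))')

# Each match is zero-width (m.start() == m.end()); the text of interest
# is captured by group 1 inside the lookahead.
hits = [(m.start(), m.group(1)) for m in pattern.finditer(text)]
print(hits)  # [(0, 'NNTS'), (1, 'NTSY')]
```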
Thierry Lathuille