
I want to find all possible substrings of a string that meet the following requirement: the substring starts with N, the second letter is anything but P, and the third letter is S or T.

With the test string "NNSTL", I would like to get "NNS" and "NST" as results.

Is this possible with Regex?

Rodrigo Villalba Zayas
    See these related questions: http://stackoverflow.com/questions/22030428/match-regex-pattern-within-pattern and http://stackoverflow.com/questions/11430863/how-to-find-overlapping-matches-with-a-regexp/11430936#11430936. Hope it helps. – Juan Ignacio Mignaco Fernández Feb 26 '14 at 02:39

3 Answers


Try the following regex:

N[^P\W\d_][ST]

The first character is N. The second is a negated character class (^) that excludes P, any non-word character (\W), any digit (\d), and the underscore (_), which leaves only letters other than P. The last character is either S or T. I'm assuming the second character must be a letter.
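As a quick check of the character class described above, here is a minimal Python sketch. Python is an assumption (the question does not name a language), and the sample strings are hypothetical inputs chosen to exercise each exclusion.

```python
import re

# Pattern from the answer: N, then a letter other than P, then S or T.
pattern = re.compile(r'N[^P\W\d_][ST]')

# Hypothetical sample inputs, one per case the class is meant to handle.
candidates = ['NaS', 'NNT', 'NPS', 'N1S', 'N_T', 'N S']

# fullmatch requires the whole three-character string to fit the pattern.
accepted = [s for s in candidates if pattern.fullmatch(s)]
rejected = [s for s in candidates if not pattern.fullmatch(s)]

print(accepted)  # ['NaS', 'NNT']
print(rejected)  # ['NPS', 'N1S', 'N_T', 'N S']
```

Note that the match is case-sensitive, so a lowercase "p" in the second position would still be accepted.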

EDIT

The above regex will only match the first instance in the string "NNSTL": the match "NNS" consumes the first three characters, so the search for the next match resumes at position 3 ("TL"). If you truly want both results at the same time, use the following:

(?=(N[^P\W\d_][ST])).

The substring will be in group 1 rather than in the whole pattern match, which will only be the first character (the lookahead itself consumes nothing).
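To see the difference between the two patterns on the question's test string, here is a small sketch using Python's re module (Python is an assumption; the lookahead syntax is the same in most regex flavors). When a pattern has exactly one capturing group, re.findall returns the contents of that group.

```python
import re

text = 'NNSTL'

# Plain pattern: matches consume characters, so overlapping hits are lost.
plain = re.findall(r'N[^P\W\d_][ST]', text)

# Lookahead pattern from the answer: the substring is captured in group 1.
overlapping = re.findall(r'(?=(N[^P\W\d_][ST])).', text)

# finditer also exposes where each overlapping match starts.
positions = [(m.start(), m.group(1))
             for m in re.finditer(r'(?=(N[^P\W\d_][ST])).', text)]

print(plain)        # ['NNS']
print(overlapping)  # ['NNS', 'NST']
print(positions)    # [(0, 'NNS'), (1, 'NST')]
```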

CJ Dennis

You can do this with the re module:

import re

Here's a possible search string:

my_txt = 'NfT foo NxS bar baz NPT'

So we use a regular expression that looks for an N, then any character other than P, then a character that is either S or T.

regex = 'N[^P][ST]'

and using re.findall:

found = re.findall(regex, my_txt)

and found returns:

['NfT', 'NxS']
Russia Must Remove Putin

Yes. The regex is: "N[^P][ST]"

Pass it to any of the functions in Python's re module, documented here: http://docs.python.org/2/library/re.html

Explanation:

  • N matches a literal "N".
  • [^P] is a set, where the caret ("^") denotes inverse (so it matches any single character not in the set).
  • [ST] is another set, where it matches either an "S" or a "T".
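The three pieces above can be tried out with a short sketch like the following. The sample string is a hypothetical input; it also shows that [^P] accepts any character other than P, including digits and punctuation.

```python
import re

# N, then any single character except P, then S or T.
pattern = re.compile(r'N[^P][ST]')

# Hypothetical sample text: one letter case, one excluded P, and two non-letters.
sample = 'NaS NPT N-S N1T'

found = pattern.findall(sample)
print(found)  # ['NaS', 'N-S', 'N1T']
```

If the middle character must be a letter, a stricter class would be needed in place of [^P].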
ethguo