
I am looking for a specific regex (or another solution with similar performance) in Python to substitute patterns like the ones in the following examples:

...-1AG.,., should be transformed as ...G.,.,
..,-1A,.,., should be transformed as ..,,.,.,
...-2GTC,., should be transformed as ...C,.,
..,-2GT.,., should be transformed as ..,.,.,
...+3TAGT,, should be transformed as ...T,,
..,+3TAG.,. should be transformed as ..,.,.

Basically:

AnySymbol (not only dots and commas), followed by one + or - sign, followed by a single digit N (1..9), followed by exactly N letters, and finally AnySymbol (not only dots and commas),

should be transformed to:

AnySymbol (not only dots and commas) and AnySymbol (not only dots and commas).
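The rule above can be pinned down as a plain character scan with no regex at all. This is a sketch, not from the question: `remove_counted_runs` is a hypothetical name, and it assumes the count is always a single digit 1..9. In CPython it is likely slower than the regex versions, so it is mainly useful as a readable reference to check them against.

```python
def remove_counted_runs(s):
    """Delete each '+N' or '-N' marker plus the N characters that follow it.

    Hypothetical helper; assumes N is a single digit 1..9, as in the question.
    """
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch in "+-" and i + 1 < len(s) and s[i + 1] in "123456789":
            n = int(s[i + 1])
            i += 2 + n  # skip the sign, the digit and the next n characters
        else:
            out.append(ch)
            i += 1
    return "".join(out)


examples = [
    ("...-1AG.,.,", "...G.,.,"),
    ("..,-1A,.,.,", "..,,.,.,"),
    ("...-2GTC,.,", "...C,.,"),
    ("..,-2GT.,.,", "..,.,.,"),
    ("...+3TAGT,,", "...T,,"),
    ("..,+3TAG.,.", "..,.,."),
]
for source, expected in examples:
    assert remove_counted_runs(source) == expected, source
print("all examples pass")
```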

The obvious solution, String = re.sub(r'[\-\+]\d\w+', "", String), is not right: \w+ is greedy, so for ...-1AG.,., it also removes the G, whereas the expected result is ...G.,.,. So far I am looping over r'[\-\+]1\w', r'[\-\+]2\w\w', r'[\-\+]3\w\w\w' ... r'[\-\+]9\w\w\w\w\w\w\w\w\w', but I am hoping for a more elegant solution. Any ideas?
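The loop described above can be sketched as follows. This is a reconstruction, not the asker's actual code, and `strip_with_loop` is a hypothetical name; it writes \w repeated N times as \w{N}. The last line shows the greedy single-pattern failure on the first example.

```python
import re

# One compiled pattern per count N = 1..9: a sign, the digit N, then exactly N word
# characters. \w{N} is equivalent to writing \w N times, as in the question.
LOOP_PATTERNS = [re.compile(r"[-+]%d\w{%d}" % (n, n)) for n in range(1, 10)]


def strip_with_loop(s):
    # Nine passes over the string, one per possible count.
    for pattern in LOOP_PATTERNS:
        s = pattern.sub("", s)
    return s


print(strip_with_loop("...-1AG.,.,"))           # ...G.,.,
# The greedy \w+ also swallows the G that should be kept:
print(re.sub(r"[-+]\d\w+", "", "...-1AG.,.,"))  # ....,.,
```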

Ivaylo



Here is a working example:

import re

x = """...-1AG.,., should be transformed as ...G.,.,
..,-1A,.,., should be transformed as ..,,.,.,
...-2GTC,., should be transformed as ...C,.,
..,-2GT.,., should be transformed as ..,.,.,
...+3TAGT,, should be transformed as ...T,,
..,+3TAG.,. should be transformed as ..,.,."""

def repl(matchobj):
    # group(1) is the count N, group(2) is the whole run of letters after it;
    # drop the first N letters and keep the rest.
    return matchobj.group(2)[int(matchobj.group(1)):]

print(re.sub(r"[+-](\d+)([a-zA-Z]+)", repl, x))

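re.sub accepts a function as the replacement argument: it is called with each match object, and its return value is substituted for the match. Here repl removes the first N letters (N being the captured number) and keeps any letters after them, which is exactly what the greedy \w+ pattern could not do.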
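Another option, not taken from the answer, is to fold the nine patterns from the question's loop into a single alternation, so the regex engine does all the work in one pass without calling back into Python. `ALTERNATION` and `strip_with_alternation` are hypothetical names, and whether this is faster than the callback depends on the input, so it is worth measuring.

```python
import re

# A single pattern with one branch per count:
#   [-+](?:1[A-Za-z]{1}|2[A-Za-z]{2}|...|9[A-Za-z]{9})
# The digit picks the branch and that branch consumes exactly that many letters,
# so one re.sub pass with an empty replacement string is enough.
ALTERNATION = re.compile(
    "[-+](?:" + "|".join("%d[A-Za-z]{%d}" % (n, n) for n in range(1, 10)) + ")"
)


def strip_with_alternation(s):
    return ALTERNATION.sub("", s)


print(strip_with_alternation("...-2GTC,.,"))  # ...C,.,
print(strip_with_alternation("..,+3TAG.,."))  # ..,.,.
```

One behavioral difference: if fewer letters follow than the digit says (e.g. `-3AB.`), this pattern leaves the marker untouched, whereas `repl` removes the sign, the digit and all of the letters.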

vks
  • This solution is fairly elegant, I will try to benchmark it soon against the loop, but something tells me it will perform better. Thank you for the valuable input. – Ivaylo Sep 17 '15 at 10:28
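For the benchmark mentioned in the comment, a minimal `timeit` sketch could look like this. The sample string (the question's six examples concatenated and repeated) and the repetition counts are arbitrary assumptions, and the function names are hypothetical; real data may behave differently, so no timings are quoted here.

```python
import re
import timeit

# Arbitrary test input: the six examples from the question, repeated.
SAMPLE = "...-1AG.,.,..,-1A,.,.,...-2GTC,.,..,-2GT.,.,...+3TAGT,,..,+3TAG.,." * 1000

LOOP_PATTERNS = [re.compile(r"[-+]%d\w{%d}" % (n, n)) for n in range(1, 10)]
CALLBACK_PATTERN = re.compile(r"[+-](\d+)([a-zA-Z]+)")


def with_loop(s):
    # The question's approach: one substitution pass per count 1..9.
    for pattern in LOOP_PATTERNS:
        s = pattern.sub("", s)
    return s


def with_callback(s):
    # The answer's approach: one pass, the callback slices off the first N letters.
    return CALLBACK_PATTERN.sub(lambda m: m.group(2)[int(m.group(1)):], s)


# Sanity check: both approaches must agree before timing them.
assert with_loop(SAMPLE) == with_callback(SAMPLE)

for name, func in [("loop", with_loop), ("callback", with_callback)]:
    seconds = timeit.timeit(lambda: func(SAMPLE), number=20)
    print("%-8s %.4f s" % (name, seconds))
```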