3

I am trying to use a negative lookahead to replace every part of a string that does not match a pattern:

regexPattern = '((?!*' + 'word1|word2|word3' + ').)*$'  
mytext= 'jsdjsqd word1dsqsqsword2fjsdjswrod3sqdq'
return re.sub(regexPattern, "P", mytext)

#Expected Correct Output:  'PPPPPPword1PPPPPPword2PPPPPword3PPP'

#BAD Output:  'jsdjsqd word1dsqsqsword2fjsdjswrod3sqdq'

I tried this, but it does not work (the string remains the same). How can I modify it? (I think this is a pretty difficult regex.)

  • Post sample data along with the expected output. – Avinash Raj Mar 30 '16 at 12:46
  • Remove the `*` in `'((?!*'`. – Avinash Raj Mar 30 '16 at 12:46
  • You want to replace all strings that do not contain `word1` or `word2` or `word3`? [`r'(?s)^(?!.*(?:word1|word2|word3)).*$'`](https://regex101.com/r/tQ5tF3/1). *it does not work well* - how does it work for you, what is the problem? – Wiktor Stribiżew Mar 30 '16 at 12:48
  • The code above throws a well-known [*nothing to repeat* error](http://stackoverflow.com/questions/3675144/regex-error-nothing-to-repeat). – Wiktor Stribiżew Mar 30 '16 at 13:04
  • The code you submitted does not seem to work either. Any idea? –  Mar 30 '16 at 13:07
  • @quantCode: What code? What does not work? The `My text with word1` string contains `word1` and thus is not matched. See [this demo](https://ideone.com/FVb2Gh). `r'(?s)^(?!.*(?:word1|word2|word3)).*$'` matches any string that has no `word1`, `word2` or `word3` in it. – Wiktor Stribiżew Mar 30 '16 at 13:08
  • I mean this one: `r'(?s)^(?!.*(?:word1|word2|word3)).*$'`. Be careful: this is for substitution, not search, and it is in Python, so the regex flavor may differ. –  Mar 30 '16 at 13:09
  • Your code `r'(?s)^(?!.*(?:word1|word2|word3)).*$'` does not work. It returns "jsdjsqd word1dsqsqsword2fjsdjswrod3sqdq". Nothing was replaced. –  Mar 30 '16 at 13:15
  • You should have added the requirement: *any character should be replaced with `P` and the `word1`, `word2`, and `word3` character sequences should remain intact*. – Wiktor Stribiżew Mar 30 '16 at 13:19

2 Answers

3

You can use

import re
regex = re.compile(r'(word1|word2|word3)|.', re.S)
mytext = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
print(regex.sub(lambda m: m.group(1) if m.group(1) else "P", mytext))
# => PPPPPPPPword1PPPPPPword2PPPPPPword3PPPP

See the IDEONE demo

The regex is (word1|word2|word3)|.:

  • (word1|word2|word3) - either word1, or word2, or word3 character sequences
  • | - or...
  • . - any character (incl. a newline as re.S DOTALL mode is on)

See the regex demo
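
The same idea can be wrapped in a small helper. This is only a sketch, assuming the words to keep arrive as a list; the name `mask_except` and its parameters are hypothetical and not part of the answer above. `re.escape` guards against words that contain regex metacharacters, and trying longer words first avoids a shorter alternative shadowing a longer one.

```python
import re

def mask_except(text, words, fill="P"):
    """Replace every character of text with fill, except occurrences of words."""
    # Ignore empty words: an empty alternative would match at every position.
    words = [w for w in words if w]
    if not words:
        return fill * len(text)
    # Longest first, so that e.g. "word10" is tried before "word1".
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    # Group 1 captures a kept word; the bare "." consumes any other character.
    regex = re.compile("(" + alternatives + ")|.", re.S)
    return regex.sub(lambda m: m.group(1) if m.group(1) else fill, text)

print(mask_except('jsdjsqd word1dsqsqsword2fjsdjsword3sqdq',
                  ['word1', 'word2', 'word3']))
```

Called with the sample string from the question, this should print the same result as the snippet above.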

Wiktor Stribiżew
  • OK, if we want this output: 'Pword1Pword2Pword3P', do we need to do another step? –  Mar 30 '16 at 13:44
  • You can use a tempered greedy token like you have: [`re.compile(r'(word1|word2|word3)|(?:(?!word1|word2|word3).)*', re.S)`](https://ideone.com/Xpm1Pn) – Wiktor Stribiżew Mar 30 '16 at 13:47
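
A hedged sketch of the collapsed variant from the last comment, with one change that is my assumption rather than part of the comment: the tempered greedy token is quantified with `+` instead of `*`. With `*`, the second alternative can also match an empty string, and recent Python versions replace empty matches adjacent to a previous match, which may add an extra `P` at the end of the result.

```python
import re

# Group 1 keeps a listed word. The second alternative is a tempered greedy
# token: it consumes a run of characters, none of which starts a listed word.
# "+" (rather than "*") requires the run to be non-empty, so no empty match
# is ever replaced.
regex = re.compile(r'(word1|word2|word3)|(?:(?!word1|word2|word3).)+', re.S)
mytext = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
collapsed = regex.sub(lambda m: m.group(1) if m.group(1) else "P", mytext)
print(collapsed)  # Pword1Pword2Pword3P
```

Note that two kept words that are directly adjacent get no `P` between them, since there is no run of other characters to replace.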
0

You could use a two-stage approach: First, replace the characters that do match with some special character, then use that as a mask to replace all the other characters.

>>> import re
>>> text = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
>>> p = 'word1|word2|word3'
>>> mask = re.sub(p, lambda m: 'X' * len(m.group()), text)
>>> mask
'jsdjsqd XXXXXdsqsqsXXXXXfjsdjsXXXXXsqdq'
>>> ''.join(t if m == 'X' else 'P' for (t, m) in zip(text, mask))
'PPPPPPPPword1PPPPPPword2PPPPPPword3PPPP'

Of course, instead of X you might have to choose a different character that does not occur in the original string.
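
If picking a safe placeholder character is a concern, the mask can be kept as a list of booleans instead of a string. This is a sketch of that variation, not the answer's original code; the variable names `keep` and `result` are my own.

```python
import re

text = 'jsdjsqd word1dsqsqsword2fjsdjsword3sqdq'
p = 'word1|word2|word3'

# keep[i] is True when the character at index i belongs to a match.
keep = [False] * len(text)
for m in re.finditer(p, text):
    for i in range(m.start(), m.end()):
        keep[i] = True

# Keep matched characters, replace everything else with 'P'.
result = ''.join(t if k else 'P' for t, k in zip(text, keep))
print(result)  # PPPPPPPPword1PPPPPPword2PPPPPPword3PPPP
```

Because the mask never shares an alphabet with the text, no character of the input can be mistaken for the marker.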

tobias_k