2

I am trying to find " && " with a regex and replace it with " and " in Python. Here is my regex:

r"(?=( && ))"

test input: x&& &&& && && x || | ||\|| x, expected output: x&& &&& and and x or | ||\|| x. My Python code:

import re

input = "x&& &&& && && x || | ||\|| x"
result = re.sub(r"(?=( && ))", " and ", input)
print(result)

My output is: x&& &&& and && and && x || | ||\|| x. The pattern is found, but instead of replacing it, re.sub leaves the original text in place and just inserts my substitution string wherever the pattern matches. This is really confusing.

Tomerikoo
AmacOS
  • `re.sub(r"(?<!\S)&&(?!\S)", "and", text)`? You only have a trailing space that overlaps. – Wiktor Stribiżew Jun 07 '20 at 20:43
  • Overlap can be controlled by a lookahead, but you can limit the overlap to a single end char, i.e. match `[ ]&&(?=[ ])` and replace with ` and`. This way the end char can be reused (overlapped) by an adjacent ` &&` if needed. Of course you can get fancier and add assertions in both directions, but that is up to you. –  Jun 07 '20 at 20:53
  • There are hundreds of overlapping regex examples on this site; did you look around a little? –  Jun 09 '20 at 21:45

2 Answers

2

A capturing group inside a positive lookahead can be used to extract overlapping patterns. To replace them, however, you need to actually consume the text, and lookarounds do not consume the text they match.
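To see why the question's pattern inserts rather than replaces, it may help to look at what the lookahead actually matches. The following is a minimal sketch; the shortened sample string and the variable names are chosen for illustration only.

```python
import re

text = "x&& &&& && && x"

# A lookahead matches at a position but consumes nothing,
# so each overall match is the empty string; only group 1 holds " && ".
matches = [(m.start(), m.group(0), m.group(1))
           for m in re.finditer(r"(?=( && ))", text)]
print(matches)  # [(7, '', ' && '), (10, '', ' && ')]

# Replacing an empty match simply inserts the replacement at that position,
# which is why the original " &&" text is still present afterwards.
inserted = re.sub(r"(?=( && ))", " and ", text)
print(repr(inserted))  # 'x&& &&& and  && and  && x'
```

Note the doubled spaces in the result: the inserted " and " ends with a space and the untouched " &&" begins with one.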

In this case, you only have an overlapping trailing space, so you might use either of the two approaches:

text = re.sub(r"( )&&(?= )", r"\1and", text)

Or, if you need to replace any && that has no non-whitespace char directly before or after it (that is, one enclosed by whitespace or sitting at the start or end of the string), use

text = re.sub(r"(?<!\S)&&(?!\S)", r"and", text)

Note that input is a Python builtin, so you should give your text variable a different name, say, text.
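As a rough comparison of the two patterns, the sketch below runs both on a string where && also appears at the very start and end. The sample string and variable names are assumptions made for illustration.

```python
import re

sample = "&& a && && b &&"

# Requires a literal space on both sides, so "&&" at the string edges is skipped.
space_delimited = re.sub(r"( )&&(?= )", r"\1and", sample)

# Only requires that no non-whitespace char is adjacent,
# so "&&" at the start or end of the string qualifies too.
whitespace_bounded = re.sub(r"(?<!\S)&&(?!\S)", "and", sample)

print(space_delimited)     # && a and and b &&
print(whitespace_bounded)  # and a and and b and
```

Which one fits depends on whether a leading or trailing && should count as a standalone operator in your input.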

Wiktor Stribiżew
  • Thanks a lot for the quick reply. Just to clarify: re.sub(r"( )&&(?= )" means && must be followed by whitespace for the pattern to match, but that whitespace is not consumed when the substitution is done. How does r"\1and" work? I assume ' &&' needs to be substituted by "and", right? – AmacOS Jun 07 '20 at 21:09
  • @AmacOS `( )` is a capturing group, `\1` backreference refers to this group value. If that is a regular space, you surely can just use `text = re.sub(r" &&(?= )", r" and", text)`. I provided a backreference solution in case you want to use `\s` instead of a literal space to match any whitespace. – Wiktor Stribiżew Jun 07 '20 at 21:10
0

Your output is consistent with your statement of the problem (sic):

I am trying to find " && " and substitute it with " and "

Since the output is not what you want there must be a problem with your statement of the problem. I believe you actually want:

I am trying to find " &&" followed by a space and substitute it with " and"

Once you get that right it's just a matter of translating it to a regular expression:

import re

s = r'x&& &&& && && x || | ||\|| x'  # raw string avoids the invalid "\|" escape
print(re.sub(r' &&(?= )', ' and', s))
  #=> x&& &&& and and x || | ||\|| x

(?= ) is a positive lookahead that asserts that the match is followed by a space.
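For contrast, here is a sketch of what happens when the trailing space is consumed instead of only asserted: the second && is missed because its leading space was already used up by the first match. The variable names are illustrative.

```python
import re

s = r'x&& &&& && && x || | ||\|| x'

# Consuming the trailing space: the next " &&" has lost its leading space.
consumed = re.sub(r' && ', ' and ', s)

# Asserting the trailing space: it stays available for the next match.
asserted = re.sub(r' &&(?= )', ' and', s)

print(consumed)  # x&& &&& and && x || | ||\|| x
print(asserted)  # x&& &&& and and x || | ||\|| x
```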

Cary Swoveland
  • See https://stackoverflow.com/questions/62251399/string-substitution-using-regex-in-python-with-overlapping-pattern#comment110097570_62251476 – Wiktor Stribiżew Jun 07 '20 at 21:28