I am trying to replace the text surrounding a regex group. I want to replace QQQQQ and SSSSS with LLL and MMM, while the text before, between, and after them stays the same. (There may be several occurrences of QQQQQ and SSSSS.)
In the code below, (1) seems to show that .*? finds the right string. In (2), using (.*?) as a group also finds the right string, but a stray ☺ character ends up in the replacement instead of the group's text. In (3) and (4), with DOTALL, nothing is matched at all.
I'm using regex here, but the behavior is the same with re. I also tried $1 instead of \1.
Here is the code:
import regex

doc1 = """AAA QQQQQ azertyuiop SSSSS BBB"""
doc2 = """
AAA
QQQQQ
azertyuiop
SSSSS
BBB
"""
# (1) OK - gives AAA LLL dd MMM BBB. .*? finds the right string
doc = regex.sub("QQQQQ.*?SSSSS", "LLL dd MMM", doc1)
print(doc)
# (2) gives AAA LLL ☺ MMM BBB - where does this ☺ come from?
doc = regex.sub("QQQQQ(.*?)SSSSS", "LLL \1 MMM", doc1)
print(doc)
# (3) leaves string unchanged. Isn't DOTALL supposed to match line breaks?
doc = regex.sub("QQQQQ.*?SSSSS", "LLL dd MMM", doc2, regex.DOTALL)
print(doc)
# (4) leaves string unchanged
doc = regex.sub("QQQQQ(.*?)SSSSS", "LLL \1 MMM", doc2, regex.DOTALL)
print(doc)
Case (4) is what I am actually trying to do.
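
For comparison, here is a minimal sketch of how (4) might be written. It assumes the two problems are (a) the non-raw replacement string, where "\1" is read by Python as the control character U+0001 (shown as ☺ on some consoles) rather than as a group reference, and (b) DOTALL being passed positionally, where the fourth positional parameter of sub() is count, not flags. It uses the standard re module instead of regex so it runs without extra installs; the helper name swap_markers is made up for illustration.

```python
import re

doc1 = "AAA QQQQQ azertyuiop SSSSS BBB"
doc2 = "\nAAA\nQQQQQ\nazertyuiop\nSSSSS\nBBB\n"


def swap_markers(text):
    """Replace QQQQQ...SSSSS with LLL...MMM, keeping the text in between."""
    # Raw strings keep the backslash, so \1 reaches the regex engine
    # as a reference to group 1 instead of becoming chr(1).
    # flags= is passed by keyword; passed positionally it would be
    # interpreted as the count argument.
    return re.sub(r"QQQQQ(.*?)SSSSS", r"LLL \1 MMM", text, flags=re.DOTALL)


result1 = swap_markers(doc1)
result2 = swap_markers(doc2)
print(result1)
print(result2)
```

Because the pattern is non-greedy, each QQQQQ...SSSSS pair should be replaced separately when there are several occurrences. Note that the captured group includes any whitespace next to the markers, so the spaces in the replacement string are added on top of it.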