I have a string like this:
import re
text = """
Some stuff to keep <b>here</b>
CODE
<b>Replace gt and lt</b>
<i>inside <script>this</script> code</i>
CODE
Some more stuff to keep <b>here</b>
"""
And the expected output is:
Some stuff to keep <b>here</b>
CODE
_LT_b_GT_Replace gt and lt_LT_/b_GT_
_LT_i_GT_inside _LT_script_GT_this_LT_/script_GT_ code_LT_/i_GT_
CODE
Some more stuff to keep <b>here</b>
Here's a small subset of what I've tried:
# None of these work, and typically only replace the first or last occurence of <
re.sub(r'(?<=CODE)<(?=CODE)', r'_LT_', text, flags=re.DOTALL)
re.sub(r'(?<=CODE)(.*?)<(.*?)(?=CODE)', r'\1_LT_\2', text, flags=re.DOTALL)
re.sub(r'(?<=CODE)(.*?)[<]*(.*?)(?=CODE)', r'\1_LT_\2', text, flags=re.DOTALL|re.MULTILINE)
re.sub(r'(CODE.*?)<(.*?CODE)', r'\1_LT_\2', text, flags=re.DOTALL)
re.sub(r'(CODE.*)<(.*CODE)', r'\1_LT_\2', text, flags=re.DOTALL)
What I'd like to happen: All occurrences of <
between CODE
and CODE
to be replaced with _LT_
.
After spending the day on stackoverflow and regex101.com, I'm starting to think either it's not possible or I'm not smart enough to handle this.
Any help is tremendously appreciated!
Thanks in advance.