Possible Duplicate:
Replace all < and > that are NOT part of an HTML tag
- Using Python
- I know how much everyone here hates regex questions about HTML tags, but I am just doing this as an exercise to help me learn regex.
Replace (1 can be any character):
<b>< </b>
<b> < </b>
<b> <</b>
<b><</b>
<b><111</b>
<b>11<11</b>
<b>111<</b>
<b>11<11</b>
<b>
<<<
</b>
With:
<b>& </b>
<b> & </b>
<b> &</b>
<b>&</b>
<b>&111</b>
<b>11&11</b>
<b>111&</b>
<b>11&11</b>
<b>
&
</b>
I have searched the web and tried many of my own solutions. Is this possible? And if so, how?
My best guess was something like:
re.sub(r'(?<=>)(.*?)<(.*?)(?=</)', r'\1&\2', string)
But that falls apart with re.DOTALL and with runs of several characters such as '<<<'.
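For reference, here is a minimal sketch of one alternative, not a definitive solution: match either a whole tag or a lone `<` in a single alternation, keep the tags, and swap everything else. The names `TAG_OR_LT` and `escape_bare_lt` are made up for illustration, and the sketch assumes a tag is `<`, an optional `/`, a letter, and then anything up to the next `>`. That assumption is naive: text like `a <b and c> d` would still be treated as a tag.

```python
import re

# Assumption: a "tag" is '<', an optional '/', an ASCII letter, then any
# characters other than '<' or '>' up to the closing '>'.
# The second alternative matches any '<' that did not start such a tag.
TAG_OR_LT = re.compile(r'(</?[A-Za-z][^<>]*>)|<')

def escape_bare_lt(text, replacement='&'):
    """Replace every '<' that is not the start of a tag (hypothetical helper)."""
    # group(1) is set only when a whole tag matched; keep it unchanged.
    # Otherwise the match is a lone '<', so emit the replacement.
    return TAG_OR_LT.sub(lambda m: m.group(1) or replacement, text)

print(escape_bare_lt('<b>11<11</b>'))  # <b>11&11</b>
```

Because the tag alternative is tried first at every position, no lookbehind or re.DOTALL is needed, and newlines between tags do not matter. Note that each `<` in a run such as `<<<` is replaced separately, so the run becomes three replacements rather than one.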