I'm trying to match and replace broken HTML using a regex, but I've gone in circles a couple of times with grouping, lookbehinds, and quantifiers, and I'm still struggling to match every scenario.
This has to be done in JavaScript, because the issue occurs in a browser-based HTML editor.
The broken HTML is specific: any content that sits between a closing </li> and the closing </ul> or </ol> of the list, and is therefore not properly wrapped in a list item.
For instance, this fragment, taken from the fuller example further down:
</li>
bbb<strong>bbbb</strong><strong>bbb <span style="text-decoration: underline;"><em>bbbbb</em></span></strong>=0==
</ul>
Here is the full example of where the issue could exist:
<ul>
<li>1111</li>
<li>Could be anything here</li>
<li>aaaa</li>
bbb<strong>bbbb</strong><strong>bbb <span style="text-decoration: underline;"><em>bbbbb</em></span></strong>=0==
</ul>
<ol>
<li>more?<li>
<li>echo</li>
</ol>
This is what I want the HTML to look like after a match and replace:
<ul>
<li>1111</li>
<li>Could be anything here</li>
<li>aaaabbb<strong>bbbb</strong><strong>bbb <span style="text-decoration: underline;"><em>bbbbb</em></span></strong>=0==
</ul>
<ol>
<li>more?<li>
<li>echo</li>
</ol>
These are a few of the expressions I've tried. Each of them, and the slight variations I've made, either matches too much or fails to match the stray text at all:
/<\/li>.*?<\/[ou]l>/mig
/<\/li>([\s\n]*[\w!\.?;,<:>&\\\-\{\}\[\]\(\)~#'"=/]+[\s\n]*)+<\/[ou]l>/mig
/<\/li>([\s\n]*[^\s\n]+[\s\n]*)+<\/[ou]l>/i
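To show what I mean by "too much or not correctly", here is a small check of the first attempt against a cut-down sample (the `sample` string and the variable names are mine, purely for illustration). As far as I can tell, `.` does not cross line breaks unless the `s` flag is set (`m` only changes `^` and `$`), so the pattern as written finds nothing. Swapping in `[\s\S]` makes it match, but starting from the first `</li>` in the list rather than the last one.

```javascript
// Cut-down version of the example above (illustrative only).
const sample = [
  '<ul>',
  '<li>1111</li>',
  '<li>aaaa</li>',
  'bbb<strong>bbbb</strong>=0==',
  '</ul>',
].join('\n');

// Attempt 1 exactly as written: "." cannot step over the "\n" characters,
// and no </li> shares a line with </ul>, so nothing matches.
const dotMatches = sample.match(/<\/li>.*?<\/[ou]l>/mig);
console.log(dotMatches); // null

// Same idea with [\s\S] so line breaks are allowed. The lazy quantifier
// still begins at the FIRST </li>, so the match swallows the valid items too.
const anyCharMatches = sample.match(/<\/li>[\s\S]*?<\/[ou]l>/gi);
console.log(anyCharMatches[0]);
```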
I've searched on and off for a couple of days with no luck. I realise I'm probably asking something that has been answered hundreds of times before.
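One direction that might work, as a minimal sketch rather than a proven solution: match a closing `</li>`, then stray content that contains no `li`, `ul`, or `ol` tags of its own, then the closing list tag, and put the stray content back without the `</li>` in front of it. The names `STRAY_AFTER_LAST_ITEM` and `mergeStrayListText` are hypothetical, and the sketch assumes flat (non-nested) lists with at least one non-whitespace character of stray content.

```javascript
// Hypothetical names; this is a sketch under the assumptions stated above.
//   <\/li>\s*                     the last </li> plus any whitespace after it
//   (?=\S)                        stray content must start with a non-space,
//                                 so a normal "</li>\n</ul>" is left alone
//   (?:(?!<\/?(?:li|[ou]l)\b)[\s\S])+?
//                                 any characters, lazily, but never stepping
//                                 onto an opening or closing li/ul/ol tag
//   \s*(<\/[ou]l>)                optional whitespace, then the closing list tag
const STRAY_AFTER_LAST_ITEM =
  /<\/li>\s*((?=\S)(?:(?!<\/?(?:li|[ou]l)\b)[\s\S])+?)\s*(<\/[ou]l>)/gi;

function mergeStrayListText(html) {
  // $1 is the stray content, $2 the closing list tag. The matched </li> is
  // dropped, which reproduces the target output shown earlier. Using
  // '$1</li>\n$2' instead would keep the merged item explicitly closed.
  return html.replace(STRAY_AFTER_LAST_ITEM, '$1\n$2');
}

const broken = '<ul>\n<li>aaaa</li>\nbbb<strong>bbbb</strong>=0==\n</ul>';
console.log(mergeStrayListText(broken));
// <ul>
// <li>aaaabbb<strong>bbbb</strong>=0==
// </ul>
```

Because the negative lookahead refuses to cross any list tag, the match can only begin at the `</li>` that directly precedes the stray text, which avoids the "matching too much" problem. It will not handle nested lists or stray text between two list items, so for anything beyond this specific breakage a DOM-based cleanup is probably safer than a regex.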