I have HTML that looks like this:
<ul style="list-style-type: square;">
<br />
<li margin-left="80px">
<br />first line
<br />
<br />second line
</li>
<br />
<li margin-left="80px">
<br />text line 1
</li>
<br />
<li margin-left="80px">
<br />text line 2
</li>
<br />
</ul>
I want to match the contents of the ul, but I don't want to match the contents of the li elements.
The end goal is to get rid of the <br /> tags that are directly under the <ul></ul> and not the ones inside the <li></li> elements.
Note: for clarity of the example I formatted the HTML above, but in my real-world scenario it comes as a single giant string without any \r\n's.
Here it is:
<p margin-left="40px"><br /> <b>[What is the nature of the Services?]</b></p><br /><p><br /> [What are the overarching goals, objectives and outcomes you want to achieve?]</p><br /><p margin-left="80px"><br /> <b><i><u>[How should the Services be delivered?]</u></i></b></p><br /><ul style="list-style-type: square;"><br /> <li margin-left="80px"><br /> gfhsdfsdf<br /><br /> some line here</li><br /> <li margin-left="80px"><br /> sfdsfsdfsdf</li><br /> <li margin-left="80px"><br /> sdfsdfsdf</li><br /></ul><br /><p><br /> [Is the appointment of this Supplier exclusive?]</p><br /><p><br /> [Refer to any proposal prepared by the Supplier if this helps describes any aspects of the Service]</p><br />
Anyway, the first thing that came to mind was to use this to extract the contents of the <ul>:
<ul[^>]*>(.*)</ul>
and then maybe run a second one to select all the li elements:
<li[^>]*>.*</li>
and then somehow get rid of anything else that's left over.
But that's kind of clumsy, and then again
<li[^>]*>.*</li>
matches a whole bunch of li's at once; this entire string gets captured:
<li margin-left="80px"><br />\t\tgfhsdfsdf<br /><br />\t\tsome line here</li><br />\t<li margin-left="80px"><br />\t\tsfdsfsdfsdf</li><br />\t<li margin-left="80px"><br />\t\tsdfsdfsdf</li>
I know it's because .* is greedy, but I'm not sure how to avoid it. Something like
[^</li>]*
wouldn't work, because it is treated as a set of individual characters, not as a string.
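To make the greedy behaviour concrete, here is a minimal sketch that compares the greedy pattern with a lazy variant (.*? instead of .*). It is written in Python purely for illustration, on a shortened copy of the input above, and it assumes the li elements are never nested. The lazy quantifier syntax is also supported by .NET's regex engine, but I have not verified this sketch against my real data.

```python
import re

# Shortened version of the single-line input from the question.
html = ('<ul style="list-style-type: square;"><br /> '
        '<li margin-left="80px"><br /> gfhsdfsdf<br /><br /> some line here</li><br /> '
        '<li margin-left="80px"><br /> sfdsfsdfsdf</li><br /> '
        '<li margin-left="80px"><br /> sdfsdfsdf</li><br /></ul>')

# Greedy: .* runs on to the LAST </li>, so all three items come back as one match.
greedy = re.findall(r'<li[^>]*>.*</li>', html)

# Lazy: .*? stops at the FIRST </li> it can reach, giving one match per li.
# (Assumes no nested lists; a nested </li> would end the match too early.)
lazy = re.findall(r'<li[^>]*>.*?</li>', html)

print(len(greedy))  # 1
print(len(lazy))    # 3
```

If the real input could contain line breaks inside an li, the pattern would also need a "dot matches newline" option (re.DOTALL in Python).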
Any help is much appreciated.
So I have 2 problems:
1) I don't like the way I'm approaching this, so better ideas are welcome (I'm considering using the set operations of LINQ to XML to achieve this). I still hope to do this with regex, but if anyone knows exactly how to do it another way, please share.
2) How do I capture each li as a separate match, instead of capturing everything from the first opening <li> to the last closing </li>?
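For reference, this is a rough sketch of the overall result I'm after: split the inside of each ul into "li blocks" and "everything between them", and strip the <br /> tags only from the parts between. It is in Python only for illustration, the names strip_top_level_br and clean_ul are made up for this example, and it assumes there are no nested lists. The patterns use syntax that .NET regexes also support, but this is not a tested solution for my real input.

```python
import re

# All three patterns assume flat (non-nested) lists.
UL = re.compile(r'(<ul[^>]*>)(.*?)(</ul>)', re.DOTALL)
LI = re.compile(r'(<li[^>]*>.*?</li>)', re.DOTALL)
BR = re.compile(r'<br\s*/?>')

def strip_top_level_br(html):
    """Remove <br /> tags that sit directly inside a ul, keeping those inside li."""
    def clean_ul(m):
        # Splitting on a pattern with one capturing group alternates:
        # even indexes = text between li blocks, odd indexes = the li blocks.
        parts = LI.split(m.group(2))
        cleaned = [p if i % 2 else BR.sub('', p) for i, p in enumerate(parts)]
        return m.group(1) + ''.join(cleaned) + m.group(3)
    return UL.sub(clean_ul, html)

sample = ('<p><br /> x</p><br /><ul style="a"><br /> '
          '<li margin-left="80px"><br /> one<br /><br /> two</li><br /> '
          '<li><br /> three</li><br /></ul><br />')
print(strip_top_level_br(sample))
```

In this sketch, the <br /> tags outside the ul and inside each li are left untouched; only the ones between the ul tags and the li blocks are removed.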