I need to replace some text in C# using RegEx:
string strSText = "<P>Bulleted list</P><UL><P><LI>Bullet 1</LI><P></P><P><LI>Bullet 2</LI><P></P><P><LI>Bullet 3</LI><P></UL>";
Basically, I need to get rid of the "<P>" and "</P>" tags that were introduced inside the sequences "<UL><P><LI>", "</LI><P></P><P><LI>" and "</LI><P></UL>".
I also need to ignore any spaces between these tags when performing the removal.
So "</LI><P></P><P><LI>", "</LI> <P></P><P><LI>", "</LI><P></P><P> <LI>" or "</LI> <P> </P> <P> <LI>" must all be replaced with "</LI><LI>".
I tried using the following RegEx match for this purpose:
strSText = Regex.Replace(strSText, "<UL>.*<LI>", "<UL><LI>", RegexOptions.IgnoreCase);
strSText = Regex.Replace(strSText, "</LI>.*<LI>", "</LI><LI>", RegexOptions.IgnoreCase);
strSText = Regex.Replace(strSText, "</LI>.*</UL>", "</LI></UL>", RegexOptions.IgnoreCase);
But it performs a "greedy" match and results in:
"<P>Bulleted list</P><UL><LI>Bullet 3</LI></UL>"
I then tried a "lazy" match:
strSText = Regex.Replace(strSText, "<UL>.*?<LI>", "<UL><LI>", RegexOptions.IgnoreCase);
strSText = Regex.Replace(strSText, "</LI>.*?<LI>", "</LI><LI>", RegexOptions.IgnoreCase);
strSText = Regex.Replace(strSText, "</LI>.*?</UL>", "</LI></UL>", RegexOptions.IgnoreCase);
and this results in:
"<P>Bulleted list</P><UL><LI>Bullet 1</LI></UL>"
But I want the following result, which preserves all other data:
"<P>Bulleted list</P><UL><LI>Bullet 1</LI><LI>Bullet 2</LI><LI>Bullet 3</LI></UL>"
- ", "
- ");` etc... work?
– DGibbs Sep 11 '13 at 08:17
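For reference, the requirement in the question can be sketched as a single replacement that matches only "<P>"/"</P>" tags and whitespace, instead of ".*", which swallows the list items themselves. This is a minimal sketch, not a definitive answer: it assumes nothing other than "<P>", "</P>" and whitespace ever appears between the list tags, and the names ListCleaner and StripListParagraphs are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ListCleaner
{
    // Hypothetical helper: removes runs of <P>/</P> tags (with optional
    // surrounding whitespace) that sit directly between list tags.
    // Lookbehind: the run must follow <UL> or </LI>.
    // Lookahead:  the run must be followed by <LI> or </UL>.
    // Assumption: only <P>, </P> and whitespace occur in those positions.
    private const string Pattern = @"(?<=<UL>|</LI>)\s*(?:</?P>\s*)+(?=<LI>|</UL>)";

    public static string StripListParagraphs(string html)
    {
        return Regex.Replace(html, Pattern, "", RegexOptions.IgnoreCase);
    }
}

public static class Program
{
    public static void Main()
    {
        string strSText = "<P>Bulleted list</P><UL><P><LI>Bullet 1</LI><P></P><P>"
                        + "<LI>Bullet 2</LI><P></P><P><LI>Bullet 3</LI><P></UL>";

        // The leading <P>Bulleted list</P> is left alone because it is not
        // preceded by <UL> or </LI>.
        Console.WriteLine(ListCleaner.StripListParagraphs(strSText));
    }
}
```

Because the lookarounds do not consume the surrounding list tags, those tags keep their original casing and only the paragraph tags and whitespace between them are removed. A regular expression like this is only suitable for a narrow, known input shape; for arbitrary HTML, a real parser would be the safer choice.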