I'm working on a small assignment that requires using regular expressions on HTML strings. My current problem is correctly extracting the text enclosed within HTML tags.
For instance, given the string:
<p><Placeholder></p>
I've been able to obtain the contents with the following regex:
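    private string Unescape()
    {
        string s = WebUtility.HtmlDecode("<p><Placeholder></p>");
        // Strip the opening tag(s) at the start of the string.
        string dec = Regex.Replace(s, "^<.*?>|^<.*?><.*?>", "");
        // Strip the closing tag(s) at the end of the string.
        return Regex.Replace(dec, "</.*?>$|</.*?></.*?>$", "");
    }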
This returns:
<Placeholder>
However, should the string contain an additional HTML tag, e.g.:
<p><strong>Placeholder</strong></p>
I get this instead:
<strong>Placeholder
It appears that I can remove the closing tag(s) successfully, but not all of the opening tag(s). Could anybody tell me where I've gone wrong?
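To narrow it down, here is a minimal repro that runs the same two patterns as separate steps and prints the intermediate result. The names `Repro`, `StripOpening` and `StripClosing` are made up for this post; only the two patterns come from my actual code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Repro
{
    // Same pattern as the first Regex.Replace call in Unescape().
    public static string StripOpening(string s) =>
        Regex.Replace(s, "^<.*?>|^<.*?><.*?>", "");

    // Same pattern as the second Regex.Replace call in Unescape().
    public static string StripClosing(string s) =>
        Regex.Replace(s, "</.*?>$|</.*?></.*?>$", "");

    public static void Main()
    {
        string input = "<p><strong>Placeholder</strong></p>";

        string afterOpening = StripOpening(input);
        string afterClosing = StripClosing(afterOpening);

        Console.WriteLine(afterOpening); // <strong>Placeholder</strong></p>
        Console.WriteLine(afterClosing); // <strong>Placeholder
    }
}
```

So the first step removes only `<p>`, while the second step removes both closing tags in a single match.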
EDIT:
To summarize: is there a way to treat the string enclosed within the HTML tags as literal text, so that it may itself contain special characters (e.g. > or <)?
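To make the goal concrete, here is a rough sketch of the behaviour I'm after. It is not a general HTML parser, and it rests on assumptions of mine: the wrapper tags come from a small known list (the `p|strong|em|b|i` list is hypothetical), they carry no attributes, and the string is a single nest of wrappers around one piece of text. `TagStripper` and `StripOuterTags` are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Assumption: only these names count as wrapper tags, with no attributes.
    // \1 requires the closing tag to have the same name as the opening tag.
    private static readonly Regex OuterPair = new Regex(
        @"^<(p|strong|em|b|i)>(.*)</\1>$",
        RegexOptions.Singleline);

    // Peels off one matching outer pair at a time until none is left.
    public static string StripOuterTags(string input)
    {
        string current = input;
        Match m = OuterPair.Match(current);
        while (m.Success)
        {
            current = m.Groups[2].Value; // keep only what was inside the pair
            m = OuterPair.Match(current);
        }
        return current;
    }

    public static void Main()
    {
        Console.WriteLine(StripOuterTags("<p><Placeholder></p>"));
        Console.WriteLine(StripOuterTags("<p><strong>Placeholder</strong></p>"));
        Console.WriteLine(StripOuterTags("<p>a > b < c</p>"));
    }
}
```

With this, `<Placeholder>` is left alone because it is not in the known list and has no matching closing tag, and stray `>` or `<` characters inside the text are untouched. Sibling elements such as `<p>a</p><p>b</p>` would not be handled correctly by this sketch.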