I need to match:
<p><span style="font-size: 18px;"><strong>Hello</strong></span></p>
I need to match the text Hello between the last > and the first </
Using (?=>)(.*?)(?=</)
returns <span style="font-size: 18px;"><strong>Hello
Thanks!
I know this is not the answer you were looking for, but parsing HTML with regex is like eating soup with a fork. You'll get the job done eventually, but it's very frustrating.
Try this instead and keep your sanity:
// Requires: using System.Linq; using System.Xml.Linq;
string html = "<p><span style=\"font-size: 18px;\"><strong>Hello</strong></span></p>";
// XDocument.Parse only accepts well-formed XML, which this fragment happens to be.
XDocument doc = XDocument.Parse(html);
// The last descendant in document order is the innermost <strong> element.
string hello = doc.Descendants().LastOrDefault().Value;
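You could go with
/>([^<>]+)</
That should give you the desired match, with the text in capture group 1. The leading and trailing slashes are regex delimiters, so in C# pass only >([^<>]+)< as the pattern.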
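As a minimal sketch of how that pattern could be used from C#, assuming the slashes are dropped as delimiters: the class name TextBetweenTags and the helper FirstText are hypothetical names chosen for illustration, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

static class TextBetweenTags
{
    // Hypothetical helper: returns the first non-empty run of text that sits
    // between a '>' and the next '<', or null when there is none.
    public static string FirstText(string html)
    {
        Match m = Regex.Match(html, ">([^<>]+)<");
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        string html = "<p><span style=\"font-size: 18px;\"><strong>Hello</strong></span></p>";
        Console.WriteLine(FirstText(html)); // Hello
    }
}
```

Because the quantifier is + rather than *, adjacent tags such as <p><span ...> cannot produce an empty match, so the first hit is the text itself.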
Do you only need to match this specific string? If so, you could simply use:
/<strong>([^<]*)</strong>/
which will match any text between the strong tags.
Try this. Define the regex pattern as a constant:
const string HTML_TAG_PATTERN = "<.*?>";
Then write a function that removes every tag (it needs using System.Text.RegularExpressions;):
static string StripHTML(string inputString)
{
    return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty);
}
and call the function like this:
string str = "<p><span style='font-size: 18px;'><strong>Hello</strong></span></p>";
str = StripHTML(str); // str is now "Hello"
I think your first lookahead should really be a lookbehind for >, that is (?<=>), because (?=>) only checks that the next character is > and then lets .*? consume it.
Also replace .*? with [^<>]* (anything but < or >).
If you need to keep the lookarounds, you can use:
(?<=>)([^<>]*)(?=</)
If not, you can simply use:
>([^<>]*)</
The difference is that with lookarounds, neither > nor </ is included in the overall match.
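To make that difference concrete, here is a small sketch that runs both patterns against the string from the question and prints the overall match next to capture group 1. The class and method names (LookaroundDemo, WithLookaround, WithoutLookaround) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    public const string Html =
        "<p><span style=\"font-size: 18px;\"><strong>Hello</strong></span></p>";

    // Lookaround version: '>' and '</' are asserted but not consumed.
    public static Match WithLookaround(string s) =>
        Regex.Match(s, @"(?<=>)([^<>]*)(?=</)");

    // Plain version: '>' and '</' are part of the overall match.
    public static Match WithoutLookaround(string s) =>
        Regex.Match(s, @">([^<>]*)</");

    static void Main()
    {
        Match a = WithLookaround(Html);
        Console.WriteLine(a.Value);            // Hello
        Console.WriteLine(a.Groups[1].Value);  // Hello

        Match b = WithoutLookaround(Html);
        Console.WriteLine(b.Value);            // >Hello</
        Console.WriteLine(b.Groups[1].Value);  // Hello
    }
}
```

Either way, group 1 holds the same text, so the lookarounds only matter if you read the overall match (Match.Value) instead of the group.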