0

I need to match:

<p><span style="font-size: 18px;"><strong>Hello</strong></span></p>

I need to match the text Hello, which sits between the last > and the first </

Using (?=>)(.*?)(?=</) returns <span style="font-size: 18px;"><strong>Hello

Thanks!

Dimo

5 Answers

2

I know this is not the answer you were looking for, but parsing HTML with regex is like eating soup with a fork. You'll get the job done eventually, but it's very frustrating.

Try this instead and keep your sanity:

// Requires: using System.Linq; using System.Xml.Linq;
// XDocument.Parse only accepts well-formed XML, which this fragment happens to be.
string html = "<p><span style=\"font-size: 18px;\"><strong>Hello</strong></span></p>";
XDocument doc = XDocument.Parse(html);
// The last descendant is the innermost element, here <strong>.
string hello = doc.Descendants().LastOrDefault()?.Value;
Ovidiu
1

You could go with

/>([^<>]+)</

That should give you the desired match.

Vince
0

Do you only need to match this specific string? If yes, then you could simply use:

/<strong>([^<]*)</strong>/

This matches any text between the strong tags and captures it in group 1.
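The surrounding slashes in that pattern are presumably delimiters from another regex flavor; .NET's Regex class takes the bare pattern. A minimal sketch of how this might look in C#, assuming the delimiters are dropped. The class and method names (StrongTextDemo, ExtractStrongText) are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrongTextDemo
{
    // Hypothetical helper: returns the text inside the first <strong>...</strong>
    // pair, or an empty string when there is no such pair.
    public static string ExtractStrongText(string html)
    {
        // Group 1 captures everything up to the next '<'.
        Match m = Regex.Match(html, "<strong>([^<]*)</strong>");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string html = "<p><span style=\"font-size: 18px;\"><strong>Hello</strong></span></p>";
        Console.WriteLine(ExtractStrongText(html)); // prints "Hello"
    }
}
```

Because the capture group excludes '<', this only works when the strong element contains plain text and no nested tags.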

Fabian
0

Try this.

Define the regex pattern as a constant:

const string HTML_TAG_PATTERN = "<.*?>";

The function (requires using System.Text.RegularExpressions;):

static string StripHTML(string inputString)
{
    // Replace every tag with an empty string, leaving only the text content.
    return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty);
}

Call the function like this:

string str = "<p><span style='font-size: 18px;'><strong>Hello</strong></span></p>";

str = StripHTML(str); // str is now "Hello"
nrsharma
0

I think your first lookahead should instead be a lookbehind for >: (?<=>)

Then replace .*? with [^<>]* (anything but < or >).

If you need to keep your lookarounds, you can use: (?<=>)([^<>]*)(?=</)

If not, you can simply use: >([^<>]*)</

The difference is that with lookarounds, neither > nor </ is included in the overall match; without them, the text you want is only in group 1.
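To make that difference concrete, here is a small sketch that runs both patterns against the string from the question and compares the overall match with group 1. The class and method names (LookaroundDemo, WithLookarounds, WithoutLookarounds) are invented for this example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    public const string Html =
        "<p><span style=\"font-size: 18px;\"><strong>Hello</strong></span></p>";

    // Lookbehind and lookahead assert '>' and '</' without consuming them.
    public static Match WithLookarounds(string html)
    {
        return Regex.Match(html, "(?<=>)([^<>]*)(?=</)");
    }

    // Here '>' and '</' are consumed, so they become part of the overall match.
    public static Match WithoutLookarounds(string html)
    {
        return Regex.Match(html, ">([^<>]*)</");
    }

    public static void Main()
    {
        Match a = WithLookarounds(Html);
        Match b = WithoutLookarounds(Html);

        Console.WriteLine(a.Value);           // prints "Hello"
        Console.WriteLine(b.Value);           // prints ">Hello</"
        Console.WriteLine(b.Groups[1].Value); // prints "Hello"
    }
}
```

Both versions take the first match only. Later positions in the string (for example between </strong> and </span>) also satisfy the patterns with an empty capture, so iterating with Regex.Matches would need a check for empty results.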

polkduran