
I have a string that contains HTML markup, like this:

results[1] = <div class="ExternalClassAE850B41EF"><p>​<span>G_Pck</span></p></div>

I want to strip the markup and store only G_Pck. I tried Regex.Match like this:

string StTag = results[i].ToString();
var b = Regex.Match(StTag, "(?<=>)(.*)(?=<)");

But the match in b still contains the <p> and <span> tags. How can I remove those tags as well?
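The lookaround pattern keeps the inner tags because `.*` is greedy: the match starts at the first position preceded by `>` (right after the opening `<div ...>`) and runs to the last position followed by `<` (the start of `</div>`), so everything in between, tags included, is captured. The sketch below reproduces this; the class and method names are hypothetical, and the sample string leaves out the invisible character that appears after `<p>` in the question.

```csharp
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Sample markup from the question (without the invisible character after <p>).
    public const string Input =
        "<div class=\"ExternalClassAE850B41EF\"><p><span>G_Pck</span></p></div>";

    // The pattern from the question: ".*" is greedy, so the match runs from the
    // first position preceded by '>' to the LAST position followed by '<'.
    public static string OriginalPattern() =>
        Regex.Match(Input, "(?<=>)(.*)(?=<)").Value;

    // A variant that forbids '<' and '>' inside the match, so it can only
    // capture text sitting directly between two tags.
    public static string TextBetweenTags() =>
        Regex.Match(Input, "(?<=>)[^<>]+(?=<)").Value;
}
```

`OriginalPattern()` returns `<p><span>G_Pck</span></p>`, while `TextBetweenTags()` returns `G_Pck`. Note that the second pattern returns the first non-empty run of text between tags, so any character (even an invisible one) placed between two earlier tags would be matched first.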

Sach
trx
  • Use System.Net.WebUtility.HtmlEncode(string) and System.Net.WebUtility.HtmlDecode(string) – jdweng Aug 24 '18 at 17:18
  • Why don't you replace all <[^>]*> with an empty string? – pepak Aug 24 '18 at 17:18
  • [HTML Agility Pack](http://html-agility-pack.net/?z=codeplex) should probably help. – Sach Aug 24 '18 at 17:21
  • If you can guarantee that string conforms to a well-defined Xml Fragment you could XDocument.Parse it and then use `Descendants("span").First().Value` to get that value you're after. Otherwise use an html parser. Better not [regex too much...](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – rene Aug 24 '18 at 17:25
  • @jdweng I tried `System.Net.WebUtility.HtmlDecode`, like `string StTag = System.Net.WebUtility.HtmlDecode(results[i].ToString());`, but I'm still seeing the same issue – trx Aug 24 '18 at 17:38
  • @Sach I installed HtmlAgilityPack, but when I use `HtmlDocument` it says there is a missing assembly reference – trx Aug 24 '18 at 17:40
  • Which assembly reference? You can use NuGet to install it without missing anything. – Sach Aug 24 '18 at 19:23
  • You have double quotes in the string so make sure you put a backslash before the two double quotes. – jdweng Aug 24 '18 at 19:50
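rene's LINQ to XML suggestion can be sketched as follows, assuming the fragment is well-formed XML (a single root element, every tag closed, no HTML-only entities such as `&nbsp;`); `XDocument.Parse` throws an `XmlException` otherwise. The class and method names are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlFragmentExtractor
{
    // Parses the fragment as XML and returns the text of the first <span>.
    // Throws System.Xml.XmlException if the markup is not well-formed XML.
    public static string FirstSpanText(string fragment) =>
        XDocument.Parse(fragment).Descendants("span").First().Value;

    // Alternative: the concatenated text of the whole fragment, with all tags removed.
    public static string AllText(string fragment) =>
        XDocument.Parse(fragment).Root.Value;
}
```

`FirstSpanText` depends on the value always living in a `<span>`; `AllText` does not, but it also returns any other text in the fragment, invisible characters included.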

1 Answer


You can use a simple regex like this:

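using System.Text.RegularExpressions;

// Removes every "<...>" run; ".*?" is lazy, so each match stops at the first '>'
public static string StripHTML(string input)
{
  return Regex.Replace(input, "<.*?>", String.Empty);
}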

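Be aware that this approach has its own flaws: for example, a > inside an attribute value (as in <a title="x>y">) ends the match early and leaves part of the tag in the output. See the answered question.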
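Applied to the question's sample, the replacement leaves only the text content. The sketch below wraps the answer's pattern in a hypothetical `HtmlText` class. The extra `Replace("\u200B", ...)` step assumes that the invisible character between `<p>` and `<span>` in the sample is a zero-width space (U+200B), which tag removal leaves in place.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // The answer's approach: delete every "<...>" run (lazy match).
    public static string StripHTML(string input)
    {
        return Regex.Replace(input, "<.*?>", String.Empty);
    }

    // Hypothetical helper: strip tags, then drop zero-width spaces (U+200B)
    // and surrounding whitespace left over from the markup.
    public static string ExtractText(string input)
    {
        return StripHTML(input).Replace("\u200B", String.Empty).Trim();
    }
}
```

pepak's `<[^>]*>` gives the same result on this input. Unlike `<.*?>`, it also removes tags that span a line break, because in .NET `.` does not match `\n` unless `RegexOptions.Singleline` is set.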

Di Kamal