8

Can Html Agility Pack be used to parse an html string fragment?

Such as:

var fragment = "<b>Some code </b>";

Then extract all <b> tags? All the examples I have seen so far load complete HTML documents.

ΩmegaMan
chobo2
  • 1
    It could be done even more simply with HAP, in one line: `var text = HtmlNode.CreateNode("<b>Some code </b>").InnerText;` – Oleks Mar 04 '12 at 15:31

3 Answers

11

If it's HTML, then yes.

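// requires: using HtmlAgilityPack;
string str = "<b>Some code</b>";
// Optional wrapper: LoadHtml also accepts a bare fragment
string html = string.Format("<html><head></head><body>{0}</body></html>", str);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// See an XPath tutorial for how to select elements.
// Select the first <b> element: "//b" searches at any depth, whereas a bare
// "b[1]" only matches direct children of the document node and returns null here.
HtmlNode bNode = doc.DocumentNode.SelectSingleNode("//b");
string boldText = bNode.InnerText;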
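
The question asks for all <b> tags rather than just the first one. A minimal sketch of that, assuming the fragment is loaded directly without the html/body wrapper; the class and method names (FragmentDemo, ExtractBoldText) are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class FragmentDemo
{
    // Hypothetical helper: returns the inner text of every <b> element in a fragment.
    public static List<string> ExtractBoldText(string fragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragment); // a bare fragment is accepted as-is

        var result = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//b");
        if (nodes == null)
            return result;

        foreach (HtmlNode b in nodes)
            result.Add(b.InnerText);

        return result;
    }

    static void Main()
    {
        foreach (string text in ExtractBoldText("<b>Some code </b> and <b>more</b>"))
            Console.WriteLine(text);
    }
}
```

The null check matters: a fragment with no <b> tags would otherwise throw a NullReferenceException in the loop.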
Mike Koder
2

I don't think this is really the best use of HtmlAgilityPack.

Normally I see people trying to parse large amounts of HTML with regular expressions, and I point them towards HtmlAgilityPack, but in this case I think a regex would be the better choice.

Roy Osherove has a blog post describing how you can strip out all the html from a snippet:

Even if you got the XPath right in Mika Kolari's sample, it would only work for a snippet that contains a <b> tag and would break if the snippet changed.
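
For illustration, a regex-based tag stripper could look roughly like this. It is a sketch of the general idea, not the code from the blog post, and the names (TagStripper, StripTags) are hypothetical. The pattern is deliberately naive: it assumes no '>' appears inside attribute values and ignores comments and script blocks.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagStripper
{
    // Naive pattern: anything from '<' up to the next '>'.
    private static readonly Regex Tag = new Regex("<[^>]+>", RegexOptions.Compiled);

    // Hypothetical helper: removes every tag and keeps the text between them.
    public static string StripTags(string html)
    {
        return Tag.Replace(html, string.Empty);
    }

    static void Main()
    {
        Console.WriteLine(TagStripper.StripTags("<b>Some code </b>"));
    }
}
```

This works for any simple tag, not just <b>, which is the point being made here, but it is only reasonable for small, trusted snippets.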

rtpHarry
0

This answer came up when I searched for the same thing. I don't know whether the features have changed since it was answered, but the following should be better.

$string = '<b>Some code </b>'
[HtmlAgilityPack.HtmlNode]::CreateNode($string)