11

I'm trying to use HTML Agility Pack to extract the description text from the content attribute of this tag:

<meta name="description" content="**this is the text i want to extract and store in a string**" />

Someone on Stack Overflow suggested a little while ago that I use HtmlAgilityPack, but I don't know how to use it. The documentation I've found (including the docs contained in the downloads) all has broken links, so I can't view it.

Can somebody please help me solve this?

jay_t55
  • 1
    I'm confused - have you built HtmlAgilityPack or not? Is it the examples that don't build? or the core dll? If the latter, what have you referenced? – Marc Gravell Dec 10 '09 at 21:27
  • Thanks Marc. I've edited my question and removed that part to avoid further confusion. Thinking about it now, that specific part wasn't really relevant to my question; it was just a bit of info to say why I'm asking the question. – jay_t55 Dec 10 '09 at 21:41
  • I have been able to add a reference to the DLL file in my app, so I can "use" HtmlAgilityPack. – jay_t55 Dec 10 '09 at 21:56
  • 1
    See this duplicate question: http://stackoverflow.com/questions/846994/how-to-use-html-agility-pack – Ash Jan 22 '10 at 09:32
  • 1
    Just for reference (since this question appears as the first Google result for the `GetAttributeValue` method), the second argument is the default value to return in case the attribute is not found. Here's the documentation comment from the method's definition: "Helper method to get the value of an attribute of this node. If the attribute is not found, the default value will be returned." "The name of the attribute to get. May not be null." "The default value to return if not found." "The value of the att – alf Sep 06 '11 at 01:35

2 Answers

18

The usage is very similar to XmlDocument, so the MSDN documentation on XmlDocument works as a broad overview. You might also want to learn XPath syntax (also covered on MSDN).

Example:

HtmlDocument doc = new HtmlDocument();
doc.Load(path); // or .LoadHtml(html);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
if (node != null) {
    string desc = node.GetAttributeValue("content", "");
    // TODO: write desc somewhere
}

The second argument to GetAttributeValue is the default returned in case the attribute is not found.
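To make the default-value behaviour concrete, here is a minimal sketch that parses HTML from a string instead of a file. It assumes the HtmlAgilityPack assembly is referenced; the `MetaDescriptionDemo` class, the `GetDescription` helper, and the sample HTML strings are made up for illustration and are not part of the library.

```csharp
using System;
using HtmlAgilityPack; // assumes a reference to the HtmlAgilityPack assembly

static class MetaDescriptionDemo
{
    // Hypothetical helper: returns the meta description, or `fallback`
    // when the tag or its content attribute is missing.
    public static string GetDescription(string html, string fallback)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches the XPath.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
        if (node == null)
        {
            return fallback;
        }

        // The second argument is returned when the attribute is absent.
        return node.GetAttributeValue("content", fallback);
    }

    static void Main()
    {
        string withContent = "<html><head><meta name=\"description\" content=\"A sample page\" /></head></html>";
        string noAttribute = "<html><head><meta name=\"description\" /></head></html>";
        string noTag = "<html><head><title>No meta here</title></head></html>";

        Console.WriteLine(GetDescription(withContent, "(none)")); // content attribute present
        Console.WriteLine(GetDescription(noAttribute, "(none)")); // tag present, attribute missing
        Console.WriteLine(GetDescription(noTag, "(none)"));       // tag missing entirely
    }
}
```

Passing a distinctive fallback such as "(none)" rather than "" can make it easier to tell a missing attribute apart from one that is present but empty.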

patridge
Marc Gravell
  • What is the second string argument (the empty one) used for in `node.GetAttributeValue("content", "");`? – Alex Aug 19 '10 at 00:05
  • @AlexW - I don't have that library "to hand" at the moment; what is the parameter called? – Marc Gravell Aug 19 '10 at 08:06
  • Not sure on the parameter name... Will follow definition path later to find out. Thanks for answer here, v useful. – Alex Aug 19 '10 at 11:45
  • 3
    "def" stands for default. It is the value to return if the attribute is not found. Commenting here because this is the top result when googling the answer. – Brian Sep 27 '13 at 19:04
0

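public string HtmlAgi(string url, string key)
{
    // Download and parse the page at the given URL.
    var web = new HtmlWeb();
    var doc = web.Load(url);

    // Find the <meta> tag whose name attribute equals key (e.g. "description").
    HtmlNode ourNode = doc.DocumentNode.SelectSingleNode(string.Format("//meta[@name='{0}']", key));

    if (ourNode != null)
    {
        // Return the content attribute, or "" if the tag has none.
        return ourNode.GetAttributeValue("content", "");
    }

    return "not found";
}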
fvmitnick