
I am trying to use the HtmlAgilityPack library to parse some links in a page, but the methods don't return the results I expect. In the code below I have an HtmlNodeCollection of links. For each link I want to check whether it contains an image node and then parse that node's attributes. However, the SelectNodes and SelectSingleNode methods of linkNode seem to search the whole document rather than the child nodes of linkNode. What gives?

HtmlDocument htmldoc = new HtmlDocument();
htmldoc.LoadHtml(content);
HtmlNodeCollection linkNodes = htmldoc.DocumentNode.SelectNodes("//a[@href]");
    
foreach(HtmlNode linkNode in linkNodes)
{
    string linkTitle = linkNode.GetAttributeValue("title", string.Empty);
    if (linkTitle == string.Empty)
    {
        HtmlNode imageNode = linkNode.SelectSingleNode("/img[@alt]");     
    }
}

Is there any other way I could get the alt attribute of linkNode's child image node, if it exists?

Asked by Sheff, edited by d219

3 Answers


Remove the leading forward slash from "/img[@alt]". A path that starts with "/" is absolute, so it is evaluated from the root of the document, not from the node you call SelectSingleNode on.

HtmlNode imageNode = linkNode.SelectSingleNode("img[@alt]");
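A minimal sketch of the difference, written against .NET's built-in System.Xml rather than HtmlAgilityPack so it runs without extra packages. As far as I know HtmlAgilityPack evaluates XPath through the standard .NET XPath engine, so the context-node rules should be the same. The markup and the AltFromLink helper are made up for illustration, and XmlDocument needs well-formed input, which real-world HTML often isn't.

```csharp
using System;
using System.Xml;

// Sketch using System.Xml (not HtmlAgilityPack) to show the XPath rule in isolation.
public static class XPathContextDemo
{
    // Well-formed stand-in for the scraped page; the markup is invented for illustration.
    public const string Markup =
        "<html><body>" +
        "<a href='/one'><img alt='first' src='1.png'/></a>" +
        "<a href='/two'><img alt='second' src='2.png'/></a>" +
        "</body></html>";

    // Hypothetical helper: evaluates `xpath` with the n-th link (0-based) as the
    // context node and returns the matched node's alt text, or null on no match.
    public static string AltFromLink(int linkIndex, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        XmlNode link = doc.SelectNodes("//a[@href]")[linkIndex];
        XmlNode img = link.SelectSingleNode(xpath);
        return img?.Attributes["alt"]?.Value;
    }

    public static void Main()
    {
        // Leading slash: absolute path from the document root. The root element is
        // <html>, not <img>, so nothing matches no matter which link we start from.
        Console.WriteLine(AltFromLink(1, "/img[@alt]") ?? "(null)");

        // No leading slash: relative to the link, selecting its direct <img> child.
        Console.WriteLine(AltFromLink(1, "img[@alt]") ?? "(null)");
    }
}
```

Running Main should print "(null)" followed by "second".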
Answer by Richard Szalay
  • Errrm OK. That was pretty daft of me. I thought I was missing something. Sorry for wasting question space. Thanks. – Sheff May 13 '09 at 10:48
  • There's always plenty of space :) – Richard Szalay May 13 '09 at 10:54
  • You the man! A sec ago I was cursing at the HtmlAgility project, but it turns out they just implemented XPath the right way :) – Moulde May 29 '12 at 14:08
  • This didn't work for me (HtmlAgilityPack 1.4.9); I had to use the `.//` notation (answer below). – wal Feb 15 '15 at 23:39
  • @wal The syntax above assumes the target img is a direct child of `linkNode`. If you had to use `.//`, I'm guessing the img was a _descendant_ but not a direct child. – Richard Szalay Feb 15 '15 at 23:49

In an XPath query you can also use "." to make the search start at the current node. Combined with "//", as below, it matches an img at any depth inside linkNode, not just a direct child.

HtmlNode imageNode = linkNode.SelectSingleNode(".//img[@alt]");
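A second System.Xml sketch, with the same caveats (invented, well-formed markup and a hypothetical AltFromSecondLink helper), contrasting three expressions when the image is nested one level deeper than the link:

```csharp
using System;
using System.Xml;

public static class DescendantAxisDemo
{
    // Invented markup: the second link wraps its image in a <span>, so the
    // <img> is a descendant of the link but not a direct child.
    public const string Markup =
        "<html><body>" +
        "<a href='/one'><img alt='first' src='1.png'/></a>" +
        "<a href='/two'><span><img alt='second' src='2.png'/></span></a>" +
        "</body></html>";

    // Hypothetical helper: evaluates `xpath` with the second link as the
    // context node and returns the matched node's alt text, or null.
    public static string AltFromSecondLink(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        XmlNode link = doc.SelectNodes("//a[@href]")[1];
        XmlNode img = link.SelectSingleNode(xpath);
        return img?.Attributes["alt"]?.Value;
    }

    public static void Main()
    {
        // Child axis only: the <img> sits inside <span>, so there is no match.
        Console.WriteLine(AltFromSecondLink("img[@alt]") ?? "(null)");

        // "." anchors the search at the link; "//" then walks all its descendants.
        Console.WriteLine(AltFromSecondLink(".//img[@alt]") ?? "(null)");

        // Without the dot, "//" searches the whole document and SelectSingleNode
        // returns the first match in document order, which belongs to the other link.
        Console.WriteLine(AltFromSecondLink("//img[@alt]") ?? "(null)");
    }
}
```

This should print "(null)", "second", "first". The last case is a common variant of the problem in the question: "//" without a leading "." always starts from the document root, even when called on a child node.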
Answer by ulty4life

Also, remember to check for null: SelectNodes returns null, not an empty collection, when nothing matches.

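HtmlNodeCollection linkNodes = htmldoc.DocumentNode.SelectNodes("//a[@href]");

// SelectNodes returns null (not an empty collection) when nothing matches
if (linkNodes != null)
{
    foreach (HtmlNode linkNode in linkNodes)
    {
        string linkTitle = linkNode.GetAttributeValue("title", string.Empty);
        if (linkTitle == string.Empty)
        {
            // Relative path: only a direct <img> child of this link
            HtmlNode imageNode = linkNode.SelectSingleNode("img[@alt]");
            // SelectSingleNode also returns null when there is no match
            if (imageNode != null)
            {
                string altText = imageNode.GetAttributeValue("alt", string.Empty);
            }
        }
    }
}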
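One way to avoid repeating the null check is a small extension method that swaps a null sequence for an empty one. OrEmpty is a hypothetical helper, not part of HtmlAgilityPack; the demo uses a list of strings as a stand-in for the HtmlNodeCollection so it runs without the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Hypothetical helper (not part of HtmlAgilityPack): turns a null sequence
    // into an empty one so it can go straight into a foreach loop.
    public static IEnumerable<T> OrEmpty<T>(this IEnumerable<T> source)
    {
        return source ?? Enumerable.Empty<T>();
    }
}

public static class OrEmptyDemo
{
    // Counts items without a separate null check; a null input counts as zero.
    public static int CountItems(IEnumerable<string> items)
    {
        int count = 0;
        foreach (string item in items.OrEmpty())
        {
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // Stand-in for a SelectNodes call that matched nothing and returned null.
        List<string> noMatches = null;
        Console.WriteLine(CountItems(noMatches));
        Console.WriteLine(CountItems(new[] { "a", "b" }));
    }
}
```

Because HtmlNodeCollection implements IEnumerable<HtmlNode>, the same helper should let you write `foreach (HtmlNode linkNode in htmldoc.DocumentNode.SelectNodes("//a[@href]").OrEmpty())`. It only hides the null check for collections; SelectSingleNode still returns null on a miss and needs its own check.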
Answer by msqr, edited by d219
  • Which was a really stupid design decision IMO. There's no reason it *shouldn't* return an empty collection. – mpen Sep 08 '10 at 02:04
  • See also http://stackoverflow.com/questions/8619724/htmlagilitypack-documentnode-selectnodes-returns-null-shouldnt – Tim Abell Apr 24 '12 at 13:55