I want to get the value of an attribute with HtmlAgilityPack. HTML code:

<link href="style.css">
<link href="anotherstyle.css">
<link href="anotherstyle2.css">
<link itemprop="thumbnailUrl" href="http://image.jpg">
<link href="anotherstyle5.css">
<link href="anotherstyle7.css">

I want to get the last `href` attribute.

My C# code:

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument htmldoc = web.Load(Url);
htmldoc.OptionFixNestedTags = true;
var navigator = (HtmlNodeNavigator)htmldoc.CreateNavigator();
string xpath = "//link/@href";
string val = navigator.SelectSingleNode(xpath).Value;

But that code returns the first `href` value.

Sergey Berezovskiy
denied

5 Answers

The following XPath selects `link` elements that have an `href` attribute defined. Then, from those links, you select the last one:

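// LastOrDefault() comes from System.Linq; SelectNodes returns null (by default) when nothing matches
var link = doc.DocumentNode.SelectNodes("//link[@href]")?.LastOrDefault();
// link is null if the page has no <link href> at all
var href = link?.Attributes["href"].Value; // "anotherstyle7.css"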

You can also use the XPath `last()` function:

// Parentheses apply [last()] to all matches in the document, not to each parent's children
var link = doc.DocumentNode.SelectSingleNode("(//link[@href])[last()]");
var href = link.Attributes["href"].Value;

UPDATE: If you want the last element that has both `itemprop` and `href` attributes, use the XPath `(//link[@href and @itemprop])[last()]`, or `//link[@href and @itemprop]` if you go with the first approach.
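A quick way to check these XPath expressions without fetching a page is .NET's built-in `System.Xml.XPath` engine. This is a stand-in, not HtmlAgilityPack itself: it assumes a well-formed copy of the question's markup (self-closed `<link/>` tags inside a hypothetical `<head>` wrapper) and calls `SelectSingleNode`, the same method the question uses on `HtmlNodeNavigator`. Since `HtmlNodeNavigator` derives from `XPathNavigator`, the expressions should behave the same way there. It also shows why the parentheses matter: `//link[last()]` picks the last `link` of each parent, while `(//link)[last()]` picks the last one in the whole document. `LastHrefDemo`, `Select`, `Count`, and `SplitMarkup` are hypothetical names for this sketch.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class LastHrefDemo
{
    // Well-formed stand-in for the question's markup: <link/> tags are
    // self-closed and wrapped in a hypothetical <head> so XPathDocument can parse it.
    public const string Markup =
        "<head>" +
        "<link href=\"style.css\"/>" +
        "<link href=\"anotherstyle.css\"/>" +
        "<link href=\"anotherstyle2.css\"/>" +
        "<link itemprop=\"thumbnailUrl\" href=\"http://image.jpg\"/>" +
        "<link href=\"anotherstyle5.css\"/>" +
        "<link href=\"anotherstyle7.css\"/>" +
        "</head>";

    // Hypothetical markup with <link> tags under two different parents.
    public const string SplitMarkup =
        "<html><head><link href=\"a.css\"/><link href=\"b.css\"/></head>" +
        "<body><link href=\"c.css\"/></body></html>";

    // Same call the question uses: SelectSingleNode(...).Value (null if no match).
    public static string Select(string markup, string xpath) =>
        new XPathDocument(new StringReader(markup)).CreateNavigator()
            .SelectSingleNode(xpath)?.Value;

    // Number of nodes an XPath expression selects.
    public static int Count(string markup, string xpath) =>
        new XPathDocument(new StringReader(markup)).CreateNavigator()
            .Select(xpath).Count;

    public static void Main()
    {
        // First match in document order: style.css
        Console.WriteLine(Select(Markup, "//link/@href"));
        // Parentheses apply [last()] to the whole node-set: anotherstyle7.css
        Console.WriteLine(Select(Markup, "(//link/@href)[last()]"));

        // Without parentheses, [last()] is evaluated per parent: 2 nodes (b.css, c.css)
        Console.WriteLine(Count(SplitMarkup, "//link[@href][last()]"));
        // With parentheses there is exactly one last link: 1
        Console.WriteLine(Count(SplitMarkup, "(//link[@href])[last()]"));
        Console.WriteLine(Select(SplitMarkup, "(//link[@href])[last()]/@href")); // c.css
    }
}
```

In the question's markup every `link` shares one parent, so both forms happen to give the same answer; the parenthesized form is the one that still works when they don't.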

Sergey Berezovskiy

Load the web page as an `HtmlDocument` and select the last `link` tag directly:

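        HtmlWeb web = new HtmlWeb();
        HtmlDocument doc = web.Load(Url);
        // LastOrDefault needs "using System.Linq;"; SelectNodes returns null (by default) when nothing matches
        var output = doc.DocumentNode.SelectNodes("//link[@href]")?.LastOrDefault();
        var data = output?.Attributes["href"].Value;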

Or load the web page as an `HtmlDocument`, get the collection of all matching `link` tags, iterate over it with a loop, and read the attribute of the last one:

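        HtmlWeb web = new HtmlWeb();
        HtmlDocument doc = web.Load(Url);
        int count = 0;
        string data = "";
        // SelectNodes returns null (by default) when nothing matches, so guard before looping
        var output = doc.DocumentNode.SelectNodes("//link[@href]");

        if (output != null)
        {
            foreach (var item in output)
            {
                count++;
                // The final iteration is the last <link href> in document order
                if (count == output.Count)
                {
                    data = item.Attributes["href"].Value;
                    break;
                }
            }
        }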
SiwachGaurav

You need something like this, which selects the `href` of the `link` element that has an `itemprop` attribute:

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument htmldoc = web.Load(Url);
htmldoc.OptionFixNestedTags = true;
var navigator = (HtmlNodeNavigator)htmldoc.CreateNavigator();
string xpath = "//link[@itemprop]/@href";
string val = navigator.SelectSingleNode(xpath).Value;
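Note that calling `.Value` directly on the result of `SelectSingleNode` throws a `NullReferenceException` when nothing matches. Below is a small sketch using .NET's own `System.Xml.XPath` as a stand-in for `HtmlNodeNavigator` (both expose `SelectSingleNode(string)` through `XPathNavigator`), on a trimmed, well-formed copy of the question's markup. It contrasts `[@itemprop]` (attribute present) with `[@itemprop='thumbnailUrl']` (attribute equal to a value) and returns null instead of throwing on a miss. `ItempropDemo`, `SelectValueOrNull`, and the `logo` value are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class ItempropDemo
{
    // Trimmed, well-formed stand-in for the question's markup (hypothetical wrapper).
    public const string Markup =
        "<head>" +
        "<link href=\"style.css\"/>" +
        "<link itemprop=\"thumbnailUrl\" href=\"http://image.jpg\"/>" +
        "<link href=\"anotherstyle7.css\"/>" +
        "</head>";

    // SelectSingleNode returns null when nothing matches, so check before reading .Value.
    public static string SelectValueOrNull(string markup, string xpath)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(markup)).CreateNavigator();
        XPathNavigator hit = nav.SelectSingleNode(xpath);
        return hit == null ? null : hit.Value;
    }

    public static void Main()
    {
        // Any <link> that has an itemprop attribute: http://image.jpg
        Console.WriteLine(SelectValueOrNull(Markup, "//link[@itemprop]/@href"));
        // Only the <link> whose itemprop equals "thumbnailUrl": http://image.jpg
        Console.WriteLine(SelectValueOrNull(Markup, "//link[@itemprop='thumbnailUrl']/@href"));
        // No match: prints "(no match)" instead of crashing
        Console.WriteLine(SelectValueOrNull(Markup, "//link[@itemprop='logo']/@href") ?? "(no match)");
    }
}
```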
elyashiv

Get an `HtmlNode` by attribute value:

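using HtmlAgilityPack;

// Depth-first (pre-order) search: returns the first node, in document order,
// whose attribute matches attributeValue case-insensitively, or null if none does.
public static class Extensions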
{
    public static HtmlNode GetNodeByAttributeValue(this HtmlNode htmlNode, string attributeName, string attributeValue)
    {
        if (htmlNode.Attributes.Contains(attributeName))
        {
            if (string.Compare(htmlNode.Attributes[attributeName].Value, attributeValue, true) == 0)
            {
                return htmlNode;
            }
        }

        foreach (var childHtmlNode in htmlNode.ChildNodes)
        {
            var resultNode = GetNodeByAttributeValue(childHtmlNode, attributeName, attributeValue);
            if (resultNode != null) return resultNode;
        }

        return null;
    }
}

Usage

var searchResultsDiv = pageDocument.DocumentNode.GetNodeByAttributeValue("someattributename", "resultsofsearch");
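The same pre-order search can be sketched without HtmlAgilityPack using `System.Xml.Linq`, which makes it easy to try on a string. This is an analogue, not the answer's code: it walks `XElement`s only (HtmlAgilityPack's `ChildNodes` also includes text and comment nodes, which have no attributes to match), and it uses an ordinal case-insensitive comparison where `string.Compare(..., true)` is culture-sensitive. It also shows that LINQ's `DescendantsAndSelf()` visits elements in document order, so `FirstOrDefault` finds the same node as the recursion. `AttributeSearch`, `FindRecursive`, `FindLinq`, and the sample markup are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AttributeSearch
{
    // Recursive pre-order search, mirroring GetNodeByAttributeValue
    // but over System.Xml.Linq elements instead of HtmlAgilityPack nodes.
    public static XElement FindRecursive(XElement node, string name, string value)
    {
        XAttribute attr = node.Attribute(name);
        if (attr != null && string.Equals(attr.Value, value, StringComparison.OrdinalIgnoreCase))
            return node;

        foreach (XElement child in node.Elements())
        {
            XElement hit = FindRecursive(child, name, value);
            if (hit != null) return hit;
        }
        return null;
    }

    // Same result without explicit recursion: DescendantsAndSelf() yields
    // elements in document order, which is the same pre-order walk.
    // (string)XAttribute is null when the attribute is missing.
    public static XElement FindLinq(XElement root, string name, string value) =>
        root.DescendantsAndSelf().FirstOrDefault(e =>
            string.Equals((string)e.Attribute(name), value, StringComparison.OrdinalIgnoreCase));

    public static void Main()
    {
        XElement root = XElement.Parse(
            "<html><body>" +
            "<div class=\"nav\"/>" +
            "<div someattributename=\"ResultsOfSearch\" id=\"first\"/>" +
            "<div someattributename=\"resultsofsearch\" id=\"second\"/>" +
            "</body></html>");

        XElement a = FindRecursive(root, "someattributename", "resultsofsearch");
        XElement b = FindLinq(root, "someattributename", "resultsofsearch");
        // Both print "first": the comparison ignores case, and the
        // first match in document order wins.
        Console.WriteLine((string)a.Attribute("id"));
        Console.WriteLine((string)b.Attribute("id"));
    }
}
```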
Christian Findlay

OK, I ended up with this:

var link = htmldoc.DocumentNode.SelectSingleNode("//link[@itemprop='thumbnailUrl']");
var href = link.Attributes["href"].Value;
denied
    It's not good to completely change the question midway: you asked for the last `href` attribute and then changed it to a completely new question, which has nothing to do with getting the last attribute. – Sergey Berezovskiy Jan 20 '14 at 14:34