Grabbing just the URL of an href using HTMLAgilityPack

Question

Here is the HTML source I'm trying to parse:

<a style='white-space: nowrap;' href='/AuthorStories-4931/dreamfall.htm'><img class='donoricon' alt='(Current Donor)'  title='(Current Donor)' src='http://static.tthf.me/images/donors/Current%20Donor.gif'/>dreamfall</a>

Here is the code I'm using:

authorLink = doc.DocumentNode.SelectSingleNode("//a[contains(@href, 'AuthorStories')]").OuterHtml;

This grabs the link correctly, but it also captures the img element. The only part I want is the href value. Any suggestions on how to parse out just that attribute?

score 1 · Accepted Answer · answered Oct 20 '12 at 09:49

[Haven't touched HtmlAgilityPack in a few years, but this should be generally true]

Instead of taking OuterHtml, use the Attributes collection on the node returned by SelectSingleNode; you should be able to get href from there.
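A minimal sketch of that suggestion, assuming the HtmlAgilityPack NuGet package is referenced. The class and method names (LinkExtractor, ExtractHref) are hypothetical and only for illustration, and the sample HTML is the anchor from the question, trimmed. It reads the href with GetAttributeValue, which returns a fallback when the attribute is missing, instead of indexing Attributes directly.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class LinkExtractor
{
    // Hypothetical helper: returns the href of the first AuthorStories link,
    // or null when no such link exists in the markup.
    public static string ExtractHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches the XPath.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//a[contains(@href, 'AuthorStories')]");
        if (link == null)
        {
            return null;
        }

        // Read only the attribute value, not the element's markup.
        return link.GetAttributeValue("href", string.Empty);
    }

    public static void Main()
    {
        // The anchor from the question, trimmed to the relevant parts.
        const string html =
            "<a style='white-space: nowrap;' href='/AuthorStories-4931/dreamfall.htm'>" +
            "<img class='donoricon' alt='(Current Donor)' />dreamfall</a>";

        Console.WriteLine(ExtractHref(html));
    }
}
```

The null check matters because SelectSingleNode returns null for pages without a matching link, and chaining .Attributes["href"].Value onto it would then throw a NullReferenceException.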

Rym


That put me on the right track, thank you. This works perfectly: authorLink = doc.DocumentNode.SelectSingleNode("//a[contains(@href, 'AuthorStories')]").Attributes["href"].Value; authorLink = authorLink.Insert(0, "" + authorName + ""; Unfortunately it's a bit unwieldy for such a simple task. Is this how you'd recommend doing these steps, or would you do this in some other manner? – Ben Oct 20 '12 at 12:14
Are you trying to edit the document nodes, or do you just want to form some HTML for another document? – Rym Oct 22 '12 at 17:17
If you plan to create a new string, I'd go with something like: var href = "tthfanfic.org" + doc.....Attributes["href"].Value; var newLink = string.Format("<a href=\"{0}\">{1}</a>", href, authorName); [haven't touched C# in a few months, so that could be iffy :P] – Rym Oct 22 '12 at 17:23
