Here is the HTML source I'm trying to parse:

<a style='white-space: nowrap;' href='/AuthorStories-4931/dreamfall.htm'><img class='donoricon' alt='(Current Donor)'  title='(Current Donor)' src='http://static.tthf.me/images/donors/Current%20Donor.gif'/>dreamfall</a>

Here is the code I'm using:

authorLink = doc.DocumentNode.SelectSingleNode("//a[contains(@href, 'AuthorStories')]").OuterHtml;

This grabs the link correctly, but it also captures the img element. The only part I want is the href value. Any suggestions on how to extract just that attribute?

Ben

1 Answer

[Haven't touched HtmlAgilityPack in a few years, but this should be generally true]

Instead of taking OuterHtml, look at the Attributes collection on the node returned by SelectSingleNode; you should be able to get href from there.
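As a rough sketch of that suggestion, the snippet below reads the href attribute directly from the matched node instead of serializing the whole element. It assumes the HtmlAgilityPack package is referenced; the sample markup is shortened from the question, and the variable names (html, node, href, name) are only illustrative.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Sample markup, shortened from the question (assumption: same structure as the real page).
        var html = "<a href='/AuthorStories-4931/dreamfall.htm'>"
                 + "<img class='donoricon' alt='(Current Donor)'/>dreamfall</a>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same XPath as in the question.
        var node = doc.DocumentNode.SelectSingleNode("//a[contains(@href, 'AuthorStories')]");
        if (node == null)
        {
            // SelectSingleNode returns null when nothing matches.
            Console.WriteLine("No author link found.");
            return;
        }

        // GetAttributeValue falls back to the default if the attribute is missing,
        // whereas node.Attributes["href"].Value would throw on a null attribute.
        string href = node.GetAttributeValue("href", string.Empty);

        // InnerText skips the markup of the img child and keeps only the link text.
        string name = node.InnerText.Trim();

        Console.WriteLine(href);
        Console.WriteLine(name);
    }
}
```

The null check matters in practice because chaining .Attributes["href"].Value straight onto SelectSingleNode fails with a NullReferenceException whenever the page has no matching link.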

Rym
  • That put me on the right track, thank you. This works perfectly: authorLink = doc.DocumentNode.SelectSingleNode("//a[contains(@href, 'AuthorStories')]").Attributes["href"].Value; authorLink = authorLink.Insert(0, "" + authorName + ""; Unfortunately it's a bit unwieldy for such a simple task. Is this how you'd recommend doing these steps, or would you do this in some other manner? – Ben Oct 20 '12 at 12:14
  • Are you trying to edit the document nodes, or do you just want to build some HTML for another document? – Rym Oct 22 '12 at 17:17
  • If you plan to create a new string, I'd go with something like: var href = "tthfanfic.org" + doc.....Attributes["href"].Value; var newLink = string.Format("<a href='{0}'>{1}</a>", href, authorName); [haven't touched C# in a few months, so that could be iffy :P] – Rym Oct 22 '12 at 17:23