I'm trying to do some screen scraping and discovered the HTML Agility Pack, but I'm having trouble figuring out how to use it with VB.NET.

The first thing I want to do is find the URL (the href value) of an A tag when I know the text enclosed by that tag.

The second thing I want to do is parse an HTML table, going through each row and pulling out the data so I can save it to a database (after some basic analysis).

GregC
Avi

1 Answer

Here is a good starting point on SO: How to use HTML Agility pack

See also this: HtmlAgilityPack example for changing links doesn't work. How do I accomplish this?

And this: Finding all the A HREF Urls in an HTML document (even in malformed HTML)

To find a specific HREF, the XPath syntax would be "//a[@href='your url']", meaning: "get any A tag that has an HREF attribute equal to 'your url'".
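
As a minimal sketch of that XPath in use (in C#, assuming the HtmlAgilityPack package is referenced; the class and method names `HrefLookup` and `FindLinkTextByHref` are made up for illustration), something like this should work:

```csharp
using System;
using HtmlAgilityPack;

public static class HrefLookup
{
    // Hypothetical helper: returns the text of the first <a> whose href
    // equals 'url', or null when no such tag exists.
    public static string FindLinkTextByHref(string html, string url)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: 'url' contains no single quote, since it is embedded
        // directly in an XPath string literal.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[@href='" + url + "']");

        // SelectSingleNode returns null when nothing matches.
        return node == null ? null : node.InnerText;
    }
}
```

For example, `HrefLookup.FindLinkTextByHref("<a href=\"homepage.html\">Cars</a>", "homepage.html")` should give back `Cars`. The same calls are available from VB.NET, since the library is a regular .NET assembly.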

EDIT:

To find an HREF when you only know the link text, for example if you have the HTML '<a href="homepage.html">Cars</a>', know the text "Cars", and want homepage.html, this is how you would do it:

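        // requires: using System; using HtmlAgilityPack;
        string s = @"<a href=""homepage.html"">Cars</a>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(s);

        // SelectSingleNode returns null when no A tag has exactly this text
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[text()='Cars']");
        if (node != null)
            Console.WriteLine("href=" + node.GetAttributeValue("href", null));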
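
For the second part of the question (walking an HTML table row by row), a rough sketch along the same lines might look like the following. It is only an illustration, not a definitive implementation: `TableScraper` and `ExtractRows` are hypothetical names, and it assumes every row of interest matches `//table//tr` with its cells as direct `th` or `td` children.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TableScraper
{
    // Hypothetical helper: returns one list of cell texts per <tr> found
    // inside any <table> in the given HTML.
    public static List<List<string>> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection trNodes = doc.DocumentNode.SelectNodes("//table//tr");
        if (trNodes == null)
            return rows;

        foreach (HtmlNode tr in trNodes)
        {
            var cells = new List<string>();

            // Relative XPath: header or data cells that are children of this row.
            HtmlNodeCollection cellNodes = tr.SelectNodes("th|td");
            if (cellNodes != null)
            {
                foreach (HtmlNode cell in cellNodes)
                {
                    // Decode entities such as &amp; and trim surrounding whitespace.
                    cells.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
                }
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

Each inner list can then be analysed and written to the database with whatever data access code you already use. Note that nested tables would contribute their rows too, so a more specific XPath (for example one that selects the table by id) is probably needed on real pages.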
Community
Simon Mourier