
I'm trying to get all links (a[href] elements) from a web page using HtmlAgilityPack.

My code:

using System;
using HtmlAgilityPack;

HtmlWeb web = new HtmlWeb();
HtmlDocument site = web.Load("https://www.google.com/");
HtmlNodeCollection links = site.DocumentNode.SelectNodes("//a[@href]");
foreach (HtmlNode link in links)
{
    Console.WriteLine(link.GetAttributeValue("href", "DefaultValue"));
}

Problem: I noticed that my code doesn't get all the links from the page and misses some of them...

My result using Jsoup Java

My result using HtmlAgilityPack C#

I did this with Jsoup in Java and it worked fine (16 links on the Google main page), but with HtmlAgilityPack I'm getting 13 links on the same page... or maybe the problem is something else. (There is a problem with relative links too, but I'll fix that later; a sketch for that follows below.)
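
For the relative-link issue, a minimal sketch (class and variable names are mine, not from the original code) that resolves each href against the page's base URI with Uri.TryCreate:

using System;
using HtmlAgilityPack;

class AbsoluteLinks
{
    static void Main()
    {
        var baseUri = new Uri("https://www.google.com/");
        var web = new HtmlWeb();
        HtmlDocument site = web.Load(baseUri.AbsoluteUri);

        foreach (HtmlNode link in site.DocumentNode.SelectNodes("//a[@href]"))
        {
            string href = link.GetAttributeValue("href", string.Empty);

            // Relative hrefs like "/intl/en/about" are resolved against baseUri;
            // hrefs that are already absolute pass through unchanged.
            if (Uri.TryCreate(baseUri, href, out Uri absolute))
            {
                Console.WriteLine(absolute);
            }
        }
    }
}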

  • Please share a [mcve] with sample HTML included please. – mjwills May 03 '21 at 08:15
  • You likely need to call https://stackoverflow.com/a/6696727/34092 on the href since the href will be encoded (see the sketch after these comments). – mjwills May 03 '21 at 08:16
  • There are 54 Anchors in the rendered page. The non-rendered page (before all JavaScripts are run) contains 13 Anchors. Since the `Load()` method doesn't render the content (doesn't run scripts, it just uses HttpClient or HttpWebRequest), 13 is the correct number of Anchors. (Using the `https://www.google.com/` Uri with language set to `en-US`: different languages may generate a different number of Anchors.) -- If you instead use `LoadFromBrowser()`, you get quite a different result. – Jimi May 03 '21 at 09:37
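
Following mjwills' comment above, a minimal sketch (class name is mine, not from the original code) of decoding an HTML-encoded href with System.Net.WebUtility.HtmlDecode before printing it:

using System;
using System.Net;
using HtmlAgilityPack;

class DecodeHrefs
{
    static void Main()
    {
        var web = new HtmlWeb();
        HtmlDocument site = web.Load("https://www.google.com/");

        foreach (HtmlNode link in site.DocumentNode.SelectNodes("//a[@href]"))
        {
            string href = link.GetAttributeValue("href", string.Empty);

            // hrefs in the raw HTML are HTML-encoded (e.g. &amp; instead of &),
            // so decode them before comparing against what the browser shows.
            Console.WriteLine(WebUtility.HtmlDecode(href));
        }
    }
}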
