I'm trying to scrape this website but can't get it to work.

It throws an exception with the message "Error downloading Html".

C# Code

    public static async Task<HtmlDocument> GetDocument()
    {
        HtmlDocument doc = null;
        string url = "https://www.finedininglovers.com/recipes/appetizer/vegan-dishes-white-asparagus/";
        try
        {
            HtmlWeb web = new HtmlWeb();
            doc = await web.LoadFromWebAsync(url);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
            Console.WriteLine(ex.StackTrace);
        }
        return doc;
    }

I tried setting `Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7` as the UserAgent, but it still doesn't work.

Sharath
  • @Daniel It's not a null issue. The link you shared is about a null reference exception, but that isn't the problem in my case; as I said, it fails only for this one particular website. – Sharath Apr 11 '18 at 18:27
  • I see that error in your console. – Daniel A. White Apr 11 '18 at 18:28
  • It's an exception thrown when `LoadFromWebAsync` is called. The same code gets a result for other website links, but the link I posted doesn't work. – Sharath Apr 11 '18 at 18:32
  • @Sharath Don't waste your time with HtmlAgilityPack, it's old and broken - use [AngleSharp](https://www.nuget.org/packages/anglesharp/) instead. As a bonus, AngleSharp has no problem scraping that page. :) – Ian Kemp Apr 12 '18 at 04:51
  • @DanielA.White The NullRef is being thrown by HtmlAgilityPack, not the code posted. So the issue is a bug in HAP. – Ian Kemp Apr 12 '18 at 04:52
  • @IanKemp 70% of the code is already written using HAP, so I need to check whether it's a bug in HAP. – Sharath Apr 12 '18 at 12:15
  • @Sharath Since it's a bug in HAP and you've already [submitted an issue to them](https://github.com/zzzprojects/html-agility-pack/issues/171), maybe add that as an answer? – Ian Kemp Apr 13 '18 at 13:13

1 Answer

An issue has been created for this [here](https://github.com/zzzprojects/html-agility-pack/issues/171).

The code below works, as mentioned in the GitHub issue.

    HtmlAgilityPack.HtmlDocument doc = null;
    string url = "your_link";

    HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    doc = web.Load(url);
    var html = doc.DocumentNode.OuterHtml;
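
If the rest of the code still expects an awaitable method like the `GetDocument()` in the question, the synchronous `Load` call can be wrapped in `Task.Run`. This is only a rough sketch and has not been tested against the site in question; the class name `Scraper` and the method name `GetDocumentAsync` are made up for illustration, not part of HtmlAgilityPack.

```csharp
using System;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class Scraper // hypothetical class name, for illustration only
{
    // Runs the synchronous HtmlWeb.Load on a thread-pool thread,
    // so callers can keep awaiting the result as before.
    public static async Task<HtmlDocument> GetDocumentAsync(string url)
    {
        try
        {
            var web = new HtmlWeb();
            return await Task.Run(() => web.Load(url));
        }
        catch (Exception ex)
        {
            // Same error reporting as in the question; returns null on failure.
            Console.WriteLine(ex.Message);
            Console.WriteLine(ex.StackTrace);
            return null;
        }
    }
}
```

Callers would then use `var doc = await Scraper.GetDocumentAsync(url);` and check for `null` before reading `doc.DocumentNode`.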
Sharath