My browser just keeps loading when I call NavigateToPage using ScrapySharp, and execution never reaches the next line of code. Below is my code, in a C# ASP.NET Web Forms project. May I know why? The link works and I can browse to it manually. The code gets stuck at Browser.NavigateToPage(new Uri("http://www.asnb.com.my/v3_/asnbv2_0index.php")); and the page keeps loading in the browser.

ScrapingBrowser Browser = new ScrapingBrowser();
Browser.AllowAutoRedirect = true; 
Browser.AllowMetaRedirect = true;

WebPage PageResult = Browser.NavigateToPage(new Uri("http://www.asnb.com.my/v3_/asnbv2_0index.php"));
HtmlNode TitleNode = PageResult.Html.CssSelect(".navbar-brand").First();
nvoigt
Tiong Gor
  • It's so weird... I have older projects where I use ScrapySharp and they are still running fine, with no hang at NavigateToPage. But when I try ScrapySharp in a new MVC project (with the same target framework and version), it gets stuck at NavigateToPage... Anyway, HtmlAgilityPack works fine in new projects - http://www.c-sharpcorner.com/article/web-scraping-in-c-sharp/ – Nick G. Feb 07 '18 at 17:14
  • @NickG. I also had previously running code hanging when moving it from a console project to a WPF project. Fixed with [DaBlue solution](https://stackoverflow.com/a/62157018/774575) – mins Sep 07 '20 at 10:07

3 Answers

I was having the same problem and decided not to use Browser.NavigateToPage. Instead of going through PageResult.Html, I load the page with HtmlAgilityPack's HtmlWeb and query the resulting HtmlDocument.

For example:

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://www.asnb.com.my/v3_/asnbv2_0index.php");
HtmlNode TitleNode = doc.DocumentNode.CssSelect(".navbar-brand").First();

This should get you your expected results.

Keisha W

Move your call to a BackgroundWorker thread. Notice that at line 353 of ScrapingBrowser.cs (ScrapySharp/ScrapySharp/Network/ScrapingBrowser.cs), NavigateToPage() calls the async version and blocks on its Result:

public WebPage NavigateToPage(Uri url, HttpVerb verb = HttpVerb.Get, string data = "", string contentType = null)
{
  return NavigateToPageAsync(url, verb, data, contentType).Result;
}

I had the same problem. As soon as I moved the call into the DoWork method of my BackgroundWorker, it started behaving the way you would expect.
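
A likely explanation (my assumption, not something confirmed in this thread) is the classic sync-over-async deadlock: ASP.NET and WPF threads carry a SynchronizationContext, and blocking on .Result can prevent the awaited continuation from ever resuming on that thread. If a BackgroundWorker is more than you need, a minimal sketch of the same idea with Task.Run might look like this. PageLoader and LoadOffContext are hypothetical names for illustration, not part of ScrapySharp:

```csharp
using System;
using System.Threading.Tasks;
using ScrapySharp.Network;

// Hypothetical helper, not part of ScrapySharp.
public static class PageLoader
{
    public static WebPage LoadOffContext(ScrapingBrowser browser, Uri uri)
    {
        // Task.Run queues the call on a thread-pool thread, which has no
        // SynchronizationContext, so the continuations inside
        // NavigateToPageAsync do not need the (blocked) calling thread.
        return Task.Run(() => browser.NavigateToPage(uri))
                   .GetAwaiter()
                   .GetResult();
    }
}
```

The calling thread still blocks until the page is loaded, so this only works around the hang; awaiting NavigateToPageAsync end to end avoids blocking altogether.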

deweycooter

Another option is to use the async version of NavigateToPage, e.g.:

private async Task<WebPage> LoadPage(Uri uri)
{
    WebPage page = await browser.NavigateToPageAsync(uri);
    return page;
}
Zonus
  • I moved a piece of working code from a Console application to a WPF application, and the call using `NavigateToPage` stopped working. This solution fixed it. – mins Sep 07 '20 at 10:02