I'm reading the content of several web pages, and I have a problem with one of them: it updates some values using jQuery after the page loads. Is there a way to wait briefly before reading the content of that page?

I'm currently using HtmlAgilityPack to fetch web page contents.

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(myUrl);

var data = doc.DocumentNode.SelectSingleNode("SOME-SELECTOR")?.InnerText;

I tried setting BrowserDelay, but it doesn't help:

web.BrowserDelay = TimeSpan.FromSeconds(5);
Ali Torabi
  • Do you need to wait for line 2 to finish before running line 3? If so, could you use C# await to let web.Load() finish before proceeding to line 3? – Dan Sorensen Feb 12 '18 at 06:38
  • Or are you saying that the HTML content of myUrl is not in the desired state until 5 seconds after page load? – Dan Sorensen Feb 12 '18 at 06:41
  • Actually, `web.Load(myUrl)` should load the page, wait about 3 seconds, and only then read the content into `doc`, so await doesn't help in this situation. @DanSorensen – Ali Torabi Feb 12 '18 at 06:44

1 Answer

HtmlWeb does not execute the JavaScript in the retrieved document, so waiting will not produce the desired state. The JavaScript must be executed, either by your own mechanism or by driving a headless browser that runs the scripts before you read the data.

See the related question "Screen Scraping Web Page After Delay" for appropriate approaches.
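
As a rough illustration of the headless-browser route, here is a minimal sketch. It assumes Selenium WebDriver (the Selenium.WebDriver and Selenium.Support NuGet packages) with Chrome and a matching ChromeDriver installed; none of that is named above, so treat it as one possible setup rather than the recommended one. The URL is a placeholder, and "SOME-SELECTOR" is the placeholder XPath from the question.

```csharp
using System;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class Program
{
    static void Main()
    {
        // Placeholders (assumptions): replace with the real URL and XPath.
        string myUrl = "https://example.com/";
        string xpath = "SOME-SELECTOR";

        // Run Chrome without a visible window.
        var options = new ChromeOptions();
        options.AddArgument("--headless");

        using (IWebDriver driver = new ChromeDriver(options))
        {
            // The browser loads the page and runs its scripts, jQuery included.
            driver.Navigate().GoToUrl(myUrl);

            // Poll for up to 10 seconds until the target node is in the DOM.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
            wait.Until(d => d.FindElements(By.XPath(xpath)).Count > 0);

            // Hand the rendered HTML to HtmlAgilityPack, as in the question.
            var doc = new HtmlDocument();
            doc.LoadHtml(driver.PageSource);

            string data = doc.DocumentNode.SelectSingleNode(xpath)?.InnerText;
            Console.WriteLine(data);
        }
    }
}
```

If the node already exists in the initial HTML and jQuery only changes its text, waiting for the node to appear is not enough; the wait condition would need to check for the updated value instead (for example, that the element's text is no longer the initial placeholder).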

Dan Sorensen