I am trying to scrape https://public.rts.iebc.or.ke/enr/index.html#/Kenya_Elections_Senator/1

with HtmlAgilityPack. It is a dynamic site: the content is shown only after the page has loaded completely. One method I tried returns only the HTML of the loading bar, while another throws a TargetInvocationException. I don't know how to make my code wait until the page has loaded completely before scraping it.

Nisarg Shah
Mean Coder
  • Oh how I love this kind of website, staring at a blank page for minutes. I thought people would have learnt the SEO lesson from Flash... As for the question, *WebBrowser* could work, but it's a mess. – IS4 Sep 03 '17 at 13:36
  • I have tried `WebBrowser` too but it does not work. – Mean Coder Sep 03 '17 at 13:40
  • @MeanCoder, what exactly do you want to scrape from that page? To wait for the page to load, check https://stackoverflow.com/questions/2777878/detect-webbrowser-complete-page-loading. However, after loading you'd also want to get the HTML that is generated dynamically in response to events. – derloopkat Sep 05 '17 at 14:41

1 Answer

HtmlAgilityPack is just a .NET library for parsing HTML. You make a request, and the library lets you easily parse the HTML in the response. If that response does not contain the data you want to scrape, you need to make a different request. The page you mention uses Ajax to update itself, and its HTML is generated dynamically from a JSON response. This is the problem: HtmlAgilityPack parses HTML, not JSON, and it does not run the page's scripts. Repeatedly requesting the same URL doesn't solve the problem either, because every request returns the same original HTML.
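As a rough illustration of making "a different request", the sketch below fetches a JSON document directly and lists its top-level properties. The URL is a hypothetical placeholder, not the site's real endpoint; the actual address would have to be found in the browser's network tab. Using `System.Text.Json` (.NET Core 3.0 or later) is also an assumption; any JSON parser would do.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class JsonFetchSketch
{
    // Hypothetical placeholder: replace with the JSON URL the page actually requests.
    const string DataUrl = "https://example.com/data/results.json";

    static async Task Main()
    {
        using var client = new HttpClient();

        // Download the raw JSON that the page would otherwise turn into HTML.
        string json = await client.GetStringAsync(DataUrl);

        using JsonDocument doc = JsonDocument.Parse(json);

        // List the top-level properties to get a feel for the structure.
        if (doc.RootElement.ValueKind == JsonValueKind.Object)
        {
            foreach (JsonProperty property in doc.RootElement.EnumerateObject())
            {
                Console.WriteLine($"{property.Name}: {property.Value.ValueKind}");
            }
        }
    }
}
```

If the data is available this way, no HTML parsing or waiting is needed at all.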

If you're using WebBrowser, you can use a timer to check periodically whether the content you need has appeared.
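A minimal sketch of the timer approach, assuming a Windows Forms application. The element id `results` is hypothetical; pick an element that exists only once the data has rendered. Note that `WebBrowser` wraps the Internet Explorer engine, so there is no guarantee it will run this particular site's scripts.

```csharp
using System;
using System.Windows.Forms;

public class ScraperForm : Form
{
    private readonly WebBrowser browser =
        new WebBrowser { Dock = DockStyle.Fill, ScriptErrorsSuppressed = true };

    // Poll twice a second on the UI thread.
    private readonly Timer pollTimer = new Timer { Interval = 500 };

    // Hypothetical id: an element that only exists after the data has rendered.
    private const string TargetElementId = "results";

    public ScraperForm()
    {
        Controls.Add(browser);
        pollTimer.Tick += OnPollTick;
        browser.Navigate("https://public.rts.iebc.or.ke/enr/index.html#/Kenya_Elections_Senator/1");
        pollTimer.Start();
    }

    private void OnPollTick(object sender, EventArgs e)
    {
        // Document can be null while the page is still being set up.
        HtmlElement target = browser.Document?.GetElementById(TargetElementId);
        if (target == null)
        {
            return; // not rendered yet; try again on the next tick
        }

        pollTimer.Stop();

        // The rendered markup can now be handed to HtmlAgilityPack for parsing.
        string renderedHtml = target.OuterHtml;
        Console.WriteLine($"Rendered element has {renderedHtml.Length} characters.");
    }

    [STAThread]
    private static void Main()
    {
        Application.Run(new ScraperForm());
    }
}
```

In real code it would be sensible to stop polling after a maximum number of attempts so the timer does not run forever if the element never appears.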

With the Selenium driver for .NET, you need to set a timeout so that it keeps trying to find an element for a while before throwing a not-found exception.
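A sketch of the Selenium approach using an explicit wait, assuming the Selenium.WebDriver, Selenium.Support, and HtmlAgilityPack NuGet packages and a local Chrome installation. The `table` selector and the XPath are hypothetical; use whatever element marks the data as loaded on the real page.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class SeleniumWaitSketch
{
    static void Main()
    {
        // Disposing the driver closes the browser.
        using IWebDriver driver = new ChromeDriver();
        driver.Navigate().GoToUrl("https://public.rts.iebc.or.ke/enr/index.html#/Kenya_Elections_Senator/1");

        // Keep retrying for up to 30 seconds; a WebDriverTimeoutException is thrown after that.
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(30));

        // Hypothetical selector: an element that appears only once the data has rendered.
        wait.Until(d => d.FindElement(By.CssSelector("table")));

        // The page source now contains the dynamically generated HTML.
        var document = new HtmlAgilityPack.HtmlDocument();
        document.LoadHtml(driver.PageSource);

        // SelectNodes returns null when nothing matches.
        var rows = document.DocumentNode.SelectNodes("//table//tr");
        Console.WriteLine($"Rows found: {rows?.Count ?? 0}");
    }
}
```

An implicit wait (`driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(30);`) is an alternative that applies the same retry behaviour to every `FindElement` call.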

derloopkat
  • I have used WebBrowser but did not find any such option. Can you give me a link to a Selenium tutorial? – Mean Coder Sep 04 '17 at 06:35
  • How can I make the WebBrowser wait until a tag appears? – Mean Coder Sep 05 '17 at 02:10
  • @MeanCoder, as mentioned, you can add a Timer and then, on its `Tick` event, try to get the element and disable the Timer once it is found. Another option is calling `DoEvents()` in a loop (*System.Windows.Forms.Application.DoEvents()*). That might be easier, but using a Timer is the proper way to achieve your goal without disrupting the main thread. – derloopkat Sep 05 '17 at 11:15