For a personal .NET project, I'm trying to load and parse a skyscanner.net search results page. I know Skyscanner has APIs for this kind of task, but there doesn't seem to be a free license for personal use.

The problem is that Skyscanner takes several seconds to complete a search, so loading the document with HtmlAgilityPack returns a page without the content I'm looking for.

I tried using the WebBrowser control and its DocumentCompleted event, but the event seems to fire before the actual search results are loaded into the page.

So, is there any way to load the page, wait for the asynchronous JavaScript to finish filling the page, and then get the HTML to parse?

Luca Clavarino
  • Well, that's why asynchronous Ajax calls were invented. Are you using jQuery or any other convenient JavaScript library? – briosheje Jan 16 '15 at 09:50
  • No, I'm only working in a C# WinForms application. – Luca Clavarino Jan 16 '15 at 10:04
  • Well, then this might help you: http://stackoverflow.com/questions/202481/how-to-use-httpwebrequest-net-asynchronously ;) – briosheje Jan 16 '15 at 10:12
  • Thanks for your suggestion, but if I use HttpWebRequest I get a line in the HTML response saying "Skyscanner needs JavaScript to work. It looks like your browser doesn't support JavaScript, or has it turned off.". http://stackoverflow.com/questions/12503040/c-sharp-basic-web-httpwebrequest-does-not-support-javascript – Luca Clavarino Jan 16 '15 at 10:57
  • Then you are forced to use a web browser or to simulate one, as far as I know. Just a side question: why are you using a Windows Forms application for such a thing? – briosheje Jan 16 '15 at 11:05
  • The only reason is to have a minimal interface with two date picker controls and a data grid to display the output I want to grab. – Luca Clavarino Jan 16 '15 at 11:31

2 Answers

You could use PhantomJS. I had the same issue and did not find any other solution for it. I used PhantomJS as described in this article and got the fully loaded page after waiting 10 seconds. In my opinion, the approach in that article is the best solution for your issue.

Community

Since you stated that this is a personal project, I would start by looking into PhantomJS:

It's basically a headless WebKit-based browser (a browser without a front end) that is controllable via a scripting API.

You can probably get it to load the site, run the JavaScript, and then pass the final HTML over to Html-Agility-Pack.

As it stands, I think you will have problems doing this with Html-Agility-Pack on its own, as it is only designed to parse static HTML.
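
The PhantomJS step could look roughly like the script below. Treat it as a minimal sketch under assumptions, not a tested Skyscanner solution: the helper name `renderAfterDelay`, the file name `render.js`, and the example URL are made up for illustration, and the fixed 10-second default wait is just a guess at how long the search takes. It is written in ES5 syntax because PhantomJS does not support newer JavaScript features.

```javascript
// Hypothetical usage (file name and URL are placeholders):
//   phantomjs render.js "https://example.com/search-results" 10000 > page.html

// Opens `url` in `page`, waits `delayMs` milliseconds so that asynchronous
// scripts can fill the DOM, then passes the rendered HTML to onDone(error, html).
function renderAfterDelay(page, url, delayMs, onDone) {
    page.open(url, function (status) {
        if (status !== 'success') {
            onDone(new Error('Failed to load ' + url), null);
            return;
        }
        setTimeout(function () {
            // page.content is the serialized DOM at this moment,
            // i.e. after the page's scripts have had time to run.
            onDone(null, page.content);
        }, delayMs);
    });
}

// Only wire things up when actually running inside PhantomJS.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var page = require('webpage').create();
    var url = system.args[1];
    // Assumed default: wait 10 seconds if no delay is given.
    var delayMs = parseInt(system.args[2], 10) || 10000;

    renderAfterDelay(page, url, delayMs, function (error, html) {
        if (error) {
            system.stderr.writeLine(error.message);
            phantom.exit(1);
        } else {
            console.log(html); // rendered HTML goes to stdout
            phantom.exit(0);
        }
    });
}
```

From the C# side, one option would be to start `phantomjs` with `System.Diagnostics.Process`, redirect and read its standard output, and pass the resulting string to `HtmlDocument.LoadHtml` in Html-Agility-Pack. A fixed delay is the simplest thing that can work; polling for a specific element to appear would be more robust but needs knowledge of the page's markup.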

rtpHarry