
I'm having trouble scraping the price from this webpage: http://www.voyages-bateau.com

It looks easy, but none of the scraping services/tools I have tried works with this page. Its content is loaded via Ajax and the price appears later with an animation. I tried the wait() and waitFor() helpers with no luck...

Here's the code I used to fetch this bad boy:

var casper = require('casper').create({
    verbose: true,
    logLevel: "debug"
});

casper.start('http://voyages-bateau.com', function() {
    console.log(this.getHTML()); // no content loaded yet
});

casper.waitForSelector('//*[@id="WRchTxt0-3cb"]/h2[3]/span', function() {
    var res = this.getHTML();
    this.echo(res);
});

casper.run();

All I get is the error: "Wait timeout of 5000ms expired, exiting.". Any ideas?

Artjom B.
Namlook
  • Thanks for taking the time to look at this. It is indeed weird. I've been digging a little more into the provider of this page: it is generated by wix.com (an HTML5 WYSIWYG online service). – Namlook Dec 14 '14 at 07:31

1 Answer


The main issue is that PhantomJS 1.x has no support for Function.prototype.bind. The workaround can be found here: CasperJS bind issue. Because the page is entirely JavaScript-driven, the page error caused by the missing bind stops its scripts from running, so no content is ever rendered and you see nothing.

You can verify this by registering a listener for the page.error event:

casper.on("page.error", function(pageErr){
    this.echo("page.err: " + JSON.stringify(pageErr));
});

This yields:

page.err: "TypeError: 'undefined' is not a function (evaluating 'b.bind(a)')"
page.err: "TypeError: 'undefined' is not a function (evaluating 'c.bind(null,\"margin\")')"
page.err: "TypeError: 'undefined' is not a function (evaluating 'RegExp.prototype.test.bind(/^(data|aria)-[a-z_][a-z\\d_.\\-]*$/)')"

These errors do not otherwise show up, even with verbose and debug logging enabled.
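For illustration, here is a minimal sketch of what a bind workaround could look like. It is a simplified stand-in, not the exact code from the linked workaround: the name installBindShim and its proto parameter are made up for this example, and the shim does not cover every detail of the native bind (for instance, calling the bound function with new).

```javascript
// Simplified stand-in for Function.prototype.bind (hypothetical helper name).
// It ignores the `new`-operator behaviour of the native implementation.
function installBindShim(proto) {
    // When the function is handed to casper.evaluate() no argument is passed,
    // so fall back to the page's own Function.prototype.
    // The parameter only exists so the shim can be tried on a plain object.
    proto = proto || Function.prototype;
    if (typeof proto.bind === "function") {
        return false; // bind is already available, leave it untouched
    }
    proto.bind = function (thisArg) {
        var target = this;
        // Arguments given at bind time are prepended to the call-time ones.
        var preset = Array.prototype.slice.call(arguments, 1);
        return function () {
            var rest = Array.prototype.slice.call(arguments);
            return target.apply(thisArg, preset.concat(rest));
        };
    };
    return true;
}

// Assumed hookup in the CasperJS script, placed before casper.start():
// casper.on("page.initialized", function () {
//     this.evaluate(installBindShim);
// });
```

The idea is to install the shim as soon as the page object is initialized, so that it is in place before the page's own scripts call bind.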

The other issue is that you forgot to use the XPath utility for XPaths:

var x = require('casper').selectXPath;

somewhere at the top and later:

casper.waitForSelector(x('//*[@id="WRchTxt0-3cb"]/h2[3]/span'), function() {
    var res = this.getHTML();
    this.echo(res);
});

Without the XPath utility, it tries to interpret this as a CSS selector. Since you have verbose: true, you should have seen

[error] [remote] findAll(): invalid selector provided "//*[@id="WRchTxt0-3cb"]/h2[3]/span":Error: SYNTAX_ERR: DOM Exception 12
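To make the difference concrete, here is a rough sketch of the distinction. It assumes that the XPath utility wraps the expression in an object tagged with type "xpath", while a plain string is treated as CSS. Both functions below are simplified stand-ins written for this example, not CasperJS's actual code, and selectorKind is a hypothetical helper.

```javascript
// Simplified stand-in for require('casper').selectXPath (assumption about
// its behaviour): wrap the expression so it can be told apart from a string.
function selectXPath(expression) {
    return { type: "xpath", path: expression };
}

// Hypothetical helper: reports how a selector argument would be treated.
function selectorKind(selector) {
    if (typeof selector === "object" && selector !== null && selector.type === "xpath") {
        return "xpath";
    }
    return "css"; // plain strings go to the CSS selector engine
}

var raw = '//*[@id="WRchTxt0-3cb"]/h2[3]/span';
console.log(selectorKind(raw));              // raw string, treated as CSS
console.log(selectorKind(selectXPath(raw))); // wrapped, treated as XPath
```

This is why the raw string produced a SYNTAX_ERR: an XPath expression is not valid CSS selector syntax.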

Artjom B.
  • I originally wanted to leave it with the duplicate, but there was also the issue of the XPath selector, which is why I added an answer. – Artjom B. Dec 14 '14 at 10:18