
I'm crawling multiple pages using CasperJS, but I got stuck.

The maximum number of pages is 200, but I want to stop execution if the XPath below doesn't exist for a page before the 200th.

How can I set up the i variable?

var casper = require('casper').create();
var x = require('casper').selectXPath;

for (var i=1; i <=200; i++) {
    casper.wait(6000, function() {

        casper.thenClick(x('//*[@id="mArticle"]/div[2]/a['+i+']'), function (){
            console.log('Searching dic');
            words = words.concat(this.evaluate(getWords));
        });

    });
}
Artjom B.
Hyung Kyu Park

1 Answer


CasperJS provides the exists() function. So, you can rewrite your code like this:

for (var i=1; i <=200; i++) {
    (function(i){
        casper.wait(6000, function() {
            var button = x('//*[@id="mArticle"]/div[2]/a['+i+']');
            if (!this.exists(button)) {
                this.echo(i + " not available");
                return; // the following `thenClick()` is not executed
            }
            this.thenClick(button, function (){
                console.log('Searching dic');
                words = words.concat(this.evaluate(getWords));
            });
        });
    })(i);
}

I've also added an IIFE, so that you have the correct i inside of the callback. For more information, see JavaScript closure inside loops – simple practical example.
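
The effect of the IIFE can be seen without CasperJS. The following is a minimal sketch in plain JavaScript; the arrays and variable names are made up for illustration, and the stored callbacks stand in for steps that run only after the loop has finished:

```javascript
// Callbacks are stored first and run later, similar to steps scheduled in a loop.
var withoutIife = [];
var withIife = [];

for (var i = 1; i <= 3; i++) {
    // All three callbacks share the single function-scoped `i`.
    withoutIife.push(function () { return i; });

    // The IIFE copies the current value of `i` into its own parameter `n`.
    (function (n) {
        withIife.push(function () { return n; });
    })(i);
}

// The loop has finished by now, so the shared `i` is already 4.
var sharedResults = withoutIife.map(function (f) { return f(); });
var capturedResults = withIife.map(function (f) { return f(); });

console.log(sharedResults);   // [ 4, 4, 4 ]
console.log(capturedResults); // [ 1, 2, 3 ]
```

Without the IIFE, every callback would build the XPath with the value `i` has after the loop ends, not the value it had when the step was scheduled.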

This works, but it is inefficient if you can assume that when link 100 is missing, links 101, 102 and so on are missing as well. You would then wait for nothing: 6 seconds times the 100 remaining links, which is 10 minutes. In that case you need to do this recursively, because of the asynchronous nature of CasperJS:

function execOnce(casper, i, max){
    // end condition: stop once i has passed max, so that link number max is still processed
    if (i > max) {
        return;
    }
    casper.wait(6000, function() {
        var button = x('//*[@id="mArticle"]/div[2]/a['+i+']');
        if (!this.exists(button)) {
            this.echo(i + " not available");
            return;
        }
        this.thenClick(button, function (){
            console.log('Searching dic');
            words = words.concat(this.evaluate(getWords));

            // recursive step
            execOnce(this, i+1, max);
        });
    });
}

casper.start(url);

// start the recursive chain
casper.then(function(){
    execOnce(this, 1, 200);
});

casper.run();

Note that now that the iteration is recursive, you can define a proper end condition by explicitly checking the page for what's there and what isn't.
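
To get a feel for how much waiting the recursive version saves, the scheduling can be simulated without a browser. The following is only a sketch: `makeFakeCasper` is a hypothetical stand-in that is not part of CasperJS, it runs every step immediately instead of queueing it, and it takes a plain link index where the real code passes an XPath selector. It assumes a page with 5 links and a maximum of 200.

```javascript
// Hypothetical stand-in for a casper instance (NOT the CasperJS API):
// steps run immediately and the object only counts scheduled waits.
function makeFakeCasper(linksOnPage) {
    return {
        waits: 0,
        clicked: [],
        wait: function (ms, then) {
            this.waits++;
            then.call(this);
        },
        exists: function (index) {
            return index <= linksOnPage;
        },
        thenClick: function (index, then) {
            this.clicked.push(index);
            then.call(this);
        }
    };
}

// Loop version: every iteration schedules a wait, even after the links run out.
function loopVersion(casper, max) {
    for (var i = 1; i <= max; i++) {
        (function (i) {
            casper.wait(6000, function () {
                if (!this.exists(i)) {
                    return;
                }
                this.thenClick(i, function () {});
            });
        })(i);
    }
}

// Recursive version: the next wait is scheduled only after a successful click.
function execOnce(casper, i, max) {
    if (i > max) {
        return;
    }
    casper.wait(6000, function () {
        if (!this.exists(i)) {
            return;
        }
        this.thenClick(i, function () {
            execOnce(this, i + 1, max);
        });
    });
}

var looped = makeFakeCasper(5);
loopVersion(looped, 200);

var recursive = makeFakeCasper(5);
execOnce(recursive, 1, 200);

console.log(looped.waits, looped.clicked.length);       // 200 5
console.log(recursive.waits, recursive.clicked.length); // 6 5
```

Both versions click the same 5 links, but the loop schedules all 200 waits up front, while the recursive chain stops after the first missing link.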

Artjom B.
  • Thanks for your answer! I got it working. I noticed that you left out the "else" after the if statement, but I really appreciate it. – Hyung Kyu Park Sep 06 '15 at 13:14
  • The missing `else` was deliberate, because I want to reduce nesting of JavaScript callbacks, but I indeed forgot the `return` statement. – Artjom B. Sep 06 '15 at 13:19
  • I would really appreciate it if you read my Question 2 (edited into the question). – Hyung Kyu Park Sep 06 '15 at 16:25
  • The answer is already big as it is and I don't understand what you're asking there. Please don't edit your questions to ask followup questions when there are already answers. You can post a new question with a complete description of your issue and what you want to accomplish. Then you can include a link to the previous question. – Artjom B. Sep 06 '15 at 16:38
  • Oh sorry, I'll remember your advice. I'm new to this service. I'll post a new question :) – Hyung Kyu Park Sep 06 '15 at 16:48
  • I posted a new question! http://stackoverflow.com/questions/32426015/is-it-possible-to-make-for-loop-in-casperjs – Hyung Kyu Park Sep 06 '15 at 17:01