
I'm pretty new to CasperJS, but isn't there a way to open a URL and execute CasperJS commands in for loops? For example, this code doesn't work as I expected it to:

casper.then(function() {
    var counter = 2013;
    for (i = counter; i < 2014; i++) {
        var file_name = "./Draws/wimbledon_draw_" + counter + ".json";
        // getting some local json files
        var json = require(file_name);
        var first_round = json["1"];
        for (var key in first_round) {
            var name = first_round[key].player_1.replace(/\s+/g, '-');
            var normal_url = "http://www.atpworldtour.com/Tennis/Players/" + name;
            // the casper command below only executes AFTER the for loop is done
            casper.thenOpen(normal_url, function() {
                this.echo(normal_url);
            });
        }
    }
});

Instead of calling thenOpen on each new URL per iteration, Casper only runs the thenOpen callbacks AFTER the for loop has finished. Each callback then sees the last value that normal_url was set to. Is there no Casper command that works on each iteration within the for loop?

Follow up: How do we make casper thenOpen return a value on the current iteration of the for loop?

Say for example, I needed a return value on that thenOpen (maybe if the HTTP status is 404 I need to evaluate another URL so I want to return false). Is this possible to do?

Editing casper.thenOpen call above:

    var status;
    // thenOpen() only executes after the console.log statement directly below
    casper.thenOpen(normal_url, function() {
        status = this.status(false)['currentHTTPStatus'];
        if (status == 200) {
            return true;
        } else {
            return false;
        }
    });
    console.log(status); // This prints UNDEFINED the same number of times as iterations.
Derrick Mar
  • Use an IIFE in your `for loop` to wrap your casper step. Otherwise i has the same scope/ref. – Fanch Jun 23 '14 at 08:19
  • @Fanch I think that `thenOpen` uses the correct/updated url since `normal_url` reference is changed in the loop, but you are right that the `this.echo(normal_url)` would only print the last url every time. – Artjom B. Jun 23 '14 at 08:49
  • @ArtjomB. Yeah, as also mentioned below, I understand what you're saying, but it seems very counter-intuitive that this.echo(normal_url) would print only the last url when normal_url is changing. Are there any sources that explain this odd behavior? – Derrick Mar Jun 24 '14 at 06:30
  • Your follow-up ought to be a different question. (Though, if I've understood casperJs internals correctly, the answer is almost the same: `casper.thenOpen` means "queue something to run later". It does not mean "run this now". I.e. your `console.log` statement runs first.) – Darren Cook Jun 25 '14 at 07:12

2 Answers


If you need to get context then use the example here: https://groups.google.com/forum/#!topic/casperjs/n_zXlxiPMtk

I used the IIFE (immediately-invoked-function-expression) option.

Eg:

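for (var i in links) {
  // The IIFE gives every iteration its own `index`, so the step callback
  // below still sees the right link when it finally runs.
  (function(index) {
    var link = links[index];
    var filename = link.replace(/#/, '');
    filename = filename.replace(/\//g, '-') + '.png';

    // Runs immediately, while the steps are still being queued
    casper.echo('Attempting to capture: ' + link);
    // vars.domain is assumed to be defined elsewhere in the script
    casper.thenOpen(vars.domain + link).waitForSelector('.title h1', function () {
      this.capture(filename);
    });
  })(i);
}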

links could also be an array of objects, in which case the index gives you access to a group of properties if need be:

var links = [{'page':'some-page.html', 'filename':'page-page.png'}, {...}]
DynamicDan

As Fanch and Darren Cook stated, you could use an IIFE to fix the url value inside of the thenOpen step.

An alternative would be to use getCurrentUrl to check the url. So change the line

this.echo(normal_url);

to

this.echo(this.getCurrentUrl());

The problem is that the callback reads normal_url only when it is executed, which is after the loop has finished, so it sees the last value that was set rather than the value from its own iteration. This does not happen with the first argument of casper.thenOpen(normal_url, function(){...});, because that argument is evaluated at the moment of the call. You just see the wrong url echoed, but the correct url is actually opened.
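The difference can be reproduced without CasperJS. The following is a minimal sketch in plain JavaScript; `queue`, `schedule` and `runAll` are hypothetical stand-ins for a step queue and for `casper.run()`, not CasperJS's real API, and the URLs are placeholders.

```javascript
// Hypothetical stand-in for a step queue; this is NOT CasperJS's API.
var queue = [];
function schedule(url, callback) {
    // `url` is copied at call time; `callback` only runs later.
    queue.push({ url: url, callback: callback });
}
function runAll() {
    var log = [];
    queue.forEach(function (step) {
        log.push("opened " + step.url + ", callback saw " + step.callback());
    });
    queue = [];
    return log;
}

var names = ["a", "b", "c"]; // placeholder names

// Variant 1: the callback reads the shared, function-scoped loop variable.
for (var i = 0; i < names.length; i++) {
    var url = "http://example.com/" + names[i];
    schedule(url, function () { return url; });
}
var shared = runAll();

// Variant 2: an IIFE gives each iteration its own copy of the value.
for (var j = 0; j < names.length; j++) {
    (function (fixedUrl) {
        schedule(fixedUrl, function () { return fixedUrl; });
    })("http://example.com/" + names[j]);
}
var fixed = runAll();

console.log(shared.join("\n"));
console.log(fixed.join("\n"));
```

In the first variant every callback reports the last URL even though each entry was queued with its own URL; in the second the callback and the queued URL agree.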


Regarding your updated question:

All then* and wait* functions in the casperjs API are step functions. The function that you pass into them will be scheduled and executed later (triggered by casper.run()). You shouldn't read variables that are set inside a step from code outside of steps, because that code runs before any step does. Just add further steps inside of the thenOpen call. They will be scheduled in the correct order. Also you cannot return anything from thenOpen.

var somethingDone = false;
var status;
casper.thenOpen(normal_url, function() {
    status = this.status(false)['currentHTTPStatus'];
    if (status != 200) {
        this.thenOpen(alternativeURL, function(){
            // do something
            somethingDone = true;
        });
    }
});
casper.then(function(){
    console.log("status: " + status);
    if (somethingDone) {
        // something has been done
        somethingDone = false;
    }
});

In this example this.thenOpen will be scheduled after casper.thenOpen and somethingDone will be true inside casper.then because it comes after it.
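The ordering described here can be illustrated with a toy queue. This is only a sketch in plain JavaScript: `MiniQueue` is a hypothetical miniature written for this example and does not reproduce CasperJS's implementation; it assumes that a step added while another step is running is placed directly after the running one.

```javascript
// Hypothetical miniature of a step queue; illustrative only, not CasperJS.
function MiniQueue() {
    this.steps = [];
    this.current = -1;  // index of the running step, -1 while only scheduling
    this.inserted = 0;  // number of steps added by the running step
}
MiniQueue.prototype.then = function (fn) {
    if (this.current < 0) {
        this.steps.push(fn); // scheduling phase: append to the end
    } else {
        // running phase: place the new step right after the current one
        this.steps.splice(this.current + 1 + this.inserted, 0, fn);
        this.inserted++;
    }
    return this;
};
MiniQueue.prototype.run = function () {
    for (this.current = 0; this.current < this.steps.length; this.current++) {
        this.inserted = 0;
        this.steps[this.current].call(this);
    }
    this.current = -1;
};

var order = [];
var status; // written by one step, read by a later step
var q = new MiniQueue();
q.then(function () {
    status = 404; // pretend the first URL failed
    order.push("open primary");
    if (status !== 200) {
        this.then(function () { order.push("open alternative"); });
    }
});
q.then(function () { order.push("report status " + status); });
order.push("scheduling finished"); // runs before any step does
q.run();
console.log(order.join(" -> "));
```

The log starts with "scheduling finished", which is why a `console.log(status)` placed directly after `thenOpen` prints `undefined`, while the same read inside a later step sees the value.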


There are some things that you need to fix:

  • You don't use your counter i: you probably mean "./Draws/wimbledon_draw_" + i + ".json" not "./Draws/wimbledon_draw_" + counter + ".json"
  • You cannot require a JSON string. Interestingly, you can require a JSON file. I still would use fs.read to read the file and parse the JSON inside it (JSON.parse).
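Both fixes can be sketched together. In this example `loadDraw`, `playerUrls` and `fakeRead` are hypothetical helper names, and the JSON content is made up; the reader function is passed in so the sketch runs anywhere. Under CasperJS one would presumably pass `require('fs').read` instead of `fakeRead`.

```javascript
// readFile is injected; with CasperJS/PhantomJS it would be require('fs').read
// (an assumption about your setup).
function loadDraw(readFile, year) {
    // uses the loop value, not a constant counter
    var path = "./Draws/wimbledon_draw_" + year + ".json";
    return JSON.parse(readFile(path));
}
function playerUrls(draw) {
    var firstRound = draw["1"];
    return Object.keys(firstRound).map(function (key) {
        var name = firstRound[key].player_1.replace(/\s+/g, "-");
        return "http://www.atpworldtour.com/Tennis/Players/" + name;
    });
}

// Fake reader with made-up data, standing in for the real files.
var requested = [];
function fakeRead(path) {
    requested.push(path);
    return '{"1": {"m1": {"player_1": "Player One"}, "m2": {"player_1": "Some  Other Player"}}}';
}

var urls = [];
for (var year = 2012; year < 2014; year++) {
    urls = urls.concat(playerUrls(loadDraw(fakeRead, year)));
}
console.log(requested);
console.log(urls);
```

Collecting the URLs into an array first also makes it straightforward to schedule one step per URL afterwards.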

Regarding your question...

You didn't schedule any commands. Just add steps (then* or wait*) behind or inside of thenOpen.

Artjom B.
  • Currently we can require a JSON file. `var json = require('../../conf.json'); console.log(json.url);` and `tmp = fs.read('../../conf.json'); console.log(JSON.parse(tmp).url);` do the same for me. – Fanch Jun 23 '14 at 08:29
  • @Fanch Thanks, didn't know that! But this is kind of strange. Does node.js require work the same way? – Artjom B. Jun 23 '14 at 08:43
  • _"As of node v0.5.x yes you can require your JSON just as you would require a js file."_ – Fanch Jun 23 '14 at 08:47
  • Hey Artjom! Can you explain why, when we echo(), we see the wrong url, but the correct url is actually opened? If the correct reference is passed then why don't we see the correct value on echo()? Also, doesn't thenOpen already schedule commands? It's just a shortcut for chaining casper.open(normal_url).then(function() { ... }); – Derrick Mar Jun 23 '14 at 23:10
  • Hi Derrick: Javascript has function level scope. The function inside `thenOpen` will be executed after the loop has already finished, because `casper.run` triggers the execution. So `this.echo(normal_url);` doesn't find `normal_url` in this function and tries to look it up a scope higher, in the function of `then`. Since the variable still exists, it will hold the value from the last loop iteration. The `normal_url` that you pass to `thenOpen` is evaluated on each loop iteration, so each call receives that iteration's value, and this value stays fixed because the argument lives in a new function scope. – Artjom B. Jun 24 '14 at 08:36
  • Very clear explanation. Sorry Artjom, but I've still been trying to get it to work with IIFEs. Could you update your post with a solution using an IIFE? P.S. I'm not sure why casper is implemented like this. Seems very complicated. – Derrick Mar Jun 25 '14 at 03:20
  • Derrick: This is the asynchronous nature of javascript. I don't think there is anything that could have been done differently when implementing casperjs. – Artjom B. Jun 25 '14 at 05:57
  • @ArtjomB: Sorry for the late reply, I had some personal issues come up. Your reply helped! But I was wondering if that will work for every iteration or just the last one. Because, in my update, the goal was to evaluate another URL for each url in the iteration that fails status == 200. I think yours will just work for the last url, right? – Derrick Mar Jul 10 '14 at 01:18
  • You can nest the `thenOpen` and the subsequent `then` steps inside of an [`each`](http://docs.casperjs.org/en/latest/modules/casper.html#each) call and therefore check the status on every iteration independently. See my updated answer. – Artjom B. Jul 10 '14 at 09:05