97

I'm using CasperJS to automate a series of clicks, form submissions, data parsing, and so on across a website.

Casper seems to be organized into a list of preset steps in the form of then statements (see their example here: http://casperjs.org/quickstart.html) but it's unclear what triggers the next statement to actually run.

For example, does then wait for all pending requests to complete? Does injectJS count as a pending request? What happens if I have a nested then statement, chained to the end of an open call?

casper.thenOpen('http://example.com/list', function(){
    casper.page.injectJs('/libs/jquery.js');
    casper.evaluate(function(){
        var id = jQuery("span:contains('"+itemName+"')").closest("tr").find("input:first").val();
        casper.open("http://example.com/show/"+id); //what if 'then' was added here?
    });
});

casper.then(function(){
    //parse the 'show' page
});

I'm looking for a technical explanation of how the flow works in CasperJS. My specific problem is that my last then statement (above) runs before my casper.open statement, and I don't know why.

bendytree
  • 1
    I'm still looking for an explanation of the general `flow` of casperjs, but I've discovered that you basically cannot reference casper from within an `evaluate` call. (i.e. you cannot open a new url, log, echo, etc). So in my case evaluate was being called but with no way to interact with the outside world. – bendytree Jul 23 '12 at 00:29
  • 1
    I was wondering exactly the same things but too lazy to ask. Good question! – Nathan Aug 11 '12 at 07:43
  • 4
    `evaluate()` is for code that runs in the "browser", in the DOM of the page phantomjs is browsing. So there's no `casper.open` there, but there could be jQuery. So your example makes no sense, but I still wonder what `then()` actually does. – Nathan Aug 11 '12 at 07:47
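
Putting the comments above together, here is a minimal sketch of one way the question's snippet could be restructured: evaluate() returns the id from the page context, and the follow-up URL is opened from the Casper side. registerSteps and its itemName parameter are hypothetical names introduced for illustration; the URLs and the jQuery path come from the question, and the sketch assumes injectJs succeeds.

```javascript
// Sketch only: registerSteps and itemName are hypothetical names.
// In a real script you would call it roughly like this:
//   var casper = require('casper').create();
//   casper.start();
//   registerSteps(casper, 'Some item');
//   casper.run();
function registerSteps(casper, itemName) {
    casper.thenOpen('http://example.com/list', function () {
        this.page.injectJs('/libs/jquery.js');

        // evaluate() runs inside the page, so there is no casper object there.
        // Arguments go in, and a serializable return value comes back out.
        var id = this.evaluate(function (name) {
            return jQuery("span:contains('" + name + "')")
                .closest('tr').find('input:first').val();
        }, itemName);

        // Back in the Casper context: schedule the next page as a sub-step.
        this.thenOpen('http://example.com/show/' + id);
    });

    casper.then(function () {
        // parse the 'show' page
    });
}
```

Because the nested thenOpen() is registered while the first step is running, it is scheduled ahead of the final then(), which is the ordering the question was after.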

3 Answers

93

then() basically adds a new navigation step to a stack. A step is a JavaScript function that runs once one of two things has happened:

  1. the previous step, if any, has finished executing
  2. a requested URL and its related page have finished loading

Let's take a simple navigation scenario:

var casper = require('casper').create();

casper.start();

casper.then(function step1() {
    this.echo('this is step one');
});

casper.then(function step2() {
    this.echo('this is step two');
});

casper.thenOpen('http://google.com/', function step3() {
    this.echo('this is step 3 (google.com is loaded)');
});

You can print out all the created steps within the stack like this:

require('utils').dump(casper.steps.map(function(step) {
    return step.toString();
}));

That gives:

$ casperjs test-steps.js
[
    "function step1() { this.echo('this is step one'); }",
    "function step2() { this.echo('this is step two'); }",
    "function _step() { this.open(location, settings); }",
    "function step3() { this.echo('this is step 3 (google.com is loaded)'); }"
]

Notice the _step() function, which CasperJS added automatically to load the URL for us. Once the URL is loaded, the next step available in the stack, step3(), is called.
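
To make that expansion concrete, here is a toy model (plain JavaScript, not CasperJS source) in which thenOpen() registers an automatic _step followed by the user's callback. createToyCasper and its opened array are hypothetical names, and the settings argument is left out for brevity.

```javascript
// Toy model only: it mimics how thenOpen() turns into two registered steps.
function createToyCasper() {
    return {
        steps: [],
        opened: [],
        open: function (location) { this.opened.push(location); },
        then: function (fn) { this.steps.push(fn); return this; },
        thenOpen: function (location, fn) {
            // First the automatic step that requests the URL...
            this.then(function _step() { this.open(location); });
            // ...then the user's callback, as an ordinary step.
            return typeof fn === 'function' ? this.then(fn) : this;
        }
    };
}

var toy = createToyCasper();
toy.thenOpen('http://google.com/', function step3() {});
var stepNames = toy.steps.map(function (s) { return s.name; });
console.log(stepNames); // [ '_step', 'step3' ]
```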

When you have defined your navigation steps, run() executes them one by one sequentially:

casper.run();

Footnote: the callback/listener stuff is an implementation of the Promise pattern.

NiKo
  • In casperjs 1.0.0-RC1, "test-steps.js" is displaying a collection of [object DOMWindow], instead of a collection of function definition strings. – starlocke Nov 01 '12 at 15:20
  • The [object DOMWindow] collection is still the result in 1.0.0-RC4; I wonder where those function definitions went... – starlocke Nov 01 '12 at 15:27
  • 1
    I initially thought that CasperJS was doing a new trick of converting functions into DOMWindows, but the problem was really "return this.toString()" vs "return step.toString()" -- I submitted an edit for the answer. – starlocke Nov 01 '12 at 15:36
  • 5
    Isn't the so called 'stack' actually a queue? The steps are executed in order, had it been a stack wouldn't we expect step 3, step 2, step 1? – Reut Sharabani Oct 20 '13 at 16:09
  • 1
    I think it must be like this: You have a stack of steps. You pop off a step and evaluate it. You create an empty queue. Any steps generated due to the processing of the current step get put in this queue. When the step has finished evaluating, all the generated steps in the queue get put on top of the stack, but preserving their order within their queue. (The same as pushing onto the stack in reverse order). – Mark Jan 09 '15 at 01:44
  • Why use a stack at all? What is the difference between `open` and `thenOpen`? – Ciro Santilli OurBigBook.com Dec 07 '15 at 18:34
33

then() merely registers a series of steps.

run(), together with its family of runner functions, callbacks, and listeners, is what actually does the work of executing each step.

Whenever a step completes, CasperJS checks three flags: pendingWait, loadInProgress, and navigationRequested. If any of them is true, it does nothing and stays idle until a later check (setInterval style). If none of them is true, the next step is executed.
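
A toy model of that check may help. This is not CasperJS source: createRunner, tick, and log are hypothetical names, and the setInterval polling is replaced by explicit tick() calls so the behaviour is easy to follow.

```javascript
// Toy model of the flag check; not CasperJS source code.
function createRunner() {
    return {
        steps: [],
        current: 0,
        log: [],
        flags: { pendingWait: false, loadInProgress: false, navigationRequested: false },
        then: function (fn) { this.steps.push(fn); return this; },
        // One poll of the checker. Returns true if a step was executed.
        tick: function () {
            var f = this.flags;
            if (f.pendingWait || f.loadInProgress || f.navigationRequested) {
                return false; // something is still pending: stay idle
            }
            if (this.current >= this.steps.length) {
                return false; // nothing left to run
            }
            this.steps[this.current++].call(this);
            return true;
        }
    };
}

var runner = createRunner();
runner.then(function () {
    this.log.push('one');
    this.flags.loadInProgress = true; // pretend this step started a page load
});
runner.then(function () { this.log.push('two'); });

var ranFirst = runner.tick();          // runs step one
var ranWhileLoading = runner.tick();   // idle: a load is in progress
runner.flags.loadInProgress = false;   // pretend the page finished loading
var ranAfterLoad = runner.tick();      // runs step two
console.log(runner.log);
```

In this model, a step that triggers navigation without a flag being raised in time would let the very next tick() run the following step too early.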

As of CasperJS 1.0.0-RC4 there is a flaw: under certain timing conditions, the "try to do next step" method can be triggered before CasperJS has had time to raise either the loadInProgress or the navigationRequested flag. The workaround is to raise one of those flags yourself before leaving any step where they are expected to be raised (e.g. raise a flag either before or after calling casper.click()), maybe like so:

(Note: This is only illustrative, more like pseudocode than proper CasperJS form...)

var step_one = function () {
    casper.click(/* something */);
    do_whatever_you_want();
    casper.click(/* something else */); // Click something else, why not?
    more_magic_that_you_like();
    here_be_dragons();
    // Raise a flag before exiting this "step"
    profit();
};

To wrap that solution up into a single line of code, I introduced blockStep() in this GitHub pull request, extending click() and clickLabel() to help guarantee the expected behaviour when using then(). Check out the request for more info, usage patterns, and minimal test files.

starlocke
0

According to the CasperJS Documentation:

then()

Signature: then(Function then)

This method is the standard way to add a new navigation step to the stack, by providing a simple function:

casper.start('http://google.fr/');

casper.then(function() {
  this.echo('I\'m in your google.');
});

casper.then(function() {
  this.echo('Now, let me write something');
});

casper.then(function() {
  this.echo('Oh well.');
});

casper.run();

You can add as many steps as you need. Note that the current Casper instance automatically binds the this keyword for you within step functions.

To run all the steps you defined, call the run() method, and voila.

Note: You must start() the casper instance in order to use the then() method.

Warning: Step functions added to then() are processed in two different cases:

  1. when the previous step function has been executed,
  2. when the previous main HTTP request has been executed and the page loaded;

Note that there's no single definition of page loaded; is it when the DOMReady event has been triggered? Is it "all requests being finished"? Is it "all application logic being performed"? Or "all elements being rendered"? The answer always depends on the context. Hence why you're encouraged to always use the waitFor() family methods to keep explicit control on what you actually expect.

A common trick is to use waitForSelector():

casper.start('http://my.website.com/');

casper.waitForSelector('#plop', function() {
  this.echo('I\'m sure #plop is available in the DOM');
});

casper.run();

Behind the scenes, the source code for Casper.prototype.then is shown below:

/**
 * Schedules the next step in the navigation process.
 *
 * @param  function  step  A function to be called as a step
 * @return Casper
 */
Casper.prototype.then = function then(step) {
    "use strict";
    this.checkStarted();
    if (!utils.isFunction(step)) {
        throw new CasperError("You can only define a step as a function");
    }
    // check if casper is running
    if (this.checker === null) {
        // append step to the end of the queue
        step.level = 0;
        this.steps.push(step);
    } else {
        // insert substep a level deeper
        try {
            step.level = this.steps[this.step - 1].level + 1;
        } catch (e) {
            step.level = 0;
        }
        var insertIndex = this.step;
        while (this.steps[insertIndex] && step.level === this.steps[insertIndex].level) {
            insertIndex++;
        }
        this.steps.splice(insertIndex, 0, step);
    }
    this.emit('step.added', step);
    return this;
};

Explanation:

In other words, then() schedules the next step in the navigation process.

When then() is called, it is passed a function as a parameter which is to be called as a step.

It checks if an instance has started, and if it has not, it displays the following error:

CasperError: Casper is not started, can't execute `then()`.

Next, it checks if the page object is null.

If the condition is true, Casper creates a new page object.

After that, then() validates that the step parameter is a function.

If the parameter is not a function, it throws the following error:

CasperError: You can only define a step as a function

Then, the function checks if Casper is running.

If Casper is not running, then() appends the step to the end of the queue.

Otherwise, if Casper is running, it inserts a substep a level deeper than the previous step.

Finally, the then() function concludes by emitting a step.added event, and returns the Casper object.
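
To see what "a level deeper" means for ordering, here is a stripped-down re-implementation of the level and insertIndex logic from the source above, as plain JavaScript. createQueue, the running flag (standing in for the checker test), and the synchronous run() loop are simplifications introduced for illustration; the real runner is asynchronous.

```javascript
// Simplified model of the insertion logic in Casper.prototype.then.
function createQueue() {
    return {
        steps: [],
        step: 0,        // index of the next step to run
        running: false, // stands in for "this.checker !== null"
        then: function (fn) {
            if (!this.running) {
                fn.level = 0;
                this.steps.push(fn); // append to the end of the queue
            } else {
                // Sub-step: one level deeper than the step currently running.
                var prev = this.steps[this.step - 1];
                fn.level = prev ? prev.level + 1 : 0;
                var insertIndex = this.step;
                // Skip past siblings already inserted at the same level.
                while (this.steps[insertIndex] &&
                       fn.level === this.steps[insertIndex].level) {
                    insertIndex++;
                }
                this.steps.splice(insertIndex, 0, fn);
            }
            return this;
        },
        run: function () {
            this.running = true;
            while (this.step < this.steps.length) {
                this.steps[this.step++].call(this);
            }
            this.running = false;
        }
    };
}

var order = [];
var queue = createQueue();
queue.then(function () {
    order.push('A');
    this.then(function () { order.push('A1'); });
    this.then(function () { order.push('A2'); });
});
queue.then(function () { order.push('B'); });
queue.run();
console.log(order); // [ 'A', 'A1', 'A2', 'B' ]
```

Steps added from inside a running step are slotted in right after it, in the order they were added, and before any later top-level step. That is why a nested then() or thenOpen() runs before the next outer then().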

Community
Grant Miller