1

I'm trying to load a single-page application that relies heavily on asynchronous code, using RequireJS and jQuery deferreds. The application loads as expected in a browser, but not in PhantomJS.

For instance, I spent some time trying to figure out how to make the following snippet work:

# index.html
<body>
  <script>
    require.config({
      baseUrl: '.',
      paths: {
        main: 'main'
      }
    });

    require(['main'], function() {
       window.myglobal = {
           something: 'foo'
       }
    });
  </script>
</body>

# phantomjs
page.evaluateAsync(function() {
     console.log(window.myglobal.something); // Should print out 'foo'.
}, 100);

Using evaluateAsync with a fixed timeout that has to be found by trial and error is not really satisfactory. Can someone suggest a better pattern?

Louis
crishushu

2 Answers

1

The documentation for evaluateAsync does not say much, so I'm going to take it at face value and assume that it just executes the code asynchronously, without any further constraint regarding what may or may not have loaded already. (The source code does not indicate any further constraints either.)

The problem I'm seeing is that you have two asynchronous operations that may complete in either order. Calling require(['main'], ...) tells RequireJS to start loading the module, but you don't know when the module will actually load. Similarly, calling page.evaluateAsync tells PhantomJS to execute a piece of code asynchronously. It will execute, but you don't know when.

So the following can happen:

  1. The module finishes loading: window.myglobal is set.

  2. console.log is called, which outputs the correct value.

Or:

  1. console.log is called, which fails because window.myglobal is still undefined.

  2. The module finishes loading: window.myglobal is set.

Setting a timeout that delays the execution of console.log makes the first case more likely, but it does not guarantee it.
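To make the race concrete, here is a minimal sketch that replaces both sides with plain timers. Everything in it is a hypothetical stand-in: `simulate`, `fakeWindow`, and the delays are invented for illustration, and no PhantomJS or RequireJS call is involved.

```javascript
// Simulates the page-side module load and the PhantomJS-side read as two
// independent timers. All names here are hypothetical stand-ins.
function simulate(loadDelayMs, readDelayMs, done) {
    var fakeWindow = {}; // stand-in for the page's window object

    // Stand-in for require(['main'], ...): the global is set after some delay
    // that the reader does not control.
    setTimeout(function () {
        fakeWindow.myglobal = { something: 'foo' };
    }, loadDelayMs);

    // Stand-in for evaluateAsync(fn, readDelayMs): the read happens after a
    // fixed, guessed delay.
    setTimeout(function () {
        var value = fakeWindow.myglobal ? fakeWindow.myglobal.something : undefined;
        done(value);
    }, readDelayMs);
}

// The guessed delay is long enough here, so the read sees the value.
simulate(10, 100, function (value) {
    console.log('load 10 ms, read 100 ms:', value);
});

// Here loading takes longer than the guess, so the read comes too early.
simulate(200, 100, function (value) {
    console.log('load 200 ms, read 100 ms:', value);
});
```

The same fixed delay succeeds in one run and fails in the other, which is why the delay can only be tuned, never trusted.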

What you could do is change your HTML like this:

<body>
  <script>
    require.config({
      baseUrl: '.',
      paths: {
        main: 'main'
      }
    });

    define('init', ['main'], function () {
       window.myglobal = {
           something: 'foo'
       };
    });

    require(['init']);
  </script>
</body>

Your PhantomJS script:

page.evaluateAsync(function() {
     require(['init'], function () {
         console.log(window.myglobal.something); // Should print out 'foo'.
     });
});

What this does is define a module called init. (This is a rare case where explicitly naming your module with define is okay. Usually you just start the define call with the list of dependencies.) Then, when the function passed to evaluateAsync runs, it requests the init module. RequireJS invokes the callback only after init has executed, which guarantees that the assignment to window.myglobal happens before console.log runs.

It would also be possible to use promises to get the desired result, but I preferred to show a solution that uses only RequireJS.
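For comparison, a promise-based variant might look roughly like the sketch below. It is an illustration under assumptions, not code from the application: `globalScope` stands in for the page's `window`, `appReady` and `whenReady` are hypothetical names, and `setTimeout` stands in for the asynchronous module load. It also assumes a native `Promise`, which the WebKit build inside PhantomJS may not provide; a polyfill or a jQuery Deferred could play the same role there.

```javascript
// Stand-in for the page's global object; in the page this would be `window`.
var globalScope = {};

// Page side: publish a promise that resolves once initialization is done.
globalScope.appReady = new Promise(function (resolve) {
    // setTimeout stands in for the asynchronous module load.
    setTimeout(function () {
        globalScope.myglobal = { something: 'foo' };
        resolve(globalScope.myglobal);
    }, 20);
});

// Consumer side: the callback runs only after the promise has resolved,
// however long the load takes, so no delay has to be guessed.
function whenReady(callback) {
    return globalScope.appReady.then(callback);
}

whenReady(function (myglobal) {
    console.log(myglobal.something);
});
```

The consumer no longer depends on timing at all; it depends only on the page exposing the promise, which is the same trade-off as exposing the init module.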

Louis
  • Thanks for your answer! I am familiar with RequireJS but new to PhantomJS. I guess your solution is not applicable if PhantomJS is meant to open the URL of the app and has no local access to the JS files. – crishushu Dec 15 '14 at 20:45
  • The function you pass to `evaluateAsync` executes *in the context of the browser*. There's no need for "local access to the js files." – Louis Dec 15 '14 at 20:50
1

PhantomJS is a headless browser with many uses, chief among them the testing and automation of websites. That means you generally cannot change the site's code. Most of the time, as in this case, changing it is not necessary.

You simply need to wait until the page's scripts and DOM reach the state you need for further processing. This is usually done with the waitFor function from the PhantomJS examples.

In your case, you can add the waitFor definition to the beginning of the script and wait for window.myglobal to be defined:

page.open(url, function(){
    waitFor(function check(){
        return page.evaluate(function(){
            return !!window.myglobal;
        });
    }, function then(){
        // do something useful
    }, 10000); // upper bound on acceptable wait timeout
});

check is called periodically to test whether a certain condition is met. As soon as it is, the then callback runs, and there you can do something useful, including acting on the page through page.evaluate. Because window.myglobal exists only in the page context, reading its value from then also has to go through page.evaluate.
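waitFor is not built into PhantomJS; it has to be defined in the script. The sketch below is modeled on the helper in the PhantomJS examples (examples/waitfor.js) but is not a copy of it, so details may differ. The 250 ms polling interval and 3 s default are assumed values, and the optional fourth parameter `onTimeout` is a hypothetical addition for this sketch.

```javascript
// Polls testFx every 250 ms until it returns a truthy value or the timeout
// elapses. onTimeout is a hypothetical optional parameter for this sketch.
function waitFor(testFx, onReady, timeOutMillis, onTimeout) {
    var maxWait = timeOutMillis || 3000; // assumed default upper bound: 3 s
    var start = Date.now();

    var interval = setInterval(function () {
        if (testFx()) {
            // Condition met: stop polling and hand over to the caller.
            clearInterval(interval);
            onReady();
        } else if (Date.now() - start >= maxWait) {
            // Gave up: stop polling and report the timeout.
            clearInterval(interval);
            if (typeof onTimeout === 'function') {
                onTimeout();
            } else if (typeof phantom !== 'undefined') {
                console.log("'waitFor()' timeout");
                phantom.exit(1); // only reachable when run under PhantomJS
            }
        }
    }, 250);
}
```

With this definition, the third argument in the call above (10000) is the longest the script will keep polling before giving up.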

There are also ways to wait not for specific variables or DOM nodes but for network activity to end, as in this answer.

Artjom B.
  • The first paragraph is there to distinguish this answer from Louis' answer. – Artjom B. Dec 15 '14 at 20:42
  • I assume `// do something useful` should also be inside the callback of `page.evaluate` – crishushu Dec 15 '14 at 21:36
  • Not necessarily. `check` is a function which is called periodically to check that a certain condition is met. So the logic is that as soon as the condition is met, you can do something useful including doing something on the page using `page.evaluate`. You can of course do something useful in the `check` function, but this purely depends on your use case. – Artjom B. Dec 15 '14 at 23:11
  • Yes! I was just adding the remark that the global variable is only defined inside the scope of `page.evaluate` – crishushu Dec 15 '14 at 23:18