160

I'm trying to use PhantomJS (what an awesome tool btw!) to submit a form on a page that I have login credentials for, and then output the content of the destination page to stdout. I'm able to access the form and set its values successfully using PhantomJS, but I'm not sure what the right syntax is to submit the form and output the content of the subsequent page. What I have so far is:

var page = new WebPage();
var url = phantom.args[0];

page.open(url, function (status) {

  if (status !== 'success') {
      console.log('Unable to access network');
  } else {

    console.log(page.evaluate(function () {

      var arr = document.getElementsByClassName("login-form");
      var i;

      for (i=0; i < arr.length; i++) {

        if (arr[i].getAttribute('method') == "POST") {
          arr[i].elements["email"].value="mylogin@somedomain.example";
          arr[i].elements["password"].value="mypassword";

          // This part doesn't seem to work. It returns the content
          // of the current page, not the content of the page after
          // the submit has been executed. Am I correctly instrumenting
          // the submit in Phantom?
          arr[i].submit();
          return document.querySelectorAll('html')[0].outerHTML;
        }

      }

      return "failed :-(";

    }));
  }

  phantom.exit();
});
Stephen Ostermiller
Vijay Boyapati

4 Answers

232

I figured it out. It's an async issue: you can't submit the form and expect the subsequent page to be available immediately. You have to wait until the next page has finished loading, which PhantomJS signals through the onLoadFinished callback. My code is below:

var page = new WebPage(), testindex = 0, loadInProgress = false;

page.onConsoleMessage = function(msg) {
  console.log(msg);
};

page.onLoadStarted = function() {
  loadInProgress = true;
  console.log("load started");
};

page.onLoadFinished = function() {
  loadInProgress = false;
  console.log("load finished");
};

var steps = [
  function() {
    //Load Login Page
    page.open("https://website.example/theformpage/");
  },
  function() {
    //Enter Credentials
    page.evaluate(function() {

      var arr = document.getElementsByClassName("login-form");
      var i;

      for (i=0; i < arr.length; i++) {
        if (arr[i].getAttribute('method') == "POST") {

          arr[i].elements["email"].value="mylogin";
          arr[i].elements["password"].value="mypassword";
          return;
        }
      }
    });
  },
  function() {
    //Login
    page.evaluate(function() {
      var arr = document.getElementsByClassName("login-form");
      var i;

      for (i=0; i < arr.length; i++) {
        if (arr[i].getAttribute('method') == "POST") {
          arr[i].submit();
          return;
        }
      }

    });
  },
  function() {
    // Output content of page to stdout after form has been submitted
    page.evaluate(function() {
      console.log(document.querySelectorAll('html')[0].outerHTML);
    });
  }
];

var interval = setInterval(function() {
  if (!loadInProgress && typeof steps[testindex] == "function") {
    console.log("step " + (testindex + 1));
    steps[testindex]();
    testindex++;
  }
  if (typeof steps[testindex] != "function") {
    console.log("test complete!");
    phantom.exit();
  }
}, 50);
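
The polling logic can also be pulled out of the PhantomJS-specific parts, which makes it easier to reason about. Below is a minimal sketch of that idea, not a replacement for the code above. The names makeStepRunner and isLoading are hypothetical (they are not PhantomJS APIs), and unlike the original loop this version only reports completion on a tick where no load is pending, so the last step's page load gets a chance to finish.

```javascript
// Hypothetical helper, not part of PhantomJS: runs one step per tick,
// but only while no page load is in progress.
function makeStepRunner(steps, isLoading) {
  var index = 0;
  return {
    // Returns true once every step has run and no load is pending.
    tick: function () {
      if (isLoading()) {
        return false; // a page load is in flight, wait for the next tick
      }
      if (index < steps.length) {
        steps[index]();
        index++;
        return false;
      }
      return true;
    }
  };
}

// Assumed wiring inside a PhantomJS script (uses steps and loadInProgress
// from the answer above):
//   var runner = makeStepRunner(steps, function () { return loadInProgress; });
//   var timer = setInterval(function () {
//     if (runner.tick()) { clearInterval(timer); phantom.exit(); }
//   }, 50);
```

Keeping tick() synchronous means the runner itself has no timers, so it can be exercised outside PhantomJS with a fake loading flag.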
Stephen Ostermiller
Vijay Boyapati
  • 3
    this is a great template. Here are a couple of things I added: inside `setInterval` use `var func = steps[testindex]`, then `console.log("step " + (testindex + 1) + ": " + funcName(func))`. This allows you to add description to the steps being performed. – Jonno May 19 '14 at 12:55
  • see [here](http://stackoverflow.com/a/15714445/620856) for `funcName`. Also I found it easier when going through a series of web pages, and trying different techniques, to render the last page using `page.render("output.png");`. – Jonno May 19 '14 at 13:01
  • 3
    This is a really helpful post. One question though. When you submit a form using POST, data is sent to the server and the server returns a response. Where is the code that handles this response, or is it handled automatically by PhantomJS? Also, after form submission, a server can return a cookie; is this cookie available in the `phantom.cookies` object when the server returns the response? – MrD Jul 15 '15 at 14:39
  • Use CasperJS; it is better than PhantomJS and can post to forms without complex coding – jmp Mar 02 '16 at 12:11
  • Could you please check this too https://stackoverflow.com/questions/44624964/phantom-js-on-web-project – Manik Jun 19 '17 at 08:03
  • @Vijay Boyapati I know this is old and your solution works. In your solution your login and password are hard-coded. What if I were reading a list of variables from a CSV that I wanted to pass into the page.evaluate(function()) so it could look something like this: document.getElementById('whatever').value = 1st variable; document.getElementById('whatever').value = 2nd variable; ? – Proximus Seraphim Dimitri Davi Feb 19 '20 at 17:31
62

Also, CasperJS provides a nice high-level interface for navigation in PhantomJS, including clicking on links and filling out forms.

CasperJS

Updated to add July 28, 2015 article comparing PhantomJS and CasperJS.

(Thanks to commenter Mr. M!)

arboc7
  • 1
    Casper did not work for me because you could only fill out a form input using name. I needed to use id. – user984003 Apr 01 '13 at 06:33
  • 4
    @user984003 You should be able to set your selector to `#someid` to fill in based on an ID. – arboc7 Apr 01 '13 at 22:12
  • 2
    CasperJS is a godsend! It makes scraping ASPX pages a breeze. Thank you! – Tobia May 28 '14 at 14:40
  • @user984003 I don't know if you were using an older version, but the current one has a fillSelectors() to fill form fields using any selector. – Tobia May 28 '14 at 14:41
  • 3
    Anyone who is using PhantomJS should start using CasperJS. Here is post describing why: http://code-epicenter.com/why-is-casperjs-better-than-phantomjs/ – MrD Jul 28 '15 at 12:28
19

Sending a raw POST request can sometimes be more convenient. Below is the original post.js example from PhantomJS:

// Example using HTTP POST operation

var page = require('webpage').create(),
    server = 'http://posttestserver.example/post.php?dump',
    data = 'universe=expanding&answer=42';

page.open(server, 'post', data, function (status) {
    if (status !== 'success') {
        console.log('Unable to post!');
    } else {
        console.log(page.content);
    }
    phantom.exit();
});
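
If the POST body comes from variables rather than a hard-coded string, it has to be URL-encoded first. The helper below is a small sketch of one way to build the data string. encodeFormData is a hypothetical name, not something PhantomJS provides, and the sketch assumes the server expects an application/x-www-form-urlencoded body.

```javascript
// Hypothetical helper (not part of PhantomJS): builds an
// application/x-www-form-urlencoded body from a plain object.
function encodeFormData(fields) {
  return Object.keys(fields).map(function (key) {
    var name = encodeURIComponent(key);
    var value = encodeURIComponent(String(fields[key]));
    return name + '=' + value;
  }).join('&').replace(/%20/g, '+'); // form encoding writes spaces as '+'
}

// Produces the same string as the hard-coded data in the example above.
var data = encodeFormData({ universe: 'expanding', answer: 42 });
```

The resulting string can then be passed as the third argument to page.open, exactly where data is used in the post.js example.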
Stephen Ostermiller
Jakub M.
  • 6
    Be aware, readers, that performing `GET` requests similarly (by doing something like `page.open(server, 'get', data, ...`) won't work. – zbr Oct 06 '14 at 11:25
6

As mentioned above, CasperJS is a good tool for filling in and submitting forms. Here is the simplest possible example of how to fill in and submit a form using the fill() function:

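var casper = require('casper').create();

casper.start("http://example.com/login", function() {
  // Finds the form with id="loginForm" and fills in its fields.
  // The third argument is false so that fill() does not submit the form
  // itself; the click below does that.
  this.fill('form#loginForm', {
    'login':    'admin',
    'password': '12345678'
  }, false);
  this.evaluate(function() {
    // Trigger a click event on the submit button
    document.querySelector('input[type="submit"]').click();
  });
});

casper.run();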
DominikStyp