17

I want to download a CSV file that is generated on a button click through a POST request. I searched the CasperJS and PhantomJS forums as best I could and came back empty-handed. In a normal browser such as Firefox, a download dialog appears after the POST request. How can I handle this case in PhantomJS? These are the response headers:

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/7.5
Content-disposition: attachment;filename=ExportData.csv
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Fri, 19 Apr 2013 23:26:40 GMT
Content-Length: 65183
vumaasha

4 Answers

8

I've found a way to do this using CasperJS (it should work with PhantomJS alone if you implement the download function using XMLHttpRequest, but I've not tried that).

Here is a working example that tries to download the most recent PDF from this page. When you click the download link, some JavaScript code is triggered that generates hidden input fields, which are then POSTed.

What we do is replace the form's onsubmit function so that it cancels the submission, and get the form destination (action) and all its fields. We use this information later to do the actual download.

var casper=require('casper').create();
casper.start("https://sede.gobcan.es/tributos/jsf/publico/notificaciones/comparecencia/ultimosanuncios.jsp", function() {

    var theFormRequest = this.page.evaluate(function() {
        var request = {}; 
        var formDom = document.forms["resultadoUltimasNotif"];
        formDom.onsubmit = function() {
            //iterate the form fields
            var data = {};
            for(var i = 0; i < formDom.elements.length; i++) {
               data[formDom.elements[i].name] = formDom.elements[i].value;
            }
            request.action = formDom.action;
            request.data = data;
            return false; //Stop form submission
        }

        //Trigger the click on the link.
        var link = $("table.listado tbody tr:first a");
        link.click();

        return request; //Return the form data to casper
    });

    //Start the download
    casper.download(theFormRequest.action, "downloaded_file.pdf", "POST", theFormRequest.data);
});

casper.run(); 

Note: you have to run it with --ignore-ssl-errors, as the CA they use isn't in your browser default CA list.

casperjs --ignore-ssl-errors=true downloadscript.js
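
For the plain-PhantomJS route mentioned at the top of this answer, here is a rough, untested sketch of the two pieces an XMLHttpRequest-based download would need: encoding the captured form fields as a POST body, and fetching the response as raw bytes. The function names encodeFormFields, toByteString and fetchBinaryViaXhr are hypothetical (they are not part of PhantomJS or CasperJS). The sketch assumes fetchBinaryViaXhr runs inside the page context, for example through page.evaluate, so that the session cookies are sent along.

```javascript
// Hypothetical helpers for illustration; none of these names exist in
// PhantomJS or CasperJS.

// Encode an object of form fields as application/x-www-form-urlencoded.
function encodeFormFields(fields) {
    var pairs = [];
    for (var name in fields) {
        if (Object.prototype.hasOwnProperty.call(fields, name)) {
            pairs.push(encodeURIComponent(name) + '=' +
                       encodeURIComponent(fields[name]));
        }
    }
    // Form encoding represents spaces as '+'; a literal '+' is already
    // escaped as %2B by encodeURIComponent, so this replacement is safe.
    return pairs.join('&').replace(/%20/g, '+');
}

// With the x-user-defined charset the low byte of every character holds
// the original byte value; keep only that byte.
function toByteString(text) {
    var out = '';
    for (var i = 0; i < text.length; i++) {
        out += String.fromCharCode(text.charCodeAt(i) & 0xff);
    }
    return out;
}

// Meant to run in the page context (assumption: a browser-like
// XMLHttpRequest is available there). Synchronous on purpose, so the
// result can simply be returned.
function fetchBinaryViaXhr(url, body) {
    var xhr = new XMLHttpRequest();
    xhr.open('POST', url, false);
    xhr.overrideMimeType('text/plain; charset=x-user-defined');
    xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
    xhr.send(body);
    return toByteString(xhr.responseText);
}
```

The returned byte string would still have to be handed back to the PhantomJS side and written to disk in binary mode. That part is not shown here because I have not verified how well raw bytes survive the trip out of the page context; encoding them (for example as base64) before returning them may be necessary.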
Dusty J
julianjm
  • 1
    thanks, I finally managed to get the CSV download for my bank to work with your approach – vinzenzweber Jan 04 '15 at 14:16
  • This worked for me, but in my case clicking the link didn't submit the form because it was handled by JavaScript. I had to call $("#theForm").submit() to make the form actually submit. – Michael J. Lee Jul 30 '16 at 11:12
3

You can listen to the page.resource.received event and download() the file when received:

casper.on('page.resource.received', function(resource) {
    if (resource.stage !== "end") {
        return;
    }
    if (resource.url.indexOf('ExportData.csv') > -1) {
        this.download(resource.url, 'ExportData.csv');
    }
});
NiKo
  • 1
    Also not sure how this is supposed to work unless the unmentioned step 0 is to compile a development branch of phantom instead of the current one? – dtanders Mar 17 '14 at 19:36
  • Nope, should work with default stable phantomjs as the `download()` method is implemented casper side (it uses XHR behind the scene). – NiKo Mar 25 '14 at 10:10
  • 1
    Seems that `resource.stage === 'end'` should be added (&&) into the `if` for proper operation. – Stan Sep 08 '14 at 19:55
  • Is there any way to make this function wait for around 4 seconds? – Sentient07 May 21 '15 at 19:54
  • Hi @NiKo, does this work with javascript based clicks? Meaning, that the file is created after a click of a link via javascript? If not, any example on how this can be achieved? Thanks – user1749672 Nov 23 '15 at 12:44
  • 1
    This will only work for requests that are sent by HTTP GET method. If the click is triggered using POST - as in the OP's question - you need to do this.download(resource.url, 'ExportData.csv', "POST"). But this alone is not enough as you need the POST data to be sent as the 4th argument to download(...) – nchaud May 11 '16 at 11:55
2

@julianjm's approach is almost the solution, but in my case I did not have the correct form name to replace the form submission.

So I found another solution using a PhantomJS beta:

There is a beta version of PhantomJS 2.0 that includes an event handler that solves this issue.

It is still a beta version, so there is no debugging.

So I developed the clicks and the page handling on the release version, and then switched the PhantomJS version to make the download work.

casper.start('http://www.website.com.br/', function() {
    this.page.onFileDownload = function(status) {
        console.log('onFileDownload(' + status + ')');
        // The system will detect the download, but you have to name the file yourself.
        return "ContactList_08-25-14.csv";
    };
});

casper.then(function() {
    // Do your stuff here to click on the download link.
});

casper.run();

Download: Phantom 2.0 BETA

Download the exe, rename the release version of phantom.exe to phantom.bkp.exe, and put this 2.0 version in its place. Then, in CasperJS, you will need to add some lines at the beginning of casperjs/bin/bootstrap.js:

 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
 *
 */
var system = require('system');
var argsdeprecated = system.args;
argsdeprecated.shift();
phantom.args = argsdeprecated;

Also comment out the version check (same file):

(function(version) {
        // required version check
      /*  if (version.major !== 1) {
            return __die('CasperJS needs PhantomJS v1.x');
        } if (version.minor < 8) {
            return __die('CasperJS needs at least PhantomJS v1.8 or later.');
        }
        if (version.minor === 8 && version.patch < 1) {
            return __die('CasperJS needs at least PhantomJS v1.8.1 or later.');
        } */
    })(phantom.version);

Remember, this is a tweak!

These lines in bootstrap.js will cause problems if you want to run the release version of PhantomJS or SlimerJS.

So develop on the release version, then switch to this tweaked version to be able to download. If you need to debug, you will have to remove the added lines from bootstrap.js.

LeoPucciBr
0

I have to deal with a site written with some kind of ASP.NET framework that sends a remarkable amount of POST data with each request (about 100 KB, of which about 95 KB never seems to change between requests; it is apparently viewport state related).

However, no method I could find worked for me. I've looked into intercepting XHR, I've even found someone who is tackling the very same framework (at least judging from the selectors) but with a simpler case, inspired by this very question. I found out that back in the day this couldn't be done with PhantomJS.

My problem is that a click on a button starts a chain of AJAX requests culminating with the sending of this enormous POST form, to which finally the server replies with a "Content-Disposition: attachment".

In the end, I found this approach which works for me, even if it is network-inefficient:

...setting up everything, until I just need to click on a button...

var phantomData    = null;
var phantomRequest = null;

// Here, I just recognize the form being submitted and copy it.

casper.on('resource.requested', function(requestData, request) {
    for (var h in requestData.headers) {
        if (requestData.headers[h].name === 'Content-Type') {
            if (requestData.headers[h].value === 'application/x-www-form-urlencoded') {
                phantomData         = requestData;
                phantomRequest      = request;
            }
        }
    }
});

// Here, I recognize when the request has FAILED because PhantomJS does
// not support straight downloading.

casper.on('resource.received', function(resource) {
    for (var h in resource.headers) {
        // Header names are case-insensitive (the question shows "Content-disposition").
        if (resource.headers[h].name.toLowerCase() === 'content-disposition') {
            if (resource.stage === 'end') {
                if (phantomData) {
                    // to do: get name from resource.headers[h].value
                    casper.download(
                        resource.url,
                        "output.pdf",
                        phantomData.method,
                        phantomData.postData
                    );
                } else {
                    // Something went wrong.
                }
                // Possibly, remove listeners?
            }
        }
    }
});

// Now, click on the button and initiate the dance.
casper.click(pdfLinkSelector);
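
The "to do" in the listener above (getting the file name from the Content-Disposition value) could be sketched roughly as follows. The function name filenameFromContentDisposition is hypothetical, and this only handles the plain filename= form seen in the question's headers, quoted or unquoted; the extended filename*= syntax is deliberately ignored.

```javascript
// Hypothetical helper for illustration; not part of CasperJS or PhantomJS.
// Returns the file name from a Content-Disposition header value, or null
// if the value carries no plain "filename=" parameter.
function filenameFromContentDisposition(value) {
    var match = /filename\s*=\s*(?:"([^"]*)"|([^;]+))/i.exec(value || '');
    if (!match) {
        return null;
    }
    // Group 1 is the quoted form, group 2 the unquoted form.
    var name = match[1] !== undefined ? match[1] : match[2];
    name = name.replace(/^\s+|\s+$/g, '');
    // Keep only the last path component so the server cannot choose
    // the directory the file is written to.
    return name.split(/[\\\/]/).pop() || null;
}
```

Inside the listener this could replace the hard-coded "output.pdf", for example with filenameFromContentDisposition(resource.headers[h].value) || "output.pdf".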

The download works flawlessly, even if I can see that the file gets requested (and sent) twice.

[debug] [phantom] Navigation requested: url=https://somesite/SomePage.aspx, type=FormSubmitted, willNavigate=true, isMainFrame=true
[debug] [application] GOT FORM, REQUEST DATA SAVED
[warning] [phantom] Loading resource failed with status=fail (HTTP 200): https://somesite/SomePage.aspx
[debug] [application] END STAGE REACHED, PHANTOMDATA PRESENT
[debug] [application] ATTEMPTING CASPERJS.DOWNLOAD
[debug] [remote] sendAJAX(): Using HTTP method: 'POST'
[debug] [phantom] Downloaded and saved resource in output.pdf
[debug] [application] TERMINATING SUCCESSFULLY
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "about:blank"

(Next, I'll probably modify the script to try invoking request.abort() from inside the resource.requested listener, set a semaphore and invoke again the downloader - I won't be able to get the attachment filename, but that matters little to me).

LSerni