I have to deal with a site written with some kind of ASP.Net framework which sends a remarkable amount of POST data at each request (some 100 Kb of data, of which about 95 never seem to change between requests - viewport state related apparently).
However, no method I could find worked for me. I've looked into intercepting XHR, I've even found someone who is tackling the very same framework (at least judging from the selectors) but with a simpler case, inspired by this very question. I found out that back in the day this couldn't be done with PhantomJS.
My problem is that a click on a button starts a chain of AJAX requests culminating with the sending of this enormous POST form, to which finally the server replies with a "Content-Disposition: attachment".
In the end, I found this approach which works for me, even if it is network-inefficient:
...setting up everything, until I just need to click on a button...
phantomData = null;
phantomRequest = null;
// Here, I just recognize the form being submitted and copy it.
casper.on('resource.requested', function(requestData, request) {
for (var h in requestData.headers) {
if (requestData.headers[h].name === 'Content-Type') {
if (requestData.headers[h].value === 'application/x-www-form-urlencoded') {
phantomData = requestData;
phantomRequest = request;
}
}
}
});
// Here, I recognize when the request has FAILED because PhantomJS does
// not support straight downloading.
casper.on('resource.received', function(resource) {
for (var h in resource.headers) {
if (resource.headers[h].name === 'content-disposition') {
if (resource.stage === 'end') {
if (phantomData) {
// to do: get name from resource.headers[h].value
casper.download(
resource.url,
"output.pdf",
phantomData.method,
phantomData.postData
);
} else {
// Something went wrong.
}
// Possibly, remove listeners?
}
}
}
});
// Now, click on the button and initiate the dance.
casper.click(pdfLinkSelector);
The download works flawlessly, even if I can see that the file gets requested (and sent) twice.
[debug] [phantom] Navigation requested: url=https://somesite/SomePage.aspx, type=FormSubmitted, willNavigate=true, isMainFrame=true
[debug] [application] GOT FORM, REQUEST DATA SAVED
[warning] [phantom] Loading resource failed with status=fail (HTTP 200): https://somesite/SomePage.aspx
[debug] [application] END STAGE REACHED, PHANTOMDATA PRESENT
[debug] [application] ATTEMPTING CASPERJS.DOWNLOAD
[debug] [remote] sendAJAX(): Using HTTP method: 'POST'
[debug] [phantom] Downloaded and saved resource in output.pdf
[debug] [application] TERMINATING SUCCESSFULLY
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "about:blank"
(Next, I'll probably modify the script to try invoking request.abort()
from inside the resource.requested
listener, set a semaphore and invoke again the downloader - I won't be able to get the attachment filename, but that matters little to me).