
I am trying to record constantly updating data on a webpage. In the Google Chrome developer tools, I can see that my incoming data is obtained by an AJAX request.

When I click on the 'got' text file, I can see the data that I want in Google Chrome. I would like to use PhantomJS to receive the AJAX responses and then save these responses to files.

So far I have a program that opens the URL of the webpage I'm interested in and can print out an overview of the network traffic that is being received, but I do not know how I can save the actual files as they come in. How would I do this?

Code so far:

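var page = require('webpage').create();
// page.open needs a full URL including the scheme
var url = "http://www.site_of_interest.com";

// Log the metadata of every outgoing request
page.onResourceRequested = function(request) {
    console.log('Request ' + JSON.stringify(request, undefined, 4));
};

// Log the metadata of every incoming response (the body is not included)
page.onResourceReceived = function(response) {
    console.log('Receive ' + JSON.stringify(response, undefined, 4));
};

page.open(url);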
rwolst
  • possible duplicate of [How can I catch and process the data from the XHR responses using casperjs?](http://stackoverflow.com/questions/24555370/how-can-i-catch-and-process-the-data-from-the-xhr-responses-using-casperjs). Although this question is about CasperJS, most of the code is directly transferable to plain PhantomJS. – Artjom B. Oct 01 '14 at 23:46
  • Thanks for the reply, yes it looks like the answer may lie in the other question. I'll have a look into it. – rwolst Oct 02 '14 at 00:22

1 Answer


Currently, this is not possible with PhantomJS directly. The onResourceRequested and onResourceReceived callbacks only expose metadata such as the URL, status and headers, not the request or response content. Possible workarounds would be:

  • If the AJAX requests can be replayed (repeated requests to the same URL yield the same response every time), you can make your own AJAX request in the onResourceReceived handler and save the response to a file using the fs module.
  • An AJAX response usually causes some content on the page to change. You could write custom code that checks the DOM for those changes and infers what the AJAX response might have been. It doesn't necessarily have to be the DOM: the data may be accessible through a JavaScript variable in the page context, or it may be saved in localStorage.
  • It is also possible to write a custom XMLHttpRequest implementation that acts as a proxy and saves the responses so that they can be grabbed later. It must be injected before any page JavaScript runs, so the page.onInitialized handler works best.
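
As a rough illustration of the XMLHttpRequest proxy idea, here is a simplified sketch. Instead of replacing the whole XMLHttpRequest object, it wraps open and send on the prototype and hands every finished response to the PhantomJS side through window.callPhantom and page.onCallback. The helper name installXhrHook, the response_N.txt file names and the example URL are assumptions made for illustration. It has not been tested against any particular site, and it only sees text responses that go through XMLHttpRequest.

```javascript
// Runs inside the page via page.evaluate, so it must not use outer variables.
// `win` is only passed explicitly for testing; in the page it is the global window.
function installXhrHook(win) {
    win = win || window;
    var proto = win.XMLHttpRequest.prototype;
    var originalOpen = proto.open;
    var originalSend = proto.send;

    // Remember the URL so it can be reported together with the body.
    proto.open = function (method, url) {
        this._hookedUrl = url;
        return originalOpen.apply(this, arguments);
    };

    // Register a load listener before the real send() is called.
    proto.send = function () {
        var xhr = this;
        xhr.addEventListener('load', function () {
            var body = null;
            try {
                body = xhr.responseText; // throws for non-text responseType values
            } catch (e) {
                body = null;
            }
            if (typeof win.callPhantom === 'function') {
                win.callPhantom({ url: xhr._hookedUrl, status: xhr.status, body: body });
            }
        });
        return originalSend.apply(this, arguments);
    };
}

// Only wire things up when running under PhantomJS, which defines the global `phantom`.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    var page = require('webpage').create();
    var counter = 0;

    // Install the hook before any page JavaScript runs.
    page.onInitialized = function () {
        page.evaluate(installXhrHook);
    };

    // Receives whatever the page passed to window.callPhantom.
    page.onCallback = function (data) {
        if (data && typeof data.body === 'string') {
            counter += 1;
            fs.write('response_' + counter + '.txt', data.body, 'w'); // hypothetical file name scheme
            console.log('Saved response from ' + data.url);
        }
    };

    page.open('http://www.site_of_interest.com'); // placeholder URL from the question
}
```

Saved as, say, capture.js, this would be started with phantomjs capture.js and keeps running while the page continues to poll. Wrapping the prototype is less invasive than a full replacement, but pages that use other transports (for example WebSockets) would need a different hook.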

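I have written an answer about these workarounds for CasperJS, but they can easily be converted for use with plain PhantomJS: How can I catch and process the data from the XHR responses using casperjs? (http://stackoverflow.com/questions/24555370/how-can-i-catch-and-process-the-data-from-the-xhr-responses-using-casperjs).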
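
For the replay workaround, a conversion to plain PhantomJS might look roughly like the sketch below. It repeats each matching request with a synchronous XMLHttpRequest inside page.evaluate and writes the result with fs.write. The helper createReplayTracker, the '/data' URL filter and the replay_N.txt file names are made up for illustration. The tracker assumes that every replayed request produces exactly one further onResourceReceived 'end' event, which may not hold if the replay is answered from cache.

```javascript
// Decides which received resources to replay, and skips the responses caused
// by our own replayed requests so that a replay does not trigger another replay.
function createReplayTracker(matches) {
    var pending = {}; // url -> number of replayed requests still expected back
    return {
        shouldReplay: function (response) {
            // onResourceReceived fires for 'start' and 'end'; act once, at the end.
            if (response.stage !== 'end' || !matches(response.url)) {
                return false;
            }
            if (pending[response.url] > 0) {
                pending[response.url] -= 1; // this one came from our own replay
                return false;
            }
            pending[response.url] = (pending[response.url] || 0) + 1;
            return true;
        }
    };
}

// Only wire things up when running under PhantomJS, which defines the global `phantom`.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    var page = require('webpage').create();
    var saved = 0;
    var tracker = createReplayTracker(function (url) {
        return url.indexOf('/data') !== -1; // hypothetical filter for the AJAX endpoint
    });

    page.onResourceReceived = function (response) {
        if (!tracker.shouldReplay(response)) {
            return;
        }
        // Repeat the request from inside the page so it is same-origin.
        var body = page.evaluate(function (url) {
            var xhr = new XMLHttpRequest();
            xhr.open('GET', url, false); // synchronous request
            xhr.send(null);
            return xhr.responseText;
        }, response.url);
        if (typeof body === 'string') {
            saved += 1;
            fs.write('replay_' + saved + '.txt', body, 'w'); // hypothetical file name scheme
        }
    };

    page.open('http://www.site_of_interest.com'); // placeholder URL from the question
}
```

This only gives the same data as the original response if the endpoint is replayable. For data that changes on every request, the saved file will reflect the second request rather than the one the page made.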

Artjom B.