16

Is it possible to use JavaScript to scrape all the changes to a webpage that is being updated live with AJAX? The site I wish to scrape updates its data using AJAX every second, and I want to grab all the changes. This is an auction website, and several objects can change whenever a user places a bid. When a bid is placed, the following change:

  • The current bid price
  • The current high bidder
  • The auction timer, which has time added back to it

I wish to grab this data using a Chrome extension built on JavaScript. Is there an AJAX listener for JavaScript that can accomplish this? A toolkit? I need some direction. Can JavaScript accomplish this?

Rob W
user1885715

1 Answer

29

I'm going to show two ways of solving the problem. Whichever method you pick, don't forget to read the bottom of my answer!

First, I present a simple method which only works if the page uses jQuery. The second method looks slightly more complex, but will also work on pages without jQuery.

The following examples show how you can implement filters based on the method (e.g. POST/GET) and URL, and how to read the request (POST) data and response bodies.

Use a global ajax event in jQuery

More information about the jQuery method can be found in the documentation of .ajaxSuccess. Usage:

// The handler must be attached to document (required as of jQuery 1.9)
jQuery(document).ajaxSuccess(function(event, xhr, ajaxOptions) {
    /* Method        */ ajaxOptions.type
    /* URL           */ ajaxOptions.url
    /* Response body */ xhr.responseText
    /* Request body  */ ajaxOptions.data
});
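As a rough sketch of the filtering mentioned above, the handler can check the method and URL before it reads the bodies. The `isBidUpdate` helper and the `/bid` URL fragment are made up for illustration; substitute whatever the auction page actually requests. The handler is attached to `document`, which jQuery 1.9 and later require.

```javascript
// Hypothetical filter: only POST requests whose URL contains "/bid".
// Both the helper name and the URL fragment are assumptions for this sketch.
function isBidUpdate(ajaxOptions) {
    var method = String(ajaxOptions.type || 'GET').toUpperCase();
    var url = String(ajaxOptions.url || '');
    return method === 'POST' && url.indexOf('/bid') !== -1;
}

// Only attach the handler when jQuery is actually present on the page.
if (typeof jQuery !== 'undefined') {
    jQuery(document).ajaxSuccess(function(event, xhr, ajaxOptions) {
        if (!isBidUpdate(ajaxOptions)) return;
        console.log('Bid update:', ajaxOptions.url, xhr.responseText);
    });
}
```

Keeping the filter in its own function makes it easy to adjust once you have seen which requests the page really sends.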

Pure JavaScript way

When the website does not use jQuery for its AJAX requests, you have to modify the built-in XMLHttpRequest methods. This requires more code:

(function() {
    var XHR = XMLHttpRequest.prototype;
    // Remember references to original methods
    var open = XHR.open;
    var send = XHR.send;

    // Overwrite native methods
    // Collect data: 
    XHR.open = function(method, url) {
        this._method = method;
        this._url = url;
        return open.apply(this, arguments);
    };

    // Implement "ajaxSuccess" functionality
    XHR.send = function(postData) {
        this.addEventListener('load', function() {
            /* Method        */ this._method
            /* URL           */ this._url
            /* Response body */ this.responseText
            /* Request body  */ postData
        });
        return send.apply(this, arguments);
    };
})();
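The snippet above only marks where the values are available. A possible way to actually use them, sketched here under my own naming (`hookXHR` and `onLoad` are not part of any API), is to hand the captured data to a callback. The sketch also skips `responseText` when the request uses a non-text `responseType`, because reading it in that case throws.

```javascript
// Sketch: the same patching technique, parameterised so that the captured
// data is passed to a callback. hookXHR and onLoad are names chosen here.
function hookXHR(XHRClass, onLoad) {
    var proto = XHRClass.prototype;
    // Remember references to the original methods
    var open = proto.open;
    var send = proto.send;

    proto.open = function(method, url) {
        this._method = method;
        this._url = url;
        return open.apply(this, arguments);
    };

    proto.send = function(postData) {
        this.addEventListener('load', function() {
            // responseText is only readable for '' or 'text' response types
            var isText = !this.responseType || this.responseType === 'text';
            onLoad({
                method: this._method,
                url: this._url,
                requestBody: postData,
                responseBody: isText ? this.responseText : null
            });
        });
        return send.apply(this, arguments);
    };
}

// In the page context there is a real XMLHttpRequest to patch.
if (typeof XMLHttpRequest !== 'undefined') {
    hookXHR(XMLHttpRequest, function(info) {
        console.log(info.method, info.url, info.responseBody);
    });
}
```

Passing the constructor in as an argument is only a convenience for trying the function out against a stand-in object; on a real page you would call it with `XMLHttpRequest` as shown.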

Getting it to work in a Chrome extension

The code shown above has to run in the context of the page (in your case, an auction page). For this reason, you have to use a content script that injects (!) the script into the page. This is not difficult; see this answer for a detailed explanation plus usage examples: Building a Chrome Extension - Inject code in a page using a Content script.
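For orientation, one common form of that injection looks roughly like the sketch below. The file name `injected.js` is hypothetical: it would contain the XHR patch, and it has to be listed under `web_accessible_resources` in the extension's manifest. `injectPageScript` is likewise just a name picked for this example.

```javascript
// Sketch of a content script that injects a file into the page itself.
// 'injected.js' is a hypothetical file name for the script holding the patch.
function injectPageScript(doc, src) {
    var script = doc.createElement('script');
    script.src = src;
    // Remove the element once it has loaded; the injected code stays active.
    script.onload = function() {
        script.remove();
    };
    (doc.head || doc.documentElement).appendChild(script);
    return script;
}

// Only runs inside an extension content script, where chrome.runtime exists.
if (typeof chrome !== 'undefined' && chrome.runtime && typeof document !== 'undefined') {
    injectPageScript(document, chrome.runtime.getURL('injected.js'));
}
```

The content script itself runs in an isolated world, which is why it adds a `<script>` element instead of patching `XMLHttpRequest` directly.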

A general method

You can read the request body, request headers and response headers with the chrome.webRequest API. The headers can also be modified. However, it is not (yet) possible to read, let alone modify, the response body of a request. If you want this feature, star https://code.google.com/p/chromium/issues/detail?id=104058.
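A minimal sketch of reading the request side with `chrome.webRequest` might look like this. It belongs in the extension's background script, and it assumes the extension declares the `webRequest` permission plus host permissions for the site. The URL pattern `auction.example.com` is a placeholder and `summarizeRequest` is a helper name invented here.

```javascript
// Sketch: observe request method, URL and body via chrome.webRequest.
// The URL pattern below is a placeholder, not the real auction site.
function summarizeRequest(details) {
    var body = details.requestBody || {};
    return {
        method: details.method,
        url: details.url,
        // formData is set for form-encoded bodies, raw for other uploads
        formData: body.formData || null,
        hasRawBody: Array.isArray(body.raw) && body.raw.length > 0
    };
}

// Only register the listener where the webRequest API is available.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
    chrome.webRequest.onBeforeRequest.addListener(
        function(details) {
            console.log(summarizeRequest(details));
        },
        { urls: ['*://auction.example.com/*'] },
        ['requestBody']
    );
}
```

The `'requestBody'` entry in the last argument is what makes `details.requestBody` available; the response body is not exposed here, as noted above.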

Community
Rob W
  • If you want to share the captured XHR data with your Chrome extension's Content Script and/or background page, see [Call background function of Chrome extension from a site](http://stackoverflow.com/questions/13777887/call-background-function-of-chrome-extension-from-a-site/13779769#13779769) – Rob W Dec 11 '12 at 08:39
  • Neither of these code snippets seem to work for me (in chrome extension). Has anyone got it to work? – K2xL Jan 20 '14 at 05:10
  • @K2xL Where did you put the snippet? It must be used in a script injected by a content script. – Rob W Jan 20 '14 at 09:10
  • Rob: I put it in the content script. It seems like content scripts run on their own sandboxes so you can't override the XHR object through them :-( – K2xL Jan 20 '14 at 18:58
  • @K2xL Please read my **whole** answer, especially the section at "Getting it to work in a Chrome extension"... – Rob W Jan 20 '14 at 19:44
  • @RobW Is there a way to get this to work in Firefox? I am trying out this link https://developer.mozilla.org/en-US/Add-ons/SDK/Guides/Content_Scripts but it does not seem to work. Can you help? – Anurag Jan 24 '14 at 14:49
  • @Codeanu You can also use the techniques at http://stackoverflow.com/a/9517879 to run content scripts in the page's context. Note that the Firefox Add-on API is much more powerful than Chrome's, consider looking in the Firefox-specific APIs to achieve this (please create a new question if you want more details on it. Of course, not before you've spent at least ten minutes on research). – Rob W Jan 24 '14 at 14:58
  • @RobW, Any idea why they're killing [declarativeWebRequest](https://bugs.chromium.org/p/chromium/issues/detail?id=586636#c3) (*without concrete plans to move to stable*)? – Pacerier Jun 15 '17 at 06:06
  • @Pacerier The DWR API is not going to be removed yet (your linked issue is marked WontFix). However, there might be a new version of the API: https://bugs.chromium.org/p/chromium/issues/detail?id=696822 – Rob W Jun 15 '17 at 09:20
  • I kept getting `ajaxSuccess is not a function`. [Then I read the docs](http://api.jquery.com/ajaxsuccess/): "As of jQuery 1.9, all the handlers for the jQuery global Ajax events, including those added with the .ajaxSuccess() method, must be attached to document." So for 1.9 and later you have to do `$(document).ajaxSuccess(function(event, xhr, ajaxOptions) { .....` – Wesley Smith Jul 14 '17 at 12:59