
I am in the process of writing a Greasemonkey Script for pages in this site (Site1). Site1 has deals and offers of various kinds and my GM script aims to do the following:

When one visits an offer on Site1, the script queries Site2 to find out whether this hotel is also listed on Site2. If so, display the search results from Site2 on Site1.

The problem is that Site2 displays a progress bar ("Loading Results") and then displays the results. Thus my Ajax request always returns empty results and looks like this (See the red-boxed portion):
[Screenshot: unfinished results]


However, it should actually have the complete contents of the search results from Site2, like so:
[Screenshot: finished results]


I have tried a synchronous Ajax request as well as GM_xmlhttpRequest to no avail.

This is the problematic progress bar of Site 2:
[Screenshot: status bar]


How can I get the AJAX request to wait for the search on Site2 to be completely rendered before returning the response to Site1?

For reference, my complete working script code is at pastebin.com.

This is the relevant snippet:

$(document).ready(function () {
    var rewardsSiteResults = $('<div class="panel deal-panel rc-lr"></div>')
        .attr('id', "rewardsSiteResults")
        .html("<p>" + progressMessageText + "</p> ")
        .append(spinnerGif);
    $(insertSelector).after(rewardsSiteResults);

    var addressMap = getAddressOfHotel();
    var pinCode = addressMap[pinCodePlaceHolder];
    var hotelName = addressMap[hotelNamePlaceHolder];
    var queryURL = constructQueryURL(pinCode, hotelName);

    $.ajaxSetup({async: true, timeout: 5000});
    $.get(queryURL, null, function (response) {
        if (!displayed) {
            displayed = true;
            //rewardsSiteResults.html("adfaasddsf");
            var text = $(response).find("#col2");
            $(text).find("script").remove();

            //console.log(text.html())
            //$('<iframe id="someId"/>').appendTo('#rewardsSiteResults')
            //    .contents().find('body').append(response);
            rewardsSiteResults.html("<div class='panel deal-panel rc-lr'>" + text.html() + "</div>");
            //console.log(response);
        }
    }, 'html');
});
Brock Adams
Ashutosh Jindal
  • Think this is the sort of situation where you really need to be using a json api or some other api to allow you to use their search directly rather than essentially screen scraping. – Jon Taylor Jul 14 '12 at 18:27
  • +1 for nicely explaining the question – Blaster Jul 14 '12 at 18:27
  • Have u seen this post? http://stackoverflow.com/questions/3880307/trigger-event-on-body-load-complete-js-jquery – simgineer Jul 14 '12 at 18:27
  • I doubt there's a way to do this with ajax, which will receive only the initial response (the progress bar). I expect there's also no naturally occurring event that fires when the progress bar yields to the content you want. I think you have no other option than to load the content into a hidden iframe then to poll it, testing for the content to appear. – Beetroot-Beetroot Jul 14 '12 at 19:18

1 Answer


For the AJAX get to "wait for the page to be rendered", it would actually have to fully process the page, fetching and running all the included CSS and JavaScript files. That's not easy and not recommended. Fortunately, you don't need to do that anyway.

Here are three better ways to approach this kind of problem:

  1. The resource page (mpdining.rewardsnetwork.com, for this question) might have an API. If it does, find it and use it. This is your best bet, if it's available.

  2. Analyze the resource page's JavaScript and/or AJAX requests. Use GM_xmlhttpRequest() to directly fetch just the payload data, instead of trying to parse the resource page.

    Sometimes this process is fairly easy, but some sites require complex interaction and/or authentication.

  3. Load the resource page in a hidden iframe; set your Greasemonkey script to run on both the resource page and the master page and to relay the desired data using postMessage().

    This approach will almost always work, although you may have to prevent some pages from attempting to "bust out" of the iframe.
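
As a rough sketch of approach 2, assuming the resource page's own AJAX call returns JSON: the endpoint `http://example.com/search.json`, the `zip` and `name` query parameters, the `results` field, and the helper names `buildPayloadURL` and `extractResults` are all hypothetical placeholders, not anything taken from the actual site. The script would also need `// @grant GM_xmlhttpRequest` in its metadata block.

```javascript
// Hypothetical payload URL; find the real one in the browser's network panel.
var PAYLOAD_URL = "http://example.com/search.json";

// Build the query URL from values scraped off the master page.
// The parameter names (zip, name) are assumptions for illustration.
function buildPayloadURL (baseURL, pinCode, hotelName) {
    return baseURL
        + "?zip="  + encodeURIComponent (pinCode)
        + "&name=" + encodeURIComponent (hotelName);
}

// Parse the raw response text; return [] on anything unexpected.
// Assumes a JSON body of the form {"results": [...]}.
function extractResults (responseText) {
    try {
        var payload = JSON.parse (responseText);
        return Array.isArray (payload.results) ? payload.results : [];
    }
    catch (e) {
        return [];
    }
}

// Only runs inside a userscript engine that provides GM_xmlhttpRequest.
if (typeof GM_xmlhttpRequest === "function") {
    GM_xmlhttpRequest ( {
        method:  "GET",
        url:     buildPayloadURL (PAYLOAD_URL, "10001", "Some Hotel"),
        onload:  function (response) {
            console.log (extractResults (response.responseText) );
        },
        onerror: function () {
            console.log ("Payload request failed.");
        }
    } );
}
```

If the real payload turns out to be HTML or some other format rather than JSON, only `extractResults` would need to change.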



Using a hidden iframe to get data from a cross-domain, resource page:

Greasemonkey scripts will run on both a normal page and on pages within iframes. In fact, you can set the same script to run on both, and on multiple domains.

If a master page and an iframed resource page are both running GM script(s), the script instances can communicate with each other, cross-domain, using postMessage().
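
A minimal sketch of that hand-off, assuming a hypothetical message format (a `type` tag plus a `payload`) and hypothetical helper names (`packMessage`, `unpackMessage`). The origins are the demo sites used in the example that follows. Tagging and origin-checking are worthwhile because any window can post messages to the master page.

```javascript
// Hypothetical tag so our messages can be told apart from other scripts' messages.
var MESSAGE_TYPE = "gmScrapeResult";

// Wrap the scraped data in a tagged, JSON-encoded envelope.
function packMessage (payload) {
    return JSON.stringify ( {type: MESSAGE_TYPE, payload: payload} );
}

// Return the payload if the event comes from a trusted origin and carries
// our tag; otherwise return null.
function unpackMessage (event, trustedOrigins) {
    if (trustedOrigins.indexOf (event.origin) === -1)   return null;
    try {
        var msg = JSON.parse (event.data);
        return (msg  &&  msg.type === MESSAGE_TYPE  &&  "payload" in msg)
            ? msg.payload
            : null;
    }
    catch (e) {
        return null;
    }
}

// Browser-only wiring; skipped where there is no window object.
if (typeof window !== "undefined") {
    if (window.top === window.self) {
        // Master page: listen for the framed instance.
        window.addEventListener ("message", function (event) {
            var data = unpackMessage (event, ["http://jsbin.com"]);
            if (data !== null)  console.log ("Got payload:", data);
        }, false);
    }
    else {
        // Framed page: send to the top window, naming the expected origin.
        window.top.postMessage (
            packMessage ("<b>scraped html</b>"), "http://fiddle.jshell.net"
        );
    }
}
```

Passing an explicit target origin to `postMessage()` instead of `"*"` means the browser delivers the message only if the receiving window is on that origin.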

For example, suppose we have a site, fiddle.jshell.net/9ttvF/show, that contains travel data, and we want to mash-up that site with matching data from a resource site, jsbin.com/ahacab, that uses AJAX to get its payload data.

The target (master) site looks like this:
[Screenshot: target site]

The resource site looks like this at first:
[Screenshot: resource site, start]

Then finishes like this:
[Screenshot: resource site, finish]


The following script:

  1. Loads the resource page in a hidden iframe.
  2. Starts a second instance of itself running on the iframed page.
  3. Waits for the iframed page to finish, processing the results as desired.
  4. Sends the desired payload data to the GM script running on the target (master) page.
  5. The target-page's script then inserts the payload data to complete the mash-up.
// ==UserScript==
// @name     _Cross-site, AJAX scrape demo
// @include  http://fiddle.jshell.net/9ttvF/show/
// @include  http://jsbin.com/ahacab*
// @require  http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js
// @require  https://gist.github.com/raw/2625891/waitForKeyElements.js
// @grant    GM_addStyle
// ==/UserScript==

if (/fiddle\.jshell\.net/i.test (location.host) ) {
    console.log ("***Master-page start...");

    /*--- Inform the user.
    */
    $("#plainResults").before (
        '<div id="gmFetchRez">Greasemonkey is fetching results from jsbin.com...</div>'
    );

    /*--- Setup to process messages from the GM instance running on the iFrame:
    */
    window.addEventListener ("message", receiveMessage, false);

    /*--- Load the resource site in a hidden iframe.
    */
    $("body").append ('<iframe src="http://jsbin.com/ahacab" id="gmIframe"></iframe>');
}
else {
    console.log ("***Framed start...");
    /*--- Wait for the AJAXed-in content...
    */
    waitForKeyElements ("#results table.rezTable", sendResourcePageData);
}

function sendResourcePageData (jNode) {
    console.log ("Results found!  Sending them to the main window...");

    window.top.postMessage (jNode.html(), "*");
}

function receiveMessage (event) {
    if (event.origin != "http://jsbin.com")     return;

    $("#gmFetchRez").html (event.data);
}

//--- Use CSS to control appearances.
GM_addStyle ( "                                 \
    #gmIframe {                                 \
        display:            none;               \
    }                                           \
    #gmFetchRez {                               \
        background:         lightYellow;        \
        border:             3px double red;     \
        padding:            1em;                \
    }                                           \
" );

The final result looks like this, with the script installed and running:
[Screenshot: mashup result]
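
Step 3 of the script relies on waitForKeyElements to notice when the AJAXed-in content appears. The general idea behind that kind of wait can be sketched as a simple poll. This is a simplified stand-in with hypothetical names (`pollFor`, `intervalMs`, `maxTries`), not the actual waitForKeyElements implementation.

```javascript
// Repeatedly call probe() until it returns something truthy, then pass that
// value to onFound. Gives up after maxTries attempts and calls onTimeout
// (if one was supplied).
function pollFor (probe, onFound, onTimeout, intervalMs, maxTries) {
    var tries = 0;
    var timer = setInterval (function () {
        tries++;
        var found = probe ();
        if (found) {
            clearInterval (timer);
            onFound (found);
        }
        else if (tries >= maxTries) {
            clearInterval (timer);
            if (onTimeout)  onTimeout ();
        }
    }, intervalMs);
    return timer;
}

// Browser-only usage: wait for the results table used in the demo above.
if (typeof document !== "undefined") {
    pollFor (
        function () { return document.querySelector ("#results table.rezTable"); },
        function (node) { console.log ("Results found:", node.innerHTML); },
        function () { console.log ("Gave up waiting for results."); },
        300,    // check every 300 ms
        50      // stop after 50 checks
    );
}
```

The attempt limit keeps the timer from running forever if the resource page never produces results, for example when the search finds nothing.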

Brock Adams
  • Brock .. before I say anything else .. please let me say this .. this is pure and undiluted awesomeness.. I thank you for the exquisite detail. Thank you so much. I missed mentioning that I the second site did not have an API and I had spent hours trying to understand it's resource requests with Firebug's assistance to no avail. The latter was complicated by the fact that the second site uses DWR. I am very excited about the third option, will report back as soon as possible. Thank you for taking the time out to compose such a great explanation. – Ashutosh Jindal Jul 15 '12 at 20:24
  • Brock, it works !!!! :) Thanks a lot !! I'll clean up my script a bit and then post it so that others can reference it ! Cheers ! – Ashutosh Jindal Jul 15 '12 at 21:06