Retrieve only a part of remote URL in AJAX

Question

I have searched on Google and read the documentation, but without success. I am making the AJAX request in a content script (Chrome extension), otherwise known as a Greasemonkey script to Firefox users.

A typical function to get a URL using AJAX:

function getURL(url, element)
{
    var request = new XMLHttpRequest();
    request.onreadystatechange = function()
    {   
        if ( request.readyState == 4 ) 
        {   
            callback( request.responseText, element, request.status );    
        }   
    };  
    request.open( "GET", url, true );
    request.send();
}

Let's say I only need the first 10 KB of the page, but the whole page is more than 200 KB. The page I am retrieving is normal HTML. I don't want to waste bandwidth by downloading the excess 190 KB. Is there any way to achieve that? Also, is it possible to retrieve only a part of the page, say from 100 KB to 110 KB?

I am open to a browser-specific solution (Chrome). I also have to port the extension to Firefox, so ideas about that are welcome too.

"No". I don't think there is any way to get a 'part' of a page by AJAX. You will have to fetch the whole page. jQuery.load does this, but as far as I know, it actually downloads the whole page and then filters out the desired content. – Jashwant, Mar 22 '12 at 10:48
PS. Content scripts are **not** Greasemonkey scripts. See [this post](http://stackoverflow.com/a/9791647/938089?greasemonkey-require-does-not-work-in-chrome). – Rob W, Mar 22 '12 at 11:29
A browser-specific solution won't save you bandwidth - in order to save bandwidth the server needs to send you less data (meaning that the important part here is the server, not the browser). – Wladimir Palant, Mar 22 '12 at 13:31

score 7 · Answer 1 · answered Mar 22 '12 at 13:25

You can send a Range header:

request.setRequestHeader("Range", "bytes=0-9999");
request.send(null);

Note that the server might ignore this header, in which case you get the usual full response back. In most cases, however, the response will be "206 Partial Content" with exactly 10000 bytes of data. The Content-Range response header indicates which part of the file you got, e.g. request.getResponseHeader("Content-Range") might give you bytes 0-9999/1234567 (here 1234567 is the total size of the file).

Obviously, you can also do request.setRequestHeader("Range", "bytes=100000-119999"); to get data from the middle of the file.
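To make the Range approach concrete, here is a minimal sketch rather than a definitive implementation. The names buildRangeHeader, parseContentRange and getRange are hypothetical helpers, not part of any API. The sketch assumes a browser-style XMLHttpRequest and treats a 200 status as "the server ignored the range". Note that setRequestHeader() may only be called after open() and before send(); calling it earlier raises an invalid-state error.

```javascript
// Build a Range header value for `length` bytes starting at byte offset `start`.
function buildRangeHeader(start, length) {
    return "bytes=" + start + "-" + (start + length - 1);
}

// Parse a Content-Range value such as "bytes 0-9999/1234567".
// Returns null when the header is missing or not in that form.
function parseContentRange(header) {
    var match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(header || "");
    if (!match) return null;
    return {
        start: Number(match[1]),
        end: Number(match[2]),
        total: match[3] === "*" ? null : Number(match[3]) // "*" means total size unknown
    };
}

// Hypothetical wrapper: request part of `url` and report whether the range was honoured.
function getRange(url, start, length, callback) {
    var request = new XMLHttpRequest();
    request.onreadystatechange = function() {
        if (request.readyState != 4) return;
        callback({
            partial: request.status == 206, // 200 means the Range header was ignored
            range: parseContentRange(request.getResponseHeader("Content-Range")),
            text: request.responseText
        });
    };
    request.open("GET", url, true);
    // setRequestHeader() is only valid after open() and before send()
    request.setRequestHeader("Range", buildRangeHeader(start, length));
    request.send(null);
}
```

A call such as getRange(url, 100000, 10000, callback) would ask for bytes 100000-109999. If `partial` comes back false, the whole page was sent and no bandwidth was saved.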

Wladimir Palant

More information here: http://tools.ietf.org/html/draft-ietf-http-range-retrieval-00 – nisc Mar 22 '12 at 13:30
@Wladimir Adding this gives `Uncaught Error: INVALID_STATE_ERR: DOM Exception 11` on the line where I set the `Range header`. Any idea? – shadyabhi Mar 22 '12 at 13:58
nisc: ancient spec. Please stop citing it. The latest official spec is RFC 2616. – Julian Reschke Mar 22 '12 at 14:02
@WladimirPalant Thanks. However, the site I am working with ignored the Range header for HTML pages, so I guess there is no way of doing it in my case. – shadyabhi Mar 22 '12 at 15:58
So you give a tick to the one that doesn't work, and Rob W's answer that did work gets a -1 from people?.... hilarious – PAEz Mar 22 '12 at 18:01
@WladimirPalant There was one small error with his code, but the concept completely works....I know because I have a test extension here right now that works. – PAEz Mar 22 '12 at 19:03
Real knowledge isn't based on beliefs, but facts ;P – PAEz Mar 22 '12 at 20:06

PAEz · Accepted Answer · 2012-03-22T19:57:12.403

Reposting Rob W's answer so there is a working example for this question.
The following code can be used to download the first 10k of a site's HTML, as per the first part of the question:

Lets say I only need first 10kb of the page

function getURL(url, limit, callback) {
    var request = new XMLHttpRequest();
    var done = false; // Guards against a second callback when abort() fires readystatechange
    request.onreadystatechange = function() {
        if (done || request.readyState < 3) return; // No body data before readyState 3
        if (request.readyState == 4 || request.responseText.length >= limit) {
            done = true;
            // Read the text and status before abort(), which resets both
            var result = request.responseText;
            var status = request.status;
            if (request.readyState != 4) request.abort(); // Limit reached: cancel the rest
            callback( result, status );
        }
    };
    request.open( "GET", url, true );
    request.overrideMimeType("text/html"); // Force a text response so responseText is populated
    request.send();
}

getURL('http://www.google.com.au', 100000, debug);
//getURL('http://paez.kodingen.com/testy.png', 100000, debug);

function debug(responseText, status) {
    console.debug('length of responseText '+responseText.length);
    console.debug('responseStatus : '+status);
    console.debug('responseText :\n'+responseText);
}

Note
It should be noted that this won't get exactly the size you specify, as there is no way to say how often the readystatechange handler will be called. Also, I force the response to be text, otherwise there may not be a responseText.
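To illustrate why the downloaded size varies, here is a small simulation, offered as a sketch under stated assumptions. The functions trimToLimit and simulateDownload are hypothetical, and the chunk sizes are made up. It assumes responseText grows by one network chunk per readystatechange event, so the limit check can only fire after a whole chunk has arrived. Also keep in mind that responseText.length counts characters, not bytes.

```javascript
// Hypothetical helper: cut the accumulated text down to at most `limit` characters.
function trimToLimit(text, limit) {
    return text.length > limit ? text.slice(0, limit) : text;
}

// Simulate readyState 3 events: responseText grows by one chunk per event.
function simulateDownload(chunks, limit) {
    var responseText = "";
    for (var i = 0; i < chunks.length; i++) {
        responseText += chunks[i];
        if (responseText.length >= limit) {
            // Same check as in getURL: stop once the limit is reached or passed
            return { raw: responseText, trimmed: trimToLimit(responseText, limit) };
        }
    }
    // The whole response was shorter than the limit
    return { raw: responseText, trimmed: responseText };
}
```

With chunks of 4 characters and a limit of 6, the raw result overshoots to 8 characters, and trimming brings it back to 6. Trimming only tidies the result; the extra data has already been downloaded by the time the check runs.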

Can you please elaborate on `It should be noted that this wont get exactly the size you specify as their is no way to say how often the readystate will be called.`? Please explain why we need the `request.responseText.length >= limit` part. I noticed that a different size is downloaded every time I run the script. I know that you are saying something about this, but I would appreciate a more detailed explanation. Thanks. – shadyabhi, Mar 22 '12 at 20:31
If it's not readyState==4 then it's most likely readyState==3, which is Loading. This happens regularly during the download of the file (I couldn't find anything saying how often it should happen). Each time it happens, responseText will contain the data downloaded so far, so we keep checking whether its length is past what you requested and then abort....hope that explains it. – PAEz, Mar 22 '12 at 20:35
