
I need to efficiently access a large gzipped XML file from JavaScript (actually from Greasemonkey). Unfortunately, the server doesn't provide a Content-Encoding header, and Content-Type is "application/x-gzip", so Firefox will not (as far as I can tell) automatically inflate it. If there's a way to fake Firefox out, that would be ideal. Short of that, I need some way to do the inflation efficiently. What I'm using now takes about 30 seconds to inflate the 1.2 MB gzipped file; I'd like to get it down under 5 seconds.

(The Greasemonkey script I'm working on can't have any other external server dependencies, so proxying and presenting a Content-Encoding header isn't an option.)

What I'm doing now is patched together from several places. To receive the binary data unmolested, I'm using the Firefox XMLHttpRequest overrideMimeType extension:

$.ajax(url, {
    dataType:'text',
    beforeSend:function(xhr){
        xhr.overrideMimeType('text/plain; charset=x-user-defined')
    },
    success:function(data){
        // each char carries one byte in its low 8 bits; mask off the rest
        var blob='';
        for (var i=0; i<data.length; ++i)
            blob += String.fromCharCode(data.charCodeAt(i) & 0xff);
        ...

Then inflating, using a slightly modified and inlined copy of https://github.com/dankogai/js-deflate/blob/master/rawinflate.js (there are several other JavaScript inflate libraries out there, all, as far as I can tell, based on an older library: http://www.onicos.com/staff/iz/amuse/javascript/expert/inflate.txt). This is the horrifically slow part.

        // blithely assuming the gzip header won't change,
        // strip a fixed number of bytes from the front
        deflated=RawDeflate.inflate(blob.substring(22,blob.length-8));
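
The fixed offset of 22 only holds while the header stays the same (10 fixed bytes plus, presumably, an 11-character file name and its terminating NUL). A minimal sketch of computing the offset from the header flags instead, following the gzip layout in RFC 1952; `gzipHeaderLength` is a hypothetical helper name, not part of rawinflate.js, and it assumes `blob` is a binary string with one byte per character:

```javascript
// Hypothetical helper: offset of the raw deflate data inside a gzip stream
// held as a binary string (one char per byte), per the RFC 1952 header layout.
function gzipHeaderLength(blob) {
    function byteAt(i) { return blob.charCodeAt(i) & 0xff; }

    if (byteAt(0) !== 0x1f || byteAt(1) !== 0x8b) {
        throw new Error('not a gzip stream');
    }
    if (byteAt(2) !== 8) {
        throw new Error('unsupported compression method');
    }

    var flags = byteAt(3);
    var pos = 10; // magic(2) + method(1) + flags(1) + mtime(4) + xfl(1) + os(1)

    if (flags & 0x04) { // FEXTRA: 2-byte little-endian length, then that many bytes
        pos += 2 + (byteAt(pos) | (byteAt(pos + 1) << 8));
    }
    if (flags & 0x08) { // FNAME: zero-terminated original file name
        while (pos < blob.length && byteAt(pos++) !== 0) {}
    }
    if (flags & 0x10) { // FCOMMENT: zero-terminated comment
        while (pos < blob.length && byteAt(pos++) !== 0) {}
    }
    if (flags & 0x02) { // FHCRC: 2-byte header checksum
        pos += 2;
    }
    if (pos > blob.length) {
        throw new Error('truncated gzip header');
    }
    return pos;
}

// Intended use with the call above (the last 8 bytes are the CRC32 and size trailer):
//   var start = gzipHeaderLength(blob);
//   deflated = RawDeflate.inflate(blob.substring(start, blob.length - 8));
```

This does nothing for speed; it only removes the assumption that the header never changes.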

Then popping it in an innerHTML property to parse it:

        xmlcontainer=$('<div>');
        // remove <?xml...> prolog
        xmlcontainer.html(deflated.substring(45));
        xmldoc=xmlcontainer.children();

(I know the last bit could be more properly done with DOMParser's parseFromString, but I didn't get that working yet.)
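
For reference, a minimal sketch of the DOMParser route, untested against this particular file. `parseXml` is a hypothetical wrapper name; the optional second argument exists only so a parser can be supplied outside a browser. It relies on the browser behaviour of returning a document that contains a `parsererror` element rather than throwing on malformed input:

```javascript
// Hypothetical wrapper around DOMParser.parseFromString.
// parser is optional; in a browser it defaults to a fresh DOMParser.
function parseXml(xmlText, parser) {
    parser = parser || new DOMParser();
    var doc = parser.parseFromString(xmlText, 'application/xml');

    // Malformed XML does not throw; the returned document instead
    // contains a <parsererror> element describing the problem.
    if (doc.getElementsByTagName('parsererror').length > 0) {
        throw new Error('XML parse error');
    }
    return doc;
}

// Intended use: xmldoc = $(parseXml(deflated));
```

Since the XML parser understands the `<?xml ...?>` prolog, the `substring(45)` step should not be needed on this path.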

ysth
  • It's a shame no browser seems to expose their unzipping algorithms to JavaScript - I imagine it would be half-way easy to do and the most efficient way – Pekka Mar 06 '11 at 20:26
  • You are probably aware of this, but for reference http://stackoverflow.com/questions/294297/javascript-implementation-of-gzip – Pekka Mar 06 '11 at 20:29
  • @Pekka, yes I had read that, and another question I can't find now that had more deflate-specific links. – ysth Mar 06 '11 at 20:54
  • @Pekka, I imagine the thought has probably never occurred to browser developers. I mean, what would you use it on? JS is not supposed to touch files and if things are set properly, the browser handles the rest automatically. It's hard to plan for outlier cases like this one. – Brock Adams Mar 06 '11 at 21:20

1 Answer


You're just not going to be able to do noticeably better with this configuration**.

JavaScript is too slow to inflate as fast as desired, and you can't reliably call a binary from JS, unless you AJAX the data to your own server (which can be the local PC).

Your options for improvement would seem to be:

  1. Get the browser to automatically inflate the content. If you have already tried using overrideMimeType to set application/x-gzip, you might try using GM_xmlhttpRequest instead (it's a long shot).

  2. Convert this from a GM script to a Firefox add-on. As an add-on, you can access binaries, like 7-Zip, and might even have access to the browser's inflate method. You could probably spoof the mimetype more easily too.



**I did notice some trivial opportunities to speed up that inflating JS... things like length checks inside for loops. Alas, given the particulars, probably not going to buy more than a second or two.
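
As an illustration of the kind of tweak meant here, a sketch applied to the byte-masking loop from the question rather than to the inflate library itself. `toBinaryString` and the chunk size are hypothetical choices; it caches the length and builds the result in chunks instead of appending one character at a time. Any gain is unmeasured and likely small next to the inflate step:

```javascript
// Hypothetical replacement for the per-character "blob +=" loop.
// Masks each char code to its low byte and builds the string in chunks.
function toBinaryString(data) {
    var CHUNK = 8192; // arbitrary; kept small so apply() stays within argument limits
    var parts = [];
    for (var start = 0, len = data.length; start < len; start += CHUNK) {
        var end = Math.min(start + CHUNK, len);
        var codes = new Array(end - start);
        for (var i = start; i < end; ++i) {
            codes[i - start] = data.charCodeAt(i) & 0xff;
        }
        parts.push(String.fromCharCode.apply(null, codes));
    }
    return parts.join('');
}
```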

Brock Adams