1

I need to compress a string in JavaScript without saving a temporary file, then send the compressed data via a POST. I will receive it in Python, so I need to be able to decompress it there. I implemented the following, http://rosettacode.org/wiki/LZW_compression, only to discover that it only works on ASCII characters. I am going to be reading webpages and never know what characters I'll be getting.

(I need to do this because the strings can become quite long and therefore take too long to post over slow networks.)

user984003
  • Surely 1) reading data, 2) compressing data, 3) sending compressed data, 4) receiving and decompressing data is more work than step 4 just being pointed to the data and getting it itself? – Jon Clements Oct 04 '12 at 16:11
  • Well, JavaScript cannot write to temp files, so that "requirement" is basically impossible to do. – epascarello Oct 04 '12 at 16:11
  • Why not just send the URL back to Python and have it grab it. – tkone Oct 04 '12 at 16:11
  • Isn't your web server already doing gzip compression or similar? – korylprince Oct 04 '12 at 16:14
  • The javascript reads the DOM elements and sends that. It won't work to point to the source for various reasons, a main one being that I need the elements created by javascript on the page. I also need the computed style that the browser calculates for me. – user984003 Oct 04 '12 at 16:14
  • @epascarello: I don't need a temp file; I need it in memory. – user984003 Oct 04 '12 at 16:15
  • Take a look at http://jsend.org/ – NullUserException Oct 04 '12 at 16:27
  • @JonathanVanasco *"the web server will advertise it can do gzip/deflate and the browser will automatically compress and send"* This is not true. HTTP doesn't work that way; the server can send compressed data because it knows the client is capable of handling it. It doesn't work the other way around. – NullUserException Oct 04 '12 at 16:52
  • The LZW algorithm is byte-oriented, not ASCII-character-oriented. Seems like you ought to be able to use it if you converted the data to a series of bytes first and did the reverse on the other end. – martineau Oct 04 '12 at 17:04
  • thanks @Null USer. i deleted my comment and upvoted yours. – Jonathan Vanasco Oct 04 '12 at 17:30
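
A minimal sketch of the byte-oriented approach martineau describes in the comments above: encode the string as UTF-8 first, run LZW over the bytes, and reverse both steps on the other end. The function names `lzw_compress` and `lzw_decompress` are made up for illustration, and the output is a plain list of integer codes; how those codes are packed for the POST is left open.

```python
def lzw_compress(text: str) -> list:
    """LZW over UTF-8 bytes, so any Unicode input works (illustrative sketch)."""
    data = text.encode("utf-8")
    # The initial table holds every possible single byte, not just ASCII.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list) -> str:
    """Inverse of lzw_compress: rebuild the bytes, then decode as UTF-8."""
    if not codes:
        return ""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out).decode("utf-8")
```

The JavaScript side would need the same two steps in the same order (string to UTF-8 bytes, then LZW) for the Python side to decode it.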

3 Answers

2

You can try base64-encoding the string beforehand (the compressed stream will then be 1.5 to 2 times the size it would have been had it been possible to compress the string directly).

There is another implementation (this one of the gzip Deflate algorithm) here.

Or you might try to escape the non-ASCII characters by replacing them with \xNN (NN = hex code of the character). Of course, you will also have to escape the backslash itself.
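
As a rough illustration of the base64 idea, here is a Python sketch of the round trip. It assumes the text is turned into UTF-8 bytes before encoding (the answer does not say which encoding to use), and the helper names `to_ascii` and `from_ascii` are hypothetical. The result is pure ASCII, so an ASCII-only compressor can consume it, at the cost of a 4/3 expansion before compression.

```python
import base64


def to_ascii(text: str) -> str:
    """Encode arbitrary Unicode text as ASCII-only base64 (assumes UTF-8)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_ascii(payload: str) -> str:
    """Reverse of to_ascii: base64-decode, then interpret the bytes as UTF-8."""
    return base64.b64decode(payload).decode("utf-8")
```

Every 3 input bytes become 4 output characters, which is why the compressor ends up working on a larger and less regular input.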
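
One way to read the escaping idea, sketched in Python: since \xNN only has room for two hex digits, this version assumes the text is first encoded as UTF-8 and that each byte at or above 0x80 is escaped individually. That per-byte interpretation and the function names are assumptions, not something the answer specifies.

```python
def escape_non_ascii(text: str) -> str:
    """Return an ASCII-only string: non-ASCII UTF-8 bytes become \\xNN."""
    out = []
    for byte in text.encode("utf-8"):
        if byte == 0x5C:          # the backslash itself must be escaped
            out.append("\\\\")
        elif byte < 0x80:         # plain ASCII passes through unchanged
            out.append(chr(byte))
        else:
            out.append("\\x%02X" % byte)
    return "".join(out)


def unescape_non_ascii(escaped: str) -> str:
    """Inverse of escape_non_ascii."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch != "\\":
            out.append(ord(ch))
            i += 1
        elif escaped[i + 1:i + 2] == "\\":
            out.append(0x5C)
            i += 2
        elif escaped[i + 1:i + 2] == "x":
            out.append(int(escaped[i + 2:i + 4], 16))
            i += 4
        else:
            raise ValueError("malformed escape at index %d" % i)
    return out.decode("utf-8")
```

Each escaped byte costs four characters, so text that is mostly non-ASCII grows considerably before compression.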

Anyway, you are unlikely to achieve more than about a 10X increase in speed, and I fear this would be more than balanced by the encoding overhead. Without knowing more about the use case, I'd suggest going with Deflate.

LSerni
  • If you base64 encode, you should make sure that both languages are using the same character sets. Alternate characters are allowed for the 62nd and 63rd characters, which might hold special significance. – Jonathan Vanasco Oct 04 '12 at 16:37
0

From the OP's comment:

The javascript reads the DOM elements and sends that. It won't work to point to the source for various reasons, a main one being that I need the elements created by javascript on the page. I also need the computed style that the browser calculates for me.

One solution would be to automate a browser using Selenium with Python and then retrieve the DOM from that.

Jon Clements
0

Use deflate in JavaScript and zlib in Python. (LZW is ancient and obsolete -- modern methods are much better.) In between, use base 85 encoding, picking 85 ASCII characters that experimentation or standards documentation indicate can make it through POST unscathed. Base 85 is simply where each character is a digit in a base 85 number, where five such digits encode 32 bits.
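
A minimal Python sketch of this pipeline, under two stated assumptions: the JavaScript side produces a zlib-wrapped deflate stream (not raw deflate or gzip), and both sides agree on the 85-character alphabet of Python's `base64.b85encode`. The answer leaves the alphabet open, and this particular one contains `%`, `&` and `+`, so it would still need percent-encoding in a form-encoded POST body. `pack` and `unpack` are illustrative names; `pack` only stands in for what the browser would do.

```python
import base64
import zlib


def pack(text: str) -> str:
    """Stand-in for the browser side: UTF-8, deflate (zlib format), base 85."""
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return base64.b85encode(compressed).decode("ascii")


def unpack(payload: str) -> str:
    """Receiving side: base 85 decode, inflate, then decode UTF-8."""
    compressed = base64.b85decode(payload)
    return zlib.decompress(compressed).decode("utf-8")


# Five base 85 digits are enough for 32 bits because 85**5 >= 2**32.
assert 85 ** 5 >= 2 ** 32
```

Base 85 turns every 4 bytes into 5 characters (25% overhead), compared with 4 characters for every 3 bytes (about 33%) for base64.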

Mark Adler