13

I just built a small web app for previewing HTML documents. It generates URLs that contain the HTML (with all inline CSS and JavaScript) as base64-encoded data. The problem is that the URLs quickly get rather long. What is the de facto standard way (preferably in JavaScript) to compress the string losslessly first?

PS: I read about Huffman and Lempel-Ziv coding in school some time ago, and I remember really enjoying LZW :)

EDIT:

Solution found: rawStr => utf8Str => lzwStr => base64Str seems to be the way to go. I'm now working on adding Huffman compression between the UTF-8 and LZW steps. The problem so far is that too many characters become very long when encoded to base64.
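For illustration, here is a minimal sketch of that pipeline. It is not the code from the libraries mentioned in this thread; all function names are made up for the example. It assumes a runtime with `TextEncoder`/`TextDecoder`, and it swaps the generic base64 step for a simpler packing: LZW codes are capped at 12 bits, so each code maps to exactly two characters of the URL-safe base64 alphabet. That may sidestep the blow-up that happens when large LZW code points are pushed through a byte-oriented base64 encoder.

```javascript
// URL-safe base64 alphabet (RFC 4648 "base64url"), 6 bits per character.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
const MAX_CODES = 4096; // 12-bit codes => two 6-bit characters per code

// Byte-oriented LZW: bytes in, array of integer codes (< MAX_CODES) out.
function lzwCompress(bytes) {
  const dict = new Map();
  for (let i = 0; i < 256; i++) dict.set(String.fromCharCode(i), i);
  let nextCode = 256;
  let phrase = "";
  const codes = [];
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    const candidate = phrase + ch;
    if (dict.has(candidate)) {
      phrase = candidate;
    } else {
      codes.push(dict.get(phrase));
      // Stop growing the dictionary once all 12-bit codes are used.
      if (nextCode < MAX_CODES) dict.set(candidate, nextCode++);
      phrase = ch;
    }
  }
  if (phrase !== "") codes.push(dict.get(phrase));
  return codes;
}

// Inverse of lzwCompress: rebuilds the same dictionary while decoding.
function lzwDecompress(codes) {
  if (codes.length === 0) return new Uint8Array(0);
  const dict = [];
  for (let i = 0; i < 256; i++) dict.push([i]);
  if (codes[0] >= 256) throw new Error("Invalid first LZW code");
  let prev = dict[codes[0]];
  const out = [...prev];
  for (let i = 1; i < codes.length; i++) {
    const code = codes[i];
    let entry;
    if (code < dict.length) entry = dict[code];
    else if (code === dict.length) entry = prev.concat(prev[0]); // "KwKwK" case
    else throw new Error("Invalid LZW code: " + code);
    for (const b of entry) out.push(b);
    if (dict.length < MAX_CODES) dict.push(prev.concat(entry[0]));
    prev = entry;
  }
  return Uint8Array.from(out);
}

// Pack each 12-bit code into two URL-safe characters, and back.
function codesToUrlText(codes) {
  let text = "";
  for (const code of codes) text += ALPHABET[code >> 6] + ALPHABET[code & 63];
  return text;
}

function urlTextToCodes(text) {
  if (text.length % 2 !== 0) throw new Error("Truncated input");
  const codes = [];
  for (let i = 0; i < text.length; i += 2) {
    const hi = ALPHABET.indexOf(text[i]);
    const lo = ALPHABET.indexOf(text[i + 1]);
    if (hi < 0 || lo < 0) throw new Error("Invalid character in input");
    codes.push((hi << 6) | lo);
  }
  return codes;
}

// rawStr => UTF-8 bytes => LZW codes => URL-safe text, and the reverse.
function compressToUrl(html) {
  return codesToUrlText(lzwCompress(new TextEncoder().encode(html)));
}

function decompressFromUrl(text) {
  return new TextDecoder().decode(lzwDecompress(urlTextToCodes(text)));
}
```

Calling `decompressFromUrl(compressToUrl(html))` should return the original string. Whether the result is actually shorter depends on how repetitive the input is; very short or highly varied input can come out longer than it went in.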

bennedich

2 Answers

6

Check out this answer. It mentions functions for LZW compression/decompression (via http://jsolait.net/, specifically http://jsolait.net/browser/trunk/jsolait/lib/codecs.js).

David Murdoch
  • You, sir, have almost saved my day! Great library, although the base64 encoder wasn't too keen on encoding the LZW-encoded string. – bennedich Nov 10 '10 at 13:43
  • I found an extended base64 encoder/decoder that works: http://www.webtoolkit.info/javascript-base64.html. In combination with the LZW encoder/decoder you linked to, it all works. Thanks for your help! – bennedich Nov 10 '10 at 14:09
  • 6
    Page not found - womp womp – George Mauer Mar 28 '13 at 20:25
1

You will struggle to get much compression at all on a URL: URLs are too short and don't contain enough redundant information to benefit much from Huffman/LZW-style algorithms.

If you have constraints on the space of possible URLs (e.g. all content tends to be in the same set of folders), you could hard-code some parts of the URLs for expansion on the client, i.e. cheat.
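A rough sketch of that "cheat", assuming both ends share a fixed table of common fragments. The table contents and the function names `shorten` and `expand` are hypothetical and only there to illustrate the idea: each known fragment is replaced by a two-character token, and a literal `~` is escaped as `~~` so the mapping stays reversible.

```javascript
// Hypothetical fragment table; both sides must use the identical list.
// Limited to 26 entries here because tokens are "~a" .. "~z".
const FRAGMENTS = [
  "<!DOCTYPE html>", "<html>", "</html>", "<head>", "</head>",
  "<body>", "</body>", "<script>", "</script>", "<style>", "</style>",
];

// Replace known fragments with "~" + letter; escape a literal "~" as "~~".
function shorten(text) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    if (text[i] === "~") {
      out += "~~";
      i += 1;
      continue;
    }
    const idx = FRAGMENTS.findIndex((f) => text.startsWith(f, i));
    if (idx >= 0) {
      out += "~" + String.fromCharCode(97 + idx);
      i += FRAGMENTS[idx].length;
    } else {
      out += text[i];
      i += 1;
    }
  }
  return out;
}

// Inverse of shorten: expand tokens back into the full fragments.
function expand(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== "~") {
      out += text[i];
      continue;
    }
    const next = text[i + 1];
    if (next === undefined) throw new Error("Dangling escape at end of input");
    i += 1;
    if (next === "~") {
      out += "~";
    } else {
      const fragment = FRAGMENTS[next.charCodeAt(0) - 97];
      if (fragment === undefined) throw new Error("Unknown token: ~" + next);
      out += fragment;
    }
  }
  return out;
}
```

This only pays off when the inputs really do share a lot of fixed text, and the table has to be kept in sync wherever the strings are created and read.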

James Gaunt
  • The HTML code to compress will be several thousand chars and contain a lot of similar chars. I believe/hope compression will make a significant difference. – bennedich Nov 10 '10 at 13:46
  • 1
    Ah OK, so they really are kinda long! One other consideration: if you ensure GZIP compression is on for the HTML docs (e.g. via IIS), then you're getting compression for the entire HTML document anyway. In that case, is compressing the URLs before you encode them and put them in the HTML redundant? Letting the browser do the decompression natively rather than doing it yourself in JS may be substantially quicker. – James Gaunt Nov 10 '10 at 13:55
  • Sorry, I'm not fully following you yet. I just read about GZIP and it seems like a better choice than just LZW. Is there some native support for GZIP encoding/decoding in browsers? Would a GZIP'ed string be safe to put straight into a URL? – bennedich Nov 10 '10 at 14:27
  • You can turn on GZIP compression on IIS. See http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/iis/25d2170b-09c0-45fd-8da4-898cf9a7d568.mspx. Then any HTML pages are GZIP'ed (or DEFLATE'ed) before they are sent to the browser if the browser supports it. The browser will uncompress when it receives the HTML. This may make your GZIP of a small section of the page redundant - and possibly detrimental to the size/speed of the page. – James Gaunt Nov 10 '10 at 15:28