2

Currently, I am developing an app that needs to store a large amount of text on an iPad. My question is: are algorithms like Huffman coding actually used in production? I just need a very simple compression algorithm (there's not going to be a huge amount of text, and it only needs a more efficient method of storage), so would something like Huffman coding work? Should I look into some other kind of compression library?

Nick Anderegg
  • do you really think this "background" was needed to better understand the question? – akappa Jul 26 '11 at 19:13
  • "years of web design experience (about a decade)...programming since I was 12, which would mean I've been programming about six years". uh..okay? – user482594 Jul 26 '11 at 19:37
  • It wasn't exactly necessary. Is posting a comment asking if the background is needed really needed? – Nick Anderegg Jul 26 '11 at 19:44
  • @Nick Anderegg: yeah - maybe you'll realize it was useless and then you'll avoid including this kind of "background" in your questions in the future ;) – akappa Jul 26 '11 at 19:59
  • Used to be, but Huffman coding has largely been replaced by Arithmetic coding because it is the optimal coding for a stream based representational model. – RBarryYoung Mar 10 '14 at 23:23
  • See GPU huffman decoder on top of Metal here: https://stackoverflow.com/a/47954985/763355 – MoDJ Dec 25 '17 at 00:00

7 Answers

12

From Wikipedia on the subject:

Huffman coding today is often used as a "back-end" to some other compression methods. DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3 have a front-end model and quantization followed by Huffman coding (or variable-length prefix-free codes with a similar structure, although perhaps not necessarily designed by using Huffman's algorithm).

So yes, Huffman coding is used in production. Quite a lot, even.

You
3

Huffman coding (a form of entropy coding) is used very widely. Almost anything you can think of that is compressed, with the exception of some very old schemes, uses it: image compression, Zip and RAR archives, every imaginable codec, and so on.

Keep in mind that Huffman coding is lossless and that static Huffman coding requires you to know all of the data you're compressing in advance. If you're doing lossy compression, you need to perform some transformation on your data to reduce its entropy first (e.g. quantizing and discarding DCT coefficients in JPEG compression). If you need Huffman coding to work on real-time data (where you don't know every bit in advance), adaptive Huffman coding is used instead. You can find a whole lot on this topic in the signal processing literature.
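To make the two-pass nature concrete, here is a rough sketch in plain C (a toy example of my own, not taken from any particular library): scan the whole input once to count byte frequencies, repeatedly merge the two lowest-weight nodes, and read each symbol's code length off as its depth in the finished tree. The sample string and the O(n^2) merge loop are purely illustrative.

    /* Sketch of static Huffman code construction: frequency count, then
       repeated merging of the two lowest-weight nodes. Illustrative only. */
    #include <stdio.h>
    #include <string.h>

    #define SYMS 256

    int main(void) {
        const unsigned char *input =
            (const unsigned char *)"this is an example of a huffman tree";
        size_t n = strlen((const char *)input);

        long weight[2 * SYMS] = {0};   /* 0..255: leaves, 256..: internal nodes */
        int parent[2 * SYMS] = {0};
        int alive[2 * SYMS] = {0};
        int nodes = SYMS;

        /* Pass 1: the whole input must be scanned to get the frequencies. */
        for (size_t i = 0; i < n; i++)
            weight[input[i]]++;
        for (int s = 0; s < SYMS; s++)
            alive[s] = weight[s] > 0;

        /* Merge the two lowest-weight live nodes into a new parent until only
           the root remains (O(n^2) selection is fine for 256 symbols). */
        for (;;) {
            int lo1 = -1, lo2 = -1;
            for (int i = 0; i < nodes; i++) {
                if (!alive[i]) continue;
                if (lo1 < 0 || weight[i] < weight[lo1]) { lo2 = lo1; lo1 = i; }
                else if (lo2 < 0 || weight[i] < weight[lo2]) { lo2 = i; }
            }
            if (lo2 < 0) break;                 /* only the root is left */
            weight[nodes] = weight[lo1] + weight[lo2];
            parent[lo1] = parent[lo2] = nodes;
            alive[lo1] = alive[lo2] = 0;
            alive[nodes] = 1;
            nodes++;
        }

        /* Pass 2: a symbol's code length is its depth in the tree. A real coder
           would also assign bit patterns (e.g. canonical Huffman codes). */
        for (int s = 0; s < SYMS; s++) {
            if (weight[s] == 0) continue;
            int len = 0;
            for (int i = s; parent[i] != 0; i = parent[i])
                len++;
            printf("'%c': %ld occurrence(s), %d-bit code\n", s, weight[s], len);
        }
        return 0;
    }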

Some pre-Huffman compression schemes include run-length coding (used in fax machines). Run-length coding is still sometimes used (in JPEG, again) in combination with Huffman coding.
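By way of contrast, the idea behind run-length coding fits in a few lines. The sketch below uses a made-up toy format of (count, byte) pairs with runs capped at 255; real formats (fax G3/G4, JPEG's coding of zero runs) define their own, more elaborate encodings.

    /* Toy run-length encoder: emits (count, byte) pairs, runs capped at 255. */
    #include <stdio.h>
    #include <string.h>

    static size_t rle_encode(const unsigned char *in, size_t n, unsigned char *out) {
        size_t o = 0;
        for (size_t i = 0; i < n; ) {
            size_t run = 1;
            while (i + run < n && in[i + run] == in[i] && run < 255)
                run++;
            out[o++] = (unsigned char)run;   /* run length */
            out[o++] = in[i];                /* repeated byte */
            i += run;
        }
        return o;
    }

    int main(void) {
        const unsigned char *msg = (const unsigned char *)"aaaaabbbcccccccc";
        unsigned char out[64];               /* worst case is 2x the input size */
        size_t len = rle_encode(msg, strlen((const char *)msg), out);
        for (size_t i = 0; i < len; i += 2)
            printf("%u x '%c'\n", out[i], out[i + 1]);
        return 0;
    }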

Phonon
3

Yes, they are used in production.

As others have mentioned, true Huffman requires you to analyze the entire corpus first to get the most efficient encoding, so it isn't typically used by itself.

Probably shortly after you were born, I implemented Huffman compression on the Psion Series 3 handheld computer in C in order to compress data which was preloaded onto data packs and only decompressed on the handheld. In those days, space was tight and there was no built-in compression library.

As with most well-specified functionality, I would strongly consider using whatever is built into iOS or available as a standard package in your development environment.

This will save a lot of debugging and allow you to concentrate on the portions of your app that actually add value.

Large amounts of text will be very amenable to zip-style compression, and it is unlikely that effort spent improving on its performance (in either space or time) will pay off in the long run.

Cade Roux
2

iOS has built-in support for the zlib algorithm (zlib.h, usable from Objective-C).

You could implement your own compression functionality and compare its performance against the built-in zlib functions.

I expect the built-in zlib functionality to be faster and to give a higher compression ratio.
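For example, here is a minimal sketch of a zlib round trip using the one-shot helpers compress2() and uncompress() (plain C; the sample text and buffer sizes are placeholders, and on iOS you would link against the system libz):

    /* Minimal zlib round trip with the one-shot helpers compress2()/uncompress().
       Build with -lz; the same calls are available on iOS via libz. */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void) {
        const char *text = "some text to store on the device, "
                           "some text to store on the device";
        uLong srcLen = (uLong)strlen(text) + 1;    /* keep the trailing NUL */

        Bytef comp[1024];      /* for real data, size this via compressBound(srcLen) */
        uLongf compLen = sizeof(comp);
        if (compress2(comp, &compLen, (const Bytef *)text, srcLen, Z_BEST_SPEED) != Z_OK) {
            fprintf(stderr, "compress2 failed\n");
            return 1;
        }

        Bytef back[1024];
        uLongf backLen = sizeof(back);
        if (uncompress(back, &backLen, comp, compLen) != Z_OK) {
            fprintf(stderr, "uncompress failed\n");
            return 1;
        }

        printf("%lu bytes in, %lu bytes compressed, round trip ok: %s\n",
               (unsigned long)srcLen, (unsigned long)compLen,
               strcmp((const char *)back, text) == 0 ? "yes" : "no");
        return 0;
    }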

1

Yes, you're using Huffman coding (decoding) to read this page, since web pages are compressed to the gzip format. So it is used pretty much every nanosecond by web browsers all over the planet.

Huffman coding is almost never used by itself, but rather always with some higher-order modeling of the data to give the Huffman algorithm what it needs to work with. So LZ77 models text and other byte-oriented data as repeating strings coded as literals and length/distance pairs, which is then fed to Huffman coding, e.g. in the deflate compressed format using zlib. Or with difference or other prediction coding of pixels for PNG, followed by Huffman coding.
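One way to see that split in practice: zlib's stream API lets you switch off the LZ77 string matching and keep only the Huffman coding by passing the Z_HUFFMAN_ONLY strategy. A rough sketch (plain C; the helper function, toy input, and buffer sizes are my own):

    /* Compress the same buffer twice with zlib's deflate: once with the default
       strategy (LZ77 matching + Huffman coding) and once with Z_HUFFMAN_ONLY
       (Huffman coding of literals only, no string matching). Build with -lz. */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    static uLong deflate_size(const Bytef *src, uLong srcLen, int strategy) {
        Bytef out[4096];
        z_stream strm;
        memset(&strm, 0, sizeof(strm));
        if (deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                         15, 8, strategy) != Z_OK)
            return 0;
        strm.next_in = (Bytef *)src;
        strm.avail_in = srcLen;
        strm.next_out = out;
        strm.avail_out = sizeof(out);
        if (deflate(&strm, Z_FINISH) != Z_STREAM_END) {   /* small input: one call */
            deflateEnd(&strm);
            return 0;
        }
        uLong produced = strm.total_out;
        deflateEnd(&strm);
        return produced;
    }

    int main(void) {
        const char *text =
            "the quick brown fox jumps over the lazy dog. "
            "the quick brown fox jumps over the lazy dog. "
            "the quick brown fox jumps over the lazy dog.";
        uLong n = (uLong)strlen(text);

        printf("input:          %lu bytes\n", (unsigned long)n);
        printf("LZ77 + Huffman: %lu bytes\n",
               (unsigned long)deflate_size((const Bytef *)text, n, Z_DEFAULT_STRATEGY));
        printf("Huffman only:   %lu bytes\n",
               (unsigned long)deflate_size((const Bytef *)text, n, Z_HUFFMAN_ONLY));
        return 0;
    }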

As for what you should use, look at lz4, zlib, and zstd.

Mark Adler
1

Huffman codes are the backbone of many "real world" production algorithms. The common compression algorithms used today improve upon Huffman codes by transforming their data first to improve compression ratios; of course, there are many application-specific techniques for doing this.

As for whether you should use Huffman codes yourself, my question is: why would you, when you can get better compression with less code by using an already implemented third-party library?

tskuzzy
  • Efficiency. Third-party libraries may do a more thorough job of compression, but they also take longer to do it. I'd rather knock a 50kb article down to 40kb in 50ms than down to 15kb in 500ms. Space isn't a huge issue, but I'd like it to be virtually instant. – Nick Anderegg Jul 26 '11 at 19:46
  • Hurray for not blindly taking someone's advice! Yes the downside to better compression is a significant speed hit. It's certainly a balancing act. – tskuzzy Jul 26 '11 at 23:50
1

Yes, I'm using Huffman compression in my web app to store a complete snapshot of my engine in a hidden input field. At first it was just curiosity, but it offloads my SESSION memory by moving it to the client's browser, and I also used it to store the snapshot in a file for backups and to exchange that snapshot with my colleagues. You should have seen their faces when I could just load a file in an admin panel to restore the engine on the web! It's basically a serialized, compressed, and base64-encoded array. It saves me about 15% bandwidth, but I think I can do better now.

Micromega