22

I have a large file (about 500 MB) that I need to compress within a minute, with the best possible compression ratio. I have found these algorithms to be suitable for my use case:

  1. lz4
  2. lz4_hc
  3. snappy
  4. quicklz
  5. blosc

Can someone give a comparison of speed and compression ratios between these algorithms?
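
Here is the rough harness I'm planning to use to measure this on a sample of my own data (just a sketch: it assumes the lz4, python-snappy and blosc PyPI packages, the path is a placeholder, and quicklz is left out because I'm not sure there is a maintained Python binding for it):

```python
# Rough speed/ratio check on a sample of the actual data (sketch, not a
# rigorous benchmark). Assumes: pip install lz4 python-snappy blosc
import time

import blosc
import lz4.frame
import snappy

PATH = "sample.bin"  # placeholder: a representative chunk of the 500 MB file

with open(PATH, "rb") as f:
    data = f.read()

codecs = {
    # name: (compress, decompress)
    "lz4":       (lambda d: lz4.frame.compress(d, compression_level=0),
                  lz4.frame.decompress),
    "lz4hc":     (lambda d: lz4.frame.compress(d, compression_level=9),
                  lz4.frame.decompress),
    "snappy":    (snappy.compress, snappy.decompress),
    "blosc+lz4": (lambda d: blosc.compress(d, typesize=8, clevel=5, cname="lz4"),
                  blosc.decompress),
}

for name, (compress, decompress) in codecs.items():
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    decompress(packed)
    t2 = time.perf_counter()
    mb = len(data) / 1e6
    print(f"{name:10s} ratio {len(data) / len(packed):5.2f}  "
          f"comp {mb / (t1 - t0):8.1f} MB/s  decomp {mb / (t2 - t1):8.1f} MB/s")
```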

eCorners
Sayantan Ghosh

4 Answers

38

Yann Collet's lz4, hands down.

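For reference, a minimal sketch of what that looks like with the lz4 Python binding's frame API (the file names are placeholders; the streaming API keeps memory flat even for a 500 MB file):

```python
# Minimal sketch of using python-lz4's frame API on a big file, streamed in
# chunks so the whole 500 MB never has to sit in RAM at once.
# Assumes: pip install lz4; the file names are placeholders.
import shutil

import lz4.frame

# compression_level=0 is the fast mode; higher levels (roughly 3 and up)
# switch to the slower lz4hc match finder for a better ratio.
with open("input.bin", "rb") as src, \
        lz4.frame.open("input.bin.lz4", mode="wb", compression_level=0) as dst:
    shutil.copyfileobj(src, dst, length=4 * 1024 * 1024)

with lz4.frame.open("input.bin.lz4", mode="rb") as src, \
        open("roundtrip.bin", "wb") as dst:
    shutil.copyfileobj(src, dst, length=4 * 1024 * 1024)
```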

aalaap
  • What is your recommendation for embedded systems? Which is the most efficient compression and decompression algorithm with regard to both space and time? – Buddhika Chaturanga Jan 19 '18 at 04:21
  • 1
  • Often people don't know about large-window brotli and run large-corpus benchmarks with the small-window variant. Brotli's HTTP content-encoding variant is small-window to allow decompression on cheap mobile phones. Other compressors (particularly lzma and zstd) don't have that limitation and should be compared against large-window brotli, not small-window brotli. Typically you can see ~10 % density improvements (within 0.6 % of lzma) using large-window brotli, while keeping the high decompression speed. – Jyrki Alakuijala Mar 27 '19 at 15:25
5

This might help you:

  • lz4 vs snappy: http://java-performance.info/performance-general-compression/
  • benchmarks for lz4, snappy, lz4hc and blosc: https://web.archive.org/web/20170706065303/http://blosc.org:80/synthetic-benchmarks.html (no longer available at http://www.blosc.org/synthetic-benchmarks.html)

Master M
2

If you are only aiming for high compression density, look at LZMA and large-window Brotli. These two give the best compression density among the widely available open-source algorithms. Brotli is slower than LZMA at compression, but roughly 5x faster at decompression.
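
If you want to sanity-check that on your own data, here is a rough sketch using Python's built-in lzma and the brotli PyPI package (placeholder path; see the window-size caveat in the comments):

```python
# Sketch: compare LZMA (stdlib) and Brotli density on a sample of your data.
# Assumes: pip install brotli. Note: as far as I know the standard Python
# binding only exposes the small-window encoder (lgwin <= 24), so large-window
# Brotli would do somewhat better than what this shows. At these settings both
# are slow, so use a sample, not the full 500 MB.
import lzma
import time

import brotli

with open("sample.bin", "rb") as f:  # placeholder path
    data = f.read()

for name, compress, decompress in [
    ("lzma -9",    lambda d: lzma.compress(d, preset=9),     lzma.decompress),
    ("brotli q11", lambda d: brotli.compress(d, quality=11), brotli.decompress),
]:
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    decompress(packed)
    t2 = time.perf_counter()
    print(f"{name:10s} ratio {len(data) / len(packed):5.2f}  "
          f"comp {t1 - t0:7.1f} s  decomp {t2 - t1:6.2f} s")
```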

2

Like most questions, the answer usually ends up being: It depends :)

The other answers gave you good pointers, but another thing to take into account is RAM usage in both compression and decompression stages, as well as decompression speed in MB/s.

Decompression speed typically trades off against compression ratio: you may think you chose the perfect algorithm to save some bandwidth/disk storage, but whatever consumes that data downstream now has to spend much more time, CPU cycles and/or RAM to decompress it. RAM usage might seem inconsequential, but maybe the downstream system is an embedded/low-voltage one? Maybe RAM is plentiful, but CPU is limited? All of those things need to be taken into account.
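
To make that concrete, here is a rough stdlib-only sketch for eyeballing decompression speed and the memory high-water mark on a sample of your data (placeholder path; see the caveats in the comments):

```python
# Sketch (stdlib only): eyeball the ratio vs decompression-speed tradeoff and
# the process memory high-water mark. Unix-only because of `resource`;
# ru_maxrss is in KiB on Linux and bytes on macOS, and it is a cumulative
# high-water mark for the whole process, so strictly you would run each codec
# in its own process to isolate it.
import lzma
import resource
import time
import zlib

with open("sample.bin", "rb") as f:  # placeholder path
    data = f.read()

candidates = {
    "zlib -9": (zlib.compress(data, 9), zlib.decompress),
    "xz -6":   (lzma.compress(data, preset=6), lzma.decompress),
}

for name, (packed, decompress) in candidates.items():
    t0 = time.perf_counter()
    out = decompress(packed)
    dt = time.perf_counter() - t0
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"{name:8s} ratio {len(data) / len(packed):5.2f}  "
          f"decomp {len(out) / 1e6 / dt:7.1f} MB/s  peak RSS: {rss}")
```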

Here's an example of a suite of benchmarks done on various algorithms, taking a lot of these considerations into account:

https://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

mjuarez
  • That was one of the interesting things about H.265 vs H.264 as well. H.264 encodes faster but produces a larger output; H.265 produces a smaller one, and its decoder, while more complex, actually runs at similar speeds because the data it has to work through is significantly smaller. – Pyro Nov 27 '20 at 16:20