2

I am using libLZF for compression in my application. In the documentation, there is a comment that concerns me:

lzf_compress might use different algorithms on different systems and
even different runs, thus might result in different compressed strings
depending on the phase of the moon or similar factors.

I plan to compare compressed data to determine whether the inputs were identical. Obviously, if different algorithms were used, the compressed data would differ. Is there a solution to this problem, possibly a way to force a particular algorithm every time? Or is this comment never true in practice? After all, "phase of the moon or similar factors" is a little strange.

StaxMan
JaredC

2 Answers

6

Decompress on the fly, then compare.

libLZF's web site states that "decompression [...] is basically at (unoptimized) memcpy-speed".
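A minimal sketch of the decompress-then-compare idea, under stated assumptions. The decompressor is passed in as a function pointer with the same shape as lzf_decompress() from lzf.h (returns the decompressed length, or 0 on error), so the sketch compiles without the library. The names same_payload and identity_decompress are hypothetical, and max_len is assumed to be an upper bound on the uncompressed size that the caller already knows, since the LZF stream itself does not record it.

```c
#include <stdlib.h>
#include <string.h>

/* Same shape as lzf_decompress() in lzf.h:
 * returns the number of decompressed bytes, or 0 on error. */
typedef unsigned int (*decompress_fn)(const void *in, unsigned int in_len,
                                      void *out, unsigned int out_len);

/* Hypothetical stand-in "decompressor" that just copies its input.
 * It exists only so the sketch can be exercised without liblzf. */
unsigned int identity_decompress(const void *in, unsigned int in_len,
                                 void *out, unsigned int out_len)
{
    if (in_len == 0 || in_len > out_len)
        return 0;                 /* mimic the "0 means error" convention */
    memcpy(out, in, in_len);
    return in_len;
}

/* Returns 1 if both blobs decompress to identical bytes, 0 if they differ,
 * and -1 if allocation or decompression fails.
 * max_len (assumption): upper bound on the uncompressed size. */
int same_payload(decompress_fn decompress,
                 const void *a, unsigned int a_len,
                 const void *b, unsigned int b_len,
                 unsigned int max_len)
{
    int result = -1;
    unsigned char *buf_a = malloc(max_len);
    unsigned char *buf_b = malloc(max_len);

    if (buf_a && buf_b) {
        unsigned int n_a = decompress(a, a_len, buf_a, max_len);
        unsigned int n_b = decompress(b, b_len, buf_b, max_len);
        if (n_a != 0 && n_b != 0)
            result = (n_a == n_b && memcmp(buf_a, buf_b, n_a) == 0);
    }

    free(buf_a);
    free(buf_b);
    return result;
}
```

With liblzf linked in, the call would presumably look like same_payload(lzf_decompress, a, a_len, b, b_len, max_len).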

NPE
  • This is a nice thought, but I would prefer not to have to decompress on the fly. For various reasons, this would be very complicated since it's integrated into a distributed system. – JaredC Mar 15 '11 at 15:40
  • Unless you also control the compression part, I think you really should not assume that the same input must produce the same compressed output; you should compare the uncompressed data instead. Depending on exactly what your goal is, it might make more sense to require calculation of a content checksum before (or during) compression, store that separately, and use that for comparisons. – StaxMan Mar 05 '13 at 19:10
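The checksum suggestion in the comment above could be sketched roughly as follows. This is an illustration only: the comment does not name a checksum, so 64-bit FNV-1a is swapped in here because it fits in a few lines. It is not cryptographic, and equal checksums only make equal inputs very likely, so a system that must resist deliberate collisions would want a cryptographic hash instead. The names content_tag, tag_input and same_input are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over the UNCOMPRESSED bytes. */
uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

/* Hypothetical tag stored next to each compressed blob. */
struct content_tag {
    uint64_t checksum;   /* checksum of the uncompressed input */
    size_t   raw_len;    /* uncompressed length, a cheap extra check */
};

/* Compute the tag before (or while) compressing the input. */
struct content_tag tag_input(const void *data, size_t len)
{
    struct content_tag t;
    t.checksum = fnv1a64(data, len);
    t.raw_len  = len;
    return t;
}

/* Returns 1 if two tags are consistent with identical inputs, else 0.
 * The compressed bytes are never consulted, so it does not matter
 * whether the compressor is deterministic. */
int same_input(struct content_tag a, struct content_tag b)
{
    return a.raw_len == b.raw_len && a.checksum == b.checksum;
}
```

Because the comparison uses only the stored tags, nothing has to be decompressed when two blobs are compared.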
6

The reason for the "moon phase dependency" is that they omit initialization of some data structures to squeeze out a little bit of performance (only where it does not affect decompression correctness, of course). This is not an uncommon trick, as compression libraries go. So if you put your compression code in a separate, one-shot process, and your OS zeroes memory before handing it over to a process (all "big" OSes do, but some smaller ones may not), then you'll always get the same compression result.

Also, take note of the following, from lzfP.h:

/*
 * You may choose to pre-set the hash table (might be faster on some
 * modern cpus and large (>>64k) blocks, and also makes compression
 * deterministic/repeatable when the configuration otherwise is the same).
 */
#ifndef INIT_HTAB
# define INIT_HTAB 0
#endif

So I think you only need to #define INIT_HTAB 1 when compiling libLZF to make it deterministic, though I wouldn't bet on it too much without further analysis.

atzz
  • After some research, it looks like this does exactly what I needed. Not sure how I missed that in the documentation, but thanks a ton! – JaredC Mar 15 '11 at 18:59
  • @JaredC - thanks for sharing the conclusion of your analysis. – atzz Mar 16 '11 at 07:54