
I went through the SO questions in this area but couldn't find what I was looking for.
I'm sending small binary files (~5MB) over a narrow-band network. The files should be very similar to one another, and I want to compress them using zlib (Python).
I would like to build a pre-defined dictionary, but the standard common dictionaries are not relevant since the data is non-textual.
Moreover, finding the common sequences manually is not an easy job and would only work for this specific type of file.

I'm looking for a test-and-inspect method where I could just compress a file and see the dictionary used for that output (the compressed data).
Then, by collecting those dictionaries, I could run some tests to find the optimal one.
The question is (after searching the zlib specification): how can I extract the dictionary from the compressed binary data?

I see that each compressed output starts with some binary data, then two \x00 bytes, then the data.
So I believe the dictionary is there, but how can I extract and use it? (Or am I not even close...)
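For reference, here is a minimal sketch of what the start of a zlib stream holds, based on the zlib format (RFC 1950): a two-byte header (CMF, FLG) whose FDICT bit only says whether a preset dictionary was used. The dictionary itself is never written into the stream. The helper name `describe_zlib_header` is hypothetical, not part of the zlib module.

```python
import zlib

def describe_zlib_header(blob):
    """Decode the two-byte zlib header (CMF, FLG) defined in RFC 1950."""
    cmf, flg = bytearray(blob[:2])  # bytearray yields ints on Python 2 and 3
    return {
        "method": cmf & 0x0F,                       # 8 means deflate
        "window_bits": (cmf >> 4) + 8,              # log2 of the LZ77 window size
        "fdict": bool(flg & 0x20),                  # True if a preset dictionary is required
        "header_ok": (cmf * 256 + flg) % 31 == 0,   # header check defined by the format
    }

info = describe_zlib_header(zlib.compress(b"example payload " * 64))
print(info)
```

When `fdict` is true, the header is followed by a four-byte Adler-32 checksum identifying the dictionary, so the receiver must already have the dictionary bytes.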

(Testing zlib with Python 2.7.)

RoeeK
  • Already answered [here](http://stackoverflow.com/a/17619395/1180620). There is no "dictionary" stored in the compressed data. – Mark Adler Jun 11 '14 at 16:39
  • LZ77 is implemented on this dictionary, so I don't think it's right to say "there's no dictionary". I checked the zlib sources to see if I can put some prints in the LZ77 part. Still working on it. – RoeeK Jun 11 '14 at 21:17
  • LZ77 implements what is called a sliding "dictionary". It is no more than the last 32K bytes of uncompressed input. When people hear "dictionary", they think of some more complex data structure. There isn't one. If you have the uncompressed data, you have all of the "dictionaries" used. – Mark Adler Jun 12 '14 at 04:47
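Following the last comment, a preset dictionary can simply be uncompressed bytes from an earlier, similar file. The sketch below assumes Python 3.3 or later, where `zlib.compressobj` and `zlib.decompressobj` accept a `zdict` argument (Python 2.7, used in the question, does not expose it). The sample data and helper names are hypothetical stand-ins, not the asker's files.

```python
import random
import zlib

def compress_with_dict(data, zdict):
    # The compressor primes its sliding window with zdict before seeing data.
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()

def decompress_with_dict(blob, zdict):
    # The receiver must supply exactly the same dictionary bytes.
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()

# Hypothetical stand-ins for two similar binary files.
rng = random.Random(0)
previous_file = bytes(rng.getrandbits(8) for _ in range(20000))
new_file = previous_file[:10000] + b"changed" + previous_file[10000:]

# Only the last 32 KiB can be referenced, so keep the tail of the sample.
zdict = previous_file[-32768:]

plain = zlib.compress(new_file, 9)
with_dict = compress_with_dict(new_file, zdict)
restored = decompress_with_dict(with_dict, zdict)

print(len(new_file), len(plain), len(with_dict))
```

Comparing `len(plain)` with `len(with_dict)` for several candidate dictionaries is one way to run the kind of test described in the question. Placing the most common sequences near the end of the dictionary tends to help, since shorter match distances are cheaper to encode.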
