I was updating a project's code from a 1998 version of zlib to a 2013 version of zlib. One thing that seemed to change is that there used to be a "use_crc" flag on the uncompress function, which appeared to have gone missing:

int ZEXPORT uncompress (dest, destLen, source, sourceLen, use_crc)
    Bytef *dest;
    uLongf *destLen;
    const Bytef *source;
    uLong sourceLen;
    int use_crc; // <-- vanished (?)

(UPDATE: as pointed out by @Joe, this is likely a third-party modification. Title updated accordingly. The rest of the question is still applicable, as in, "how should I best do this with today's stock zlib".)

In the code I'm studying, uncompress() is being called by something that deconstructs the binary format of a .zip and passes in a "payload" of data. The code had been passing the crc flag in as 1. If the flag was not used, it would get a Z_DATA_ERROR (-3). (A zlib with no use_crc flag gets Z_DATA_ERROR just as if the flag had been false.)

In experiments, I found that very small files worked without use_crc. The small counting files crossed over to not working between "12345678901234" and "123456789012345". The reason: that is the first file zip deflated instead of storing uncompressed (at what zip called a savings of "6%").

While floundering for options that would get zlib to accept it, I tried many things, including passing 16 + MAX_WBITS as the windowBits argument. Nothing seemed to process the payload from "zip test.zip test.txt" the way the old code had.

If I was willing to subtract one from my destination size, I seemed to be able to suppress the failing check...at the loss of one byte. Here's the simple test program with the minimal zip payload hardcoded:

#include <stdio.h>
#include <string.h> // memset
#include "zlib.h"

int main(int argc, char *argv[]) {
    // unsigned char: several bytes are above 0x7F, and zlib works with Bytef
    unsigned char compressed[] = { 0x78, 0x9C, 0x33, 0x34, 0x32, 0x36, 0x31, 0x35, 0x33,
        0xB7, 0xB0, 0x34, 0x30, 0x04, 0xB1, 0xB8, 0x00, 0x31, 0x30, 0xB1, 0x30,
        0x10, 0x00, 0x00, 0x00 }; // last 4 bytes are size (16)

    unsigned char uncompressed[16 + 1]; // account for null terminator
    int ret;
    z_stream strm;

    (void)argc; (void)argv; // unused

    memset(uncompressed, 'X', 16);
    uncompressed[16] = '\0';

    // these three fields have different pointer types, so assign them separately
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.total_out = 0;
    strm.avail_in = sizeof compressed; // 25
    strm.next_in = compressed;

    ret = inflateInit2(&strm, MAX_WBITS /* + 16 */); // it is Z_OK

    strm.avail_out = 15; // 16 gives error -3: "incorrect header check"
    strm.next_out = uncompressed;
    ret = inflate(&strm, Z_NO_FLUSH);

    if (ret != /* Z_STREAM_END */ Z_OK) { // doesn't finish...
        // strm.msg may be NULL when zlib has no message for this code
        printf("inflate() error %d: %s\n", ret, strm.msg ? strm.msg : "(no message)");
        inflateEnd(&strm);
        return 2;
    }

    inflateEnd(&strm);
    printf("successful inflation: %s\n", (char *)uncompressed);
    return 0;
}

The output is:

successful inflation: 123456789012345X

This shows the data is being decompressed, but we need all 16 bytes. (The 16th byte is a newline from the file, which should also be received.) 16 + MAX_WBITS can't even get that far.

Any ideas what's going wrong? No permutation of settings seems to get there without errors.

  • What version of zlib is this? I just did some poking around through https://github.com/madler/zlib/blob/v1.1.1/uncompr.c and looking at different tag versions and can't find any `use_crc` ref. Possible this was a patch to official zlib by some 3rd party? – Joe Oct 02 '15 at 12:54
  • @Joe It could well be unofficial, good point. It's in the Rebol codebase, which had an extracted file: [u-zlib.c](https://github.com/rebol/rebol/blob/25033f897b2bd466068d7663563cd3ff64740b94/src/core/u-zlib.c#L2063). There were no notes about the addition, but looking at it on GitHub it does have an unusual tabbing, suggesting you're likely right. My real goal is to get that binary decoded, so I'll update the question. – HostileFork says dont trust SE Oct 02 '15 at 13:02
  • I think the best answer is to not even try to port that feature to a newer zlib. Treat these files as if they were not zlib at all (which actually they're not, since they are generated by a hacked zlib). Pretend the only way to uncompress them is with the same tool that compressed them. You linked to the github. Uncompress your files with rebol, save them in a more reasonable format, and don't repeat the mistake of using rebol (or its zlib (or its use_crc flag)) in the future –  Oct 02 '15 at 15:56
  • @WumpusQ.Wumbley Rebol doesn't produce this format. This is data which was being passed in by a zip file processor, and that file was made via **zip test.zip test.txt** as I said in the question. Seems I should apply more scrutiny to the processor now. The reason for not doing so before is I didn't write any of this, someone asked me to figure it out, "it used to work and now doesn't when linking to 2013 zlib". As I told MarkAdler, I'd assumed the previous working decoding of a .zip made by zip suggested the binary blob was a zip payload, I'll look and find out why you guys say it's not. – HostileFork says dont trust SE Oct 02 '15 at 18:45
  • Since you linked to rebol to show us the use_crc flag, I assumed that's what you were actually using. Anyway if your actual source files are in zip format, why not use a zip tool and skip all this zlib business? Or does it fail to conform to the zip format too? –  Oct 02 '15 at 20:23
  • @WumpusQ.Wumbley Rebol is being used to decompress, and zip to compress. The goal is actually to make Rebol work as a decompression tool without involving any other executable, given Rebol's [be small](http://rebolsource.net/) cross-platform goal...plus it has zlib living inside it *(and if I can get the tweak to work, the official zlib).* Based on the advice here I've gone to the zip container decoder and found it's the actual smoking gun. Rebol's sin was only adding a CRC32 checker it seems, but the client is making stuff up out of whole cloth. Just no error on it before and now there is. – HostileFork says dont trust SE Oct 02 '15 at 20:28

1 Answer

No, there have been no incompatible changes to the zlib interface since it was introduced over 20 years ago. There was never a use_crc argument to uncompress().

The example you give is a two-byte zlib header, deflate-compressed data, the CRC-32 of the uncompressed data in big-endian order, followed by a four-byte length in little-endian order. This is a truly odd mash-up of the zlib and gzip wrappers, and has nothing whatsoever to do with the zip format, which you keep mentioning. (What do you mean "payloads inside of zip files"?) zlib has an Adler-32 at the end in big-endian order, whereas gzip has a CRC-32 in little-endian order followed by a four-byte length in little-endian order. This one mixes those up, including the byte ordering, and then deliberately misleadingly puts a valid zlib header on the thing, which is an affront to all that is good and decent in this world.
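To make that layout concrete, here is a minimal sketch in C that only inspects the framing and does no decompression. The names hybrid_trailer, zlib_header_ok, and read_hybrid_trailer are made up for illustration; nothing like them exists in zlib. The header test uses two of the RFC 1950 rules: the compression method must be 8 (deflate), and the two header bytes read as a big-endian 16-bit number must be a multiple of 31.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the 8 bytes that follow the deflate data. */
struct hybrid_trailer {
    uint32_t crc32_be;   /* trailer bytes 0..3, assembled big-endian    */
    uint32_t length_le;  /* trailer bytes 4..7, assembled little-endian */
};

/* Returns 1 if p starts with a plausible zlib header (RFC 1950):
   CM == 8 and (CMF * 256 + FLG) is a multiple of 31. */
int zlib_header_ok(const unsigned char *p, size_t n)
{
    if (p == NULL || n < 2)
        return 0;
    return (p[0] & 0x0F) == 8 && (((unsigned)p[0] << 8) | p[1]) % 31 == 0;
}

/* Reads the last 8 bytes of the stream into *out. Returns 1 on success,
   0 if the stream is too short to hold a header plus a trailer. */
int read_hybrid_trailer(const unsigned char *p, size_t n,
                        struct hybrid_trailer *out)
{
    const unsigned char *t;

    assert(out != NULL);
    if (p == NULL || n < 2 + 8)
        return 0;

    t = p + n - 8;
    out->crc32_be  = ((uint32_t)t[0] << 24) | ((uint32_t)t[1] << 16) |
                     ((uint32_t)t[2] << 8)  |  (uint32_t)t[3];
    out->length_le =  (uint32_t)t[4]        | ((uint32_t)t[5] << 8) |
                     ((uint32_t)t[6] << 16) | ((uint32_t)t[7] << 24);
    return 1;
}
```

On the 25-byte payload from the question, the header bytes 0x78 0x9C pass the check (0x789C = 30876 = 31 × 996), and the last four bytes read as a length of 16.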

I'm pretty sure that whoever came up with this format was drunk at the time.

In order to decode this you will need to:

  1. Discard the first two bytes of the stream. (You can check that it is a valid zlib header, but that turns out to be meaningless in interpreting the rest of the stream.)

  2. Use raw deflate, initializing with inflateInit2(&strm, -15), to decompress the data. As you decompress, keep track of the total length and compute the CRC-32 using crc32().

  3. After the deflate data completes, read the next four bytes, assemble them in big-endian order to a 32-bit value, and compare that to the CRC-32 you computed. If it does not match, the stream is corrupted, or it is not one of these oddly formatted streams. (Maybe try again, decoding it as a normal zlib stream. If it had a good zlib header, then maybe that's what it actually is, as opposed to one of these Frankenstein streams.)

  4. Read the next four bytes and assemble those in little-endian order, and compare that to the length of the uncompressed data. If it does not match, then the stream is corrupted, or it's not what you think.

  5. If the data does not end here, then something else odd is going on. Consult the drunk person.

Mark Adler
  • And please, please, please do not modify zlib to process this atrocity, and if you do, for god's sake, never, ever distribute such a thing. zlib can be used as is to process this data, as I have described. – Mark Adler Oct 02 '15 at 16:13
  • Glad to hear from The Man himself! *(But bear in mind, I've never dealt with zlib before, and I'm the one who got volunteer-roped into "the good work" of "atrocity elimination" by trying to gut the code and link a new canon zlib. If you want to yell at someone [write this guy](https://en.wikipedia.org/wiki/Carl_Sassenrath).)* But when I say "payload inside a zip file" I mean that this is what was passed to uncompress() w/use_crc by a zip reader. Sounds like you're saying I need to go look at the zip reader source that provided this, and I will go do that armed with this info and report back. – HostileFork says dont trust SE Oct 02 '15 at 18:30
  • The situation is exactly as you described verbatim. The piece of code that was unpacking the .ZIP just made that up. Getting rid of all that and just using -15 worked. Thanks for the pointers and prompt help! – HostileFork says dont trust SE Oct 05 '15 at 17:51