Deflate and inflate for PDF, using zlib C++

Question

I am trying to use the zlib (`zlib.h`) deflate and inflate functions to compress and decompress streams in a PDF file. The input is a compressed stream from the PDF file. I implemented the inflate step and it works: I get the uncompressed stream. After that I try to compress this stream again with the deflate function. I get a compressed stream as output, but it is not equal to the original compressed stream, and the two don't even have the same length. What am I doing wrong? This is part of my code:

```cpp
// buffer, streamstart, streamend and the output file stream `out` are defined elsewhere.
size_t outsize = (streamend - streamstart) * 10;    // guess at the inflated size
char* output = new char[outsize];

z_stream zstrm{};                                   // zero-init: zalloc/zfree/opaque = Z_NULL
zstrm.avail_in = (uInt)(streamend - streamstart + 1);
zstrm.avail_out = (uInt)outsize;
zstrm.next_in = (Bytef*)(buffer + streamstart);     // block of data to inflate
zstrm.next_out = (Bytef*)output;

uLong inflated_size = 0;
if (inflateInit(&zstrm) == Z_OK)
{
    int rst2 = inflate(&zstrm, Z_FINISH);           // Z_STREAM_END means fully inflated
    if (rst2 == Z_STREAM_END)
    {
        inflated_size = zstrm.total_out;            // data may contain '\0', so don't use strlen()
        cout.write(output, inflated_size) << endl;  // inflated data
    }
    inflateEnd(&zstrm);
}

z_stream d_zstrm{};
if (inflated_size > 0 && deflateInit(&d_zstrm, Z_DEFAULT_COMPRESSION) == Z_OK)
{
    // Worst-case compressed size; the original stream length may be too small.
    uLong deflate_size = deflateBound(&d_zstrm, inflated_size);
    char* deflate_output = new char[deflate_size];

    d_zstrm.avail_in = (uInt)inflated_size;         // exactly the inflated bytes, no extra '\0'
    d_zstrm.avail_out = (uInt)deflate_size;
    d_zstrm.next_in = (Bytef*)output;
    d_zstrm.next_out = (Bytef*)deflate_output;

    int rst22 = deflate(&d_zstrm, Z_FINISH);
    if (rst22 == Z_STREAM_END)
    {
        // Write the deflated stream to the file as raw bytes: operator<< stops at the first '\0'.
        out.write(deflate_output, d_zstrm.total_out);
        out << endl << "**********************" << endl;
        printf("New size of stream: %lu\n", (unsigned long)d_zstrm.total_out);
    }
    deflateEnd(&d_zstrm);
    delete[] deflate_output;
}
delete[] output;
```

But does it then decompress properly? If someone else compressed the data in the first place, it won't necessarily compress to the same thing, because there is more than one way to compress using the pattern matching method. — Weather Vane, Feb 11 '16 at 09:46
@WeatherVane, yes, decompression is 100% correct: after decompression the streams contain the original structure of the PDF file. — Diana, Feb 11 '16 at 10:07
Bear in mind, too, that there can be different levels of compression, trading speed for size. — Weather Vane, Feb 11 '16 at 10:25
@WeatherVane Yes, sure. I tried different compression levels, but none of them gave the same result :( — Diana, Feb 11 '16 at 10:29
Why should they? The original compression may have been done with a different library. — Weather Vane, Feb 11 '16 at 10:31
@WeatherVane The stream's filter is FlateDecode, and I use the corresponding function to decompress it, which works. So if the original library was a different one, why is the decompression correct? — Diana, Feb 11 '16 at 11:12
Because it still follows the rules. It is lossless coding. Suppose you want to compress `"aaaa123456789aaa"`. There are two possible pattern matches for the final `"aaa"` and either is correct. — Weather Vane, Feb 11 '16 at 15:07
@WeatherVane Oh... thanks for your help. But now I don't know what to do. The streams in my PDF file were compressed with Flate (gzip compression). Do you have any idea how I can compress them correctly, and can I use zlib for this? — Diana, Feb 11 '16 at 16:15
What do you mean by "correctly"? You already said the new compression decompresses properly. So it is a correct, but different compression. — Weather Vane, Feb 11 '16 at 16:17
@WeatherVane My goal is to decompress the streams of a PDF file, edit them in a particular way, and compress them again in a similar way, so that viewers/editors can still read the PDF file correctly. But compressing with Deflate() breaks the streams, and readers can't open the file. — Diana, Feb 11 '16 at 16:22
So sorry, I misunderstood. I thought you said it works properly and were only asking why the output is different. — Weather Vane, Feb 11 '16 at 16:24
@WeatherVane Yes, from a technical point of view this method works properly, but apparently compression in PDF is a different kind of Deflate()... — Diana, Feb 11 '16 at 16:33
@WeatherVane But when I decompress both the original compressed stream from the file and the compressed string from my program (which are not equal), the results are the same. — Diana, Feb 11 '16 at 16:38
PDF uses various types of compression, [see this](http://www.prepressure.com/pdf/basics/compression), perhaps the original is using LZW. — Weather Vane, Feb 11 '16 at 17:18
@WeatherVane OK, thank you :) I'll try this algorithm. — Diana, Feb 11 '16 at 18:51
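Weather Vane's `"aaaa123456789aaa"` point can be made concrete. Deflate first turns its input into a sequence of literal bytes and back-references (distance, length), then Huffman-codes that sequence. The sketch below models only the first layer, using a hypothetical `Token` type and decoder (not zlib's API): two token sequences that differ only in the distance of the final back-reference (13 vs. 12) decode to the same string, and a compressor may legitimately emit either one, or plain literals.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical simplified LZ77 token: either one literal byte, or a back-reference
// meaning "copy `length` bytes starting `distance` bytes back in the output".
struct Token {
    bool is_literal;
    char literal;
    int distance;
    int length;
};

Token lit(char c) { return Token{true, c, 0, 0}; }
Token match(int distance, int length) { return Token{false, '\0', distance, length}; }

// Append one literal token per character of `s`.
void add_literals(std::vector<Token>& tokens, const std::string& s)
{
    for (char c : s) tokens.push_back(lit(c));
}

// Decode a token sequence back into text.
std::string lz77_decode(const std::vector<Token>& tokens)
{
    std::string out;
    for (const Token& t : tokens) {
        if (t.is_literal) {
            out.push_back(t.literal);
            continue;
        }
        if (t.distance <= 0 || t.distance > static_cast<int>(out.size()))
            throw std::invalid_argument("back-reference points before the start of the output");
        // Copy one byte at a time so a match may overlap the bytes it is producing.
        for (int i = 0; i < t.length; ++i)
            out.push_back(out[out.size() - t.distance]);
    }
    return out;
}
```

Building `a, match(1, 3), "123456789", match(13, 3)` and the same sequence ending in `match(12, 3)` both yield `aaaa123456789aaa`: the first copies positions 0 to 2, the second positions 1 to 3, all of which are `a`. The byte-by-byte copy is also what lets `match(1, 3)` expand a single `a` into `aaaa`.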

score 3 · Accepted Answer · answered Feb 12 '16 at 07:15

There is nothing wrong. There is not a unique compressed stream for a given uncompressed stream. All that is required is that the decompression give you back exactly what was compressed (hence "lossless").

It may simply be caused by different compression parameters, different compression code, or even a different version of the same compression code.

If you can't reproduce the original compressed data, so what? All that matters is that you can make a valid PDF file that can be decompressed and has the content that you want.
