I have this uncompressed byte array:

0E 7C BD 03 6E 65 67 6C 65 63 74 00 00 00 00 00 00 00 00 00 42 52 00 00 01 02 01
00 BB 14 8D 37 0A 00 00 01 00 00 00 00 05 E9 05 E9 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 81 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 05 00 00 01 00 00 00

I need to compress it using the deflate algorithm (as implemented in zlib). From what I found, the C# equivalent would be GZipStream, but I can't match the compressed result at all.

Here is the compression code:

public byte[] compress(byte[] input)
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream deflateStream = new GZipStream(ms, CompressionMode.Compress))
        {
            deflateStream.Write(input, 0, input.Length);
        }
        return ms.ToArray();
    }
}

Here is the output of the code above:

1F 8B 08 00 00 00 00 00 04 00 ED BD 07 60 1C 49 96 25 26 2F 6D CA 7B 7F 4A F5 4A
D7 E0 74 A1 08 80 60 13 24 D8 90 40 10 EC C1 88 CD E6 92 EC 1D 69 47 23 29 AB 2A
81 CA 65 56 65 5D 66 16 40 CC ED 9D BC F7 DE 7B EF BD F7 DE 7B EF BD F7 BA 3B 9D
4E 27 F7 DF FF 3F 5C 66 64 01 6C F6 CE 4A DA C9 9E 21 80 AA C8 1F 3F 7E 7C 1F 3F
22 7E 93 9F F9 FB 7F ED 65 7E 51 E6 D3 F6 D7 30 CF 93 57 BF C6 AF F1 6B FE 5A BF
E6 AF F1 F7 FE 56 7F FC 03 F3 D9 AF FB 5F DB AF 83 E7 0F FE 35 23 1F FE BA F4 FE
AF F1 6B FC 1A FF 0F 26 EC 38 82 5C 00 00 00

Here is the result I am expecting:

78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03 4E 41 0C 0C 8C 4C 8C 0C BB 45 7A
CD B9 80 4C 90 18 EB 4B D6 97 0C 28 00 2C CC D0 C8 C8 80 09 58 21 B2 00 65 6B 08
C8

What am I doing wrong? Could someone help me out?

Guapo
  • Why do you expect the same output from different implementations? There are many ways to compress some content that can be decompressed with the same decompressor. But in your case the zip stream seems to output some kind of header. – CodesInChaos Jun 08 '11 at 17:19
  • Not only is the GZipStream's result different, but it is bigger than the uncompressed input! –  Jun 08 '11 at 17:31
  • @Inuyasha that much I already understood, which is why I am trying to find out what I am doing wrong so I can make them equal. As I mentioned, I need to use zlib's deflate implementation in C#. @CodeInChaos I did not know it was a different implementation. I was searching around SO and found some replies stating that GZip was the equivalent; I figured out that it was not when I started testing it. – Guapo Jun 08 '11 at 17:41
  • Aside from the increased size, I assume there is another program decompressing this. How does that go? – H H Jun 08 '11 at 17:48

2 Answers

First, some information: DEFLATE is the compression algorithm, defined in RFC 1951. DEFLATE is used in the ZLIB and GZIP formats, defined in RFC 1950 and RFC 1952 respectively, which are essentially thin wrappers around DEFLATE bytestreams. The wrappers provide metadata such as the file name, timestamps, checksums (CRC or Adler), and so on.
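Because the wrappers differ only in their framing, the first two bytes of a buffer are usually enough to tell them apart. The following is a minimal sketch, assuming the magic numbers from RFC 1952 (`1F 8B` for GZIP) and the header rule from RFC 1950 (low nibble of the first byte is 8, and the first two bytes read as a big-endian number are a multiple of 31). The class and method names are hypothetical, and a raw DEFLATE stream has no signature, so it can only be reported as "unknown".

```csharp
public static class ContainerSniffer
{
    // Guesses the wrapper format from the first two bytes of a buffer.
    public static string Identify(byte[] data)
    {
        if (data == null || data.Length < 2)
            return "unknown";

        // GZIP (RFC 1952) always starts with the magic bytes 1F 8B.
        if (data[0] == 0x1F && data[1] == 0x8B)
            return "gzip";

        // ZLIB (RFC 1950): compression method 8 (DEFLATE) in the low nibble,
        // window size exponent at most 7 in the high nibble, and the
        // 16-bit big-endian header value divisible by 31.
        bool deflateMethod = (data[0] & 0x0F) == 8;
        bool windowOk = (data[0] >> 4) <= 7;
        bool checkOk = ((data[0] << 8) | data[1]) % 31 == 0;
        if (deflateMethod && windowOk && checkOk)
            return "zlib";

        // Raw DEFLATE has no header, so it cannot be recognized this way.
        return "unknown";
    }
}
```

Applied to the buffers in the question, the GZipStream output (`1F 8B ...`) is reported as "gzip" and the expected output (`78 9C ...`) as "zlib".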

.NET's base class library implements a DeflateStream that produces a raw DEFLATE bytestream when used for compression, and consumes a raw DEFLATE bytestream when used for decompression. .NET also provides a GZipStream, which is just a GZIP wrapper around that base. There is no ZlibStream in the .NET base class library, so nothing there produces or consumes ZLIB. There are some tricks for doing it; you can search around.
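One such trick can be sketched as follows: write the two-byte ZLIB header yourself, let the built-in DeflateStream produce the raw DEFLATE body, and append the Adler-32 checksum of the uncompressed data in big-endian order, as RFC 1950 describes. This is a sketch under stated assumptions, not a definitive implementation: the class name `ZlibFraming` is hypothetical, the header is hard-coded to `78 9C` (32K window, "default" level, no preset dictionary) regardless of what the compressor actually does, and the compressed body will generally not be byte-identical to zlib's output. Newer runtimes (.NET 6 and later) ship a `System.IO.Compression.ZLibStream`, which makes this unnecessary there.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZlibFraming
{
    // Adler-32 checksum over the uncompressed data (RFC 1950).
    public static uint Adler32(byte[] data)
    {
        const uint Mod = 65521;
        uint a = 1, b = 0;
        foreach (byte value in data)
        {
            a = (a + value) % Mod;
            b = (b + a) % Mod;
        }
        return (b << 16) | a;
    }

    public static byte[] Compress(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            // Assumed header: CMF = 0x78, FLG = 0x9C.
            output.WriteByte(0x78);
            output.WriteByte(0x9C);

            // leaveOpen = true so the trailer can be written afterwards.
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            }

            // Trailer: Adler-32 of the original data, most significant byte first.
            uint checksum = Adler32(input);
            output.WriteByte((byte)((checksum >> 24) & 0xFF));
            output.WriteByte((byte)((checksum >> 16) & 0xFF));
            output.WriteByte((byte)((checksum >> 8) & 0xFF));
            output.WriteByte((byte)(checksum & 0xFF));
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] zlibData)
    {
        if (zlibData == null || zlibData.Length < 6)
            throw new InvalidDataException("Too short to be a ZLIB stream.");

        byte[] result;
        // Skip the 2-byte header and leave out the 4-byte trailer.
        using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 6))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflate.CopyTo(output);
            result = output.ToArray();
        }

        int n = zlibData.Length;
        uint expected = ((uint)zlibData[n - 4] << 24) | ((uint)zlibData[n - 3] << 16)
                      | ((uint)zlibData[n - 2] << 8) | zlibData[n - 1];
        if (Adler32(result) != expected)
            throw new InvalidDataException("Adler-32 mismatch.");
        return result;
    }
}
```

`ZlibFraming.Decompress` assumes a stream without a preset dictionary, which is the case for the `78 9C` and `78 DA` headers discussed in this thread.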

The deflate logic in .NET exhibits a behavioral anomaly, where previously compressed data can actually be inflated, significantly, when "compressed". This was the source of a Connect bug raised with Microsoft, and has been discussed here on SO. This may be what you are seeing, as far as ineffective compression. Microsoft have rejected the bug, because while it is ineffective for saving space, the compressed stream is not invalid, in other words it can be "decompressed" by any compliant DEFLATE engine.

In any case, as someone else posted, the compressed bytestream produced by different compressors may not necessarily be the same. It depends on their default settings, and the application-specified settings for the compressor. Even though the compressed bytestreams are different, they may still decompress to the same original bytestream. On the other hand the thing you used to compress was GZIP, while it appears what you want is ZLIB. While they are related, they are not the same; you cannot use GZipStream to produce a ZLIB bytestream. This is the primary source of the difference you see.
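The point that different compressor settings can yield different bytes which still decompress to the same data can be checked with the built-in DeflateStream alone. The sketch below assumes .NET Framework 4.5 or later (where the `CompressionLevel` enum exists); the class name `DeflateRoundTrip` is hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class DeflateRoundTrip
{
    // Produces a raw DEFLATE stream at the requested level.
    public static byte[] Deflate(byte[] input, CompressionLevel level)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, level))
            {
                deflate.Write(input, 0, input.Length);
            }
            // ToArray still works after the stream has been closed.
            return output.ToArray();
        }
    }

    // Restores the original bytes from a raw DEFLATE stream.
    public static byte[] Inflate(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Calling `Deflate` with `CompressionLevel.Fastest` and with `CompressionLevel.NoCompression` on the same input will typically give buffers of different length and content, yet `Inflate` returns the original bytes for both.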


I think you want a ZLIB stream.

The free managed Zlib in the DotNetZip project implements compressing streams for all of the three formats (DEFLATE, ZLIB, GZIP). The DeflateStream and GZipStream work the same way as the .NET builtin classes, and there's a ZlibStream class in there, that does what you think it does. None of these classes exhibit the behavior anomaly I described above.


In code it looks like this:

    byte[] original = new byte[] {
        0x0E, 0x7C, 0xBD, 0x03, 0x6E, 0x65, 0x67, 0x6C,
        0x65, 0x63, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x42, 0x52, 0x00, 0x00,
        0x01, 0x02, 0x01, 0x00, 0xBB, 0x14, 0x8D, 0x37,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x05, 0xE9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x81, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00,
        0x01, 0x00, 0x00, 0x00
    };

    var compressed = Ionic.Zlib.ZlibStream.CompressBuffer(original);

The output is like this:

0000    78 DA E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03     x...........\...
0010    4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F     NA...L...Ez.ab./
0020    19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07     ...FF,..@.@..5%.
0030    CE                                                  .

To decompress:

    var uncompressed = Ionic.Zlib.ZlibStream.UncompressBuffer(compressed);

You can see the documentation on the static CompressBuffer method.


EDIT

The question is raised, why is DotNetZip producing 78 DA for the first two bytes instead of 78 9C? The difference is immaterial. 78 DA encodes "max compression", while 78 9C encodes "default compression". As you can see in the data, for this small sample, the actual compressed bytes are exactly the same whether using BEST or DEFAULT. Also, the compression level information is not used during decompression. It has no effect in your application.
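The meaning of those two bytes can be spelled out by decoding the header fields defined in RFC 1950: the low nibble of the first byte is the compression method, the high nibble encodes the window size, and the top two bits of the second byte are the compression level hint. This is a small illustrative sketch; the class and method names are hypothetical.

```csharp
public static class ZlibHeader
{
    // Decodes the two-byte ZLIB header (CMF, FLG) into a readable summary.
    public static string Describe(byte cmf, byte flg)
    {
        int method = cmf & 0x0F;            // 8 means DEFLATE
        int windowBits = (cmf >> 4) + 8;    // 15 means a 32K window
        int level = flg >> 6;               // hint only, ignored when decompressing
        bool presetDictionary = (flg & 0x20) != 0;
        bool valid = ((cmf << 8) | flg) % 31 == 0;

        string[] levelNames = { "fastest", "fast", "default", "maximum" };
        return string.Format(
            "method={0} windowBits={1} level={2} dict={3} valid={4}",
            method, windowBits, levelNames[level], presetDictionary, valid);
    }
}
```

`ZlibHeader.Describe(0x78, 0x9C)` reports `level=default` and `ZlibHeader.Describe(0x78, 0xDA)` reports `level=maximum`; every other field is the same for both headers.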

If you don't want "max" compression, in other words if you are very set on getting 78 9C as the first two bytes, even though it doesn't matter, then you cannot use the CompressBuffer convenience function, which uses the best compression level under the covers. Instead you can do this:

    var compress = new Func<byte[], byte[]>(a =>
    {
        using (var ms = new System.IO.MemoryStream())
        {
            // Ionic.Zlib's own enums, fully qualified to avoid clashing
            // with the System.IO.Compression types of the same name.
            using (var compressor =
                   new Ionic.Zlib.ZlibStream(ms,
                                             Ionic.Zlib.CompressionMode.Compress,
                                             Ionic.Zlib.CompressionLevel.Default))
            {
                compressor.Write(a, 0, a.Length);
            }

            return ms.ToArray();
        }
    });

    var original = new byte[] { .... };
    var compressed = compress(original);

The result is:

0000    78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03     x...........\...
0010    4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F     NA...L...Ez.ab./
0020    19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07     ...FF,..@.@..5%.
0030    CE                                                  .
Cheeso
  • @Cheeso I just tried ZLib.Net from Merlyn's reply and it compresses fine, giving me the data I was expecting. Now I just don't know how to decompress a byte array that I have received. – Guapo Jun 08 '11 at 18:09
  • @Cheeso thanks. It seems rather simple to uncompress using it. I will give it a try, since I am having some problems decompressing with the other lib. – Guapo Jun 08 '11 at 18:44
  • @Cheeso DotNetZip always compresses it with one different byte at the very beginning, "78 DA" instead of "78 9C", while ZLib.Net gives me 9C instead of DA. Apart from that, it uncompresses just fine. I'm not sure yet why it changes 9C to DA... – Guapo Jun 08 '11 at 20:08
  • It doesn't really matter whether the 2nd byte is 9C or DA. ZLIB has a 2 byte header, the first byte indicates the compression method and window size if DEFLATE is used. It's always 78. The next byte varies, and indicates 3 things: whether a preset dictionary has been used, the compression level, and a checksum of sorts on the 1st two bytes. In effect, 9C indicates comp level "default" while DA indicates compression level "max". This information is not needed for decompression; it is interesting only if your app considers whether additional compression might be useful. You can ignore it. – Cheeso Jun 08 '11 at 22:34
  • The CompressBuffer convenience method specifies "Best Compression" which is why it is encoded as `78 DA` in the output buffer you see. – Cheeso Jun 08 '11 at 22:34
  • A bit additional background: DEFLATE is also used in the `zip` file format (for each archived file individually). – Paŭlo Ebermann Jun 30 '11 at 12:47
  • @Cheeso, I know this is an old thread, but your detailed answer encouraged me to ask you a couple of questions. When I tried using zlib in a C#, for in-memory (not file) compression, with small byte[]s around 500 bytes, I found the ratios rather inconsistent: 125=>116, 98=>90, 115=>113 (bytes before and after compression). This brings me to the question: Is this kind of inconsistency in ratio expected of all compression tools; or, does it have anything to do with small byte array inputs, and does it improve with larger inputs? Would be very glad to learn your thoughts on this. Many thanks. – Pradeep Puranik Dec 14 '17 at 14:08
Quite simply, what you got had a GZip header. What you want is the simpler Zlib header. ZLib has options for a GZip header, a Zlib header, or no header. Typically the Zlib header is used unless the data is associated with a disk file (in which case the GZip header is used). Apparently, there is no way to write a zlib header with the .NET library (even though this is by far the most common header used in file formats). Try http://dotnetzip.codeplex.com/.

You can quickly test all the different zlib options using HexEdit (Operations->Compression->Settings). See http://www.hexedit.com . It took me 10 minutes to check your data by simply pasting your compressed bytes into HexEdit and decompressing. I also tried compressing your original bytes with GZip and ZLib headers as a double-check. Note that you may have to fiddle with the settings to get exactly the bytes you were expecting.

Andrew W. Phillips