9

The the other day I ran into the question GZipStream doesn't detect corrupt data (even CRC32 passes)? (Of which this might very well be a "duplicate", I have mixed feelings on the subject. I was also the one who added the CRC32 to the title, but in retrospect that feels out of place with the remainder of the post). After exploring the problem a bit on my own, I think that the issue is far greater than the other question initially portrays.

I expanded upon the other question and made the test code runnable under LINQPad and attempt to better showcase the CRC32 (Cyclic Redundancy Check) issue, if it does indeed exist. (Since the code is only a slight modification based upon the original it is possible that the test setup/methodology is flawed or there is another strange quirk/PEBCAK both.)

The results are odd because the corrupt data is not always causing an (any!) Exception to be raised. Note that only sometimes does the CRC32 check seem to actually be "working". The corrupt bytes that cause the index-out-of-range/bad header/bad footer can be ignored because we can assume these are killing the decompression prior to the CRC32 check (which is perfectly understandable, even if the IndexOutOfRangeException should likely be wrapped by a InvalidDataException) so,

Why is the CRC32 check significantly less reliable than it ought to be? (Why is it the case that there is "Invalid data (No Exception)" below?)

Since the GZip footer contains both the CRC32 and length of the uncompressed data is seems that the error detection rate should be "significantly higher" -- that is, I would not expect a single failing case below, much less a number of undetected corrupted streams. (Of course it's nice to detect a corrupt steam ASAP: but the final safe-guard checksum seems to be downright ignored in cases.)

Format is CorruptByteIndex+FailedDetections: Message:

0+0: System.IO.InvalidDataException:The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.
1+0: System.IO.InvalidDataException:The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.
2+0: System.IO.InvalidDataException:The compression mode specified in GZip header is unknown.
3+0: Good data (No Exception)
4+0: Good data (No Exception)
5+0: Good data (No Exception)
6+0: Good data (No Exception)
7+0: Good data (No Exception)
8+0: Good data (No Exception)
9+0: Good data (No Exception)
10+0: System.IO.InvalidDataException:Unknown block type. Stream might be corrupted.
11+1: Invalid data (No Exception)
12+1: System.IO.InvalidDataException:Found invalid data while decoding.
13+1: System.IO.InvalidDataException:Found invalid data while decoding.
14+1: System.IO.InvalidDataException:Found invalid data while decoding.
15+1: System.IO.InvalidDataException:Found invalid data while decoding.
16+1: System.IO.InvalidDataException:Found invalid data while decoding.
17+2: Invalid data (No Exception)
18+2: System.IO.InvalidDataException:Found invalid data while decoding.
19+2: System.IndexOutOfRangeException:Index was outside the bounds of the array.
20+2: System.IndexOutOfRangeException:Index was outside the bounds of the array.
21+3: Invalid data (No Exception)
22+3: System.IndexOutOfRangeException:Index was outside the bounds of the array.
23+3: System.IndexOutOfRangeException:Index was outside the bounds of the array.
24+4: Invalid data (No Exception)
25+4: System.IndexOutOfRangeException:Index was outside the bounds of the array.
26+4: System.IndexOutOfRangeException:Index was outside the bounds of the array.
27+4: System.IndexOutOfRangeException:Index was outside the bounds of the array.
28+4: System.IndexOutOfRangeException:Index was outside the bounds of the array.
29+5: Invalid data (No Exception)
30+5: System.IndexOutOfRangeException:Index was outside the bounds of the array.
31+6: Invalid data (No Exception)
32+7: Invalid data (No Exception)
33+7: System.IndexOutOfRangeException:Index was outside the bounds of the array.
34+7: System.IndexOutOfRangeException:Index was outside the bounds of the array.
35+7: System.IndexOutOfRangeException:Index was outside the bounds of the array.
36+8: Invalid data (No Exception)
37+8: System.IndexOutOfRangeException:Index was outside the bounds of the array.
38+8: System.IndexOutOfRangeException:Index was outside the bounds of the array.
39+9: Invalid data (No Exception)
40+9: System.IndexOutOfRangeException:Index was outside the bounds of the array.
41+9: System.IndexOutOfRangeException:Index was outside the bounds of the array.
42+10: Invalid data (No Exception)
43+10: System.IO.InvalidDataException:Found invalid data while decoding.
44+10: System.IndexOutOfRangeException:Index was outside the bounds of the array.
45+10: System.IO.InvalidDataException:Found invalid data while decoding.
46+11: Invalid data (No Exception)
47+11: System.IndexOutOfRangeException:Index was outside the bounds of the array.
48+11: System.IndexOutOfRangeException:Index was outside the bounds of the array.
49+11: System.IndexOutOfRangeException:Index was outside the bounds of the array.
50+12: Invalid data (No Exception)
51+12: System.IndexOutOfRangeException:Index was outside the bounds of the array.
52+12: System.IndexOutOfRangeException:Index was outside the bounds of the array.
53+13: Invalid data (No Exception)
54+13: System.IndexOutOfRangeException:Index was outside the bounds of the array.
55+14: Invalid data (No Exception)
56+14: System.IndexOutOfRangeException:Index was outside the bounds of the array.
57+15: Invalid data (No Exception)
58+15: System.IndexOutOfRangeException:Index was outside the bounds of the array.
59+15: System.IndexOutOfRangeException:Index was outside the bounds of the array.
60+16: Invalid data (No Exception)
61+17: Invalid data (No Exception)
62+18: Invalid data (No Exception)
63+19: Invalid data (No Exception)
64+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
65+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
66+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
67+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
68+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
69+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
70+19: System.IO.InvalidDataException:Found invalid data while decoding.
71+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
72+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
73+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
74+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
75+19: System.IO.InvalidDataException:Found invalid data while decoding.
76+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
77+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
78+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
79+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
80+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
81+19: System.IO.InvalidDataException:Found invalid data while decoding.
82+19: System.IndexOutOfRangeException:Index was outside the bounds of the array.
83+20: Invalid data (No Exception)
84+21: Invalid data (No Exception)
85+22: Invalid data (No Exception)
86+22: System.IndexOutOfRangeException:Index was outside the bounds of the array.
87+23: Invalid data (No Exception)
88+24: Invalid data (No Exception)
89+25: Invalid data (No Exception)
90+25: System.IndexOutOfRangeException:Index was outside the bounds of the array.
91+26: Invalid data (No Exception)
92+26: System.IO.InvalidDataException:Found invalid data while decoding.
93+26: System.IndexOutOfRangeException:Index was outside the bounds of the array.
94+27: Invalid data (No Exception)
95+27: System.IndexOutOfRangeException:Index was outside the bounds of the array.
96+27: System.IndexOutOfRangeException:Index was outside the bounds of the array.
97+28: Invalid data (No Exception)
98+28: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
99+28: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
100+29: Invalid data (No Exception)
101+30: Invalid data (No Exception)
102+31: Invalid data (No Exception)
103+32: Invalid data (No Exception)
104+32: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
105+33: Invalid data (No Exception)
106+34: Invalid data (No Exception)
107+35: Invalid data (No Exception)
108+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
109+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
110+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
111+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
112+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
113+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
114+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
115+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
116+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
117+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
118+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
119+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
120+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
121+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
122+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
123+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
124+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
125+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
126+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
127+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
128+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
129+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
130+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
131+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
132+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
133+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
134+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
135+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
136+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
137+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
138+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
139+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
140+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
141+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
142+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
143+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
144+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
145+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
146+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
147+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
148+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
149+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
150+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
151+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
152+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
153+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
154+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
155+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
156+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
157+35: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
158+36: Invalid data (No Exception)
159+36: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
160+36: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
161+37: Invalid data (No Exception)
162+38: Invalid data (No Exception)
163+39: Invalid data (No Exception)
164+40: Invalid data (No Exception)
165+41: Invalid data (No Exception)
166+41: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
167+41: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
168+41: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
169+41: System.IO.InvalidDataException:The CRC in GZip footer does not match the CRC calculated from the decompressed data.
170+41: System.IO.InvalidDataException:The stream size in GZip footer does not match the real stream size.
171+41: System.IO.InvalidDataException:The stream size in GZip footer does not match the real stream size.
172+41: System.IO.InvalidDataException:The stream size in GZip footer does not match the real stream size.
173+41: System.IO.InvalidDataException:The stream size in GZip footer does not match the real stream size.

Here is the test which is copy'n'paste runnable in LINQPad (for .NET 3.5 and 4, use "as C# statements" mode):

   string sample = "This is a compression test of microsoft .net gzip compression method and decompression methods";
   var encoding = new ASCIIEncoding();
   var data = encoding.GetBytes(sample);
   string sampleOut = null;
   byte[] cmpData;

   // Compress 
   using (var cmpStream = new MemoryStream())
   {
      using (var hgs = new System.IO.Compression.GZipStream(cmpStream, System.IO.Compression.CompressionMode.Compress))
      {
         hgs.Write(data, 0, data.Length);
      }
      cmpData = cmpStream.ToArray();
   }

   int corruptBytesNotDetected = 0;

   // corrupt data byte by byte
   for (var byteToCorrupt = 0; byteToCorrupt < cmpData.Length; byteToCorrupt++)
   {
      var corruptData = new List<byte>(cmpData).ToArray();
      // corrupt the data
      corruptData[byteToCorrupt]++;

      using (var decomStream = new MemoryStream(corruptData))
      {
         using (var hgs = new System.IO.Compression.GZipStream(decomStream, System.IO.Compression.CompressionMode.Decompress))
         {
            using (var reader = new StreamReader(hgs))
            {
               string message;
               try
               {
                  sampleOut = reader.ReadToEnd();

                  // if we get here, the corrupt data was not detected by GZipStream
                  // ... okay so long as the correct data is extracted

                  if (!sample.SequenceEqual(sampleOut)) {
                    corruptBytesNotDetected++;
                    message = "Invalid data (No Exception)";
                  } else {
                    message = "Good data (No Exception)";
                  }
               }
               catch(Exception ex)
               {
                    message = (ex.GetType() + ":" + ex.Message);
               }
               string.Format("{0}+{1}: {2}",
                    byteToCorrupt, corruptBytesNotDetected, message).Dump();
            }
         }
      }

   }

Here is the compressed data in .NET 3.5 (GZipStream is notoriously bad at "compressing" small payloads but it is a "Won't Fix" issue because the stream is still technically valid):

1F 8B 08 00 00 00 00 00 04 00 ED BD 07 60 1C 49 96 25 26 2F
6D CA 7B 7F 4A F5 4A D7 E0 74 A1 08 80 60 13 24 D8 90 40 10
EC C1 88 CD E6 92 EC 1D 69 47 23 29 AB 2A 81 CA 65 56 65 5D
66 16 40 CC ED 9D BC F7 DE 7B EF BD F7 DE 7B EF BD F7 BA 3B
9D 4E 27 F7 DF FF 3F 5C 66 64 01 6C F6 CE 4A DA C9 9E 21 80
AA C8 1F 3F 7E 7C 1F 3F 22 DE CC 8B 26 A5 FF 65 E9 B4 5A AC
EA BC 69 8A 6A 99 B6 79 D3 A6 D5 79 BA 28 A6 75 D5 54 E7 6D
3A 5E E6 6D 7A F1 83 62 15 B4 5B E4 ED BC 9A A5 D9 72 96 CE
F2 FE 17 CD FF 03 5C 51 5E 27 5E 00 00 00

(And, just for giggles, in .NET 4 it generates a slightly larger/different compressed stream.)

1F 8B 08 00 00 00 00 00 04 00 EC BD 07 60 1C 49 96 25 26 2F
6D CA 7B 7F 4A F5 4A D7 E0 74 A1 08 80 60 13 24 D8 90 40 10
EC C1 88 CD E6 92 EC 1D 69 47 23 29 AB 2A 81 CA 65 56 65 5D
66 16 40 CC ED 9D BC F7 DE 7B EF BD F7 DE 7B EF BD F7 BA 3B
9D 4E 27 F7 DF FF 3F 5C 66 64 01 6C F6 CE 4A DA C9 9E 21 80
AA C8 1F 3F 7E 7C 1F 3F 22 DE CC 8B 26 A5 FF 65 E9 B4 5A AC
EA BC 69 8A 6A 99 B6 79 D3 A6 D5 79 BA 28 A6 75 D5 54 E7 6D
3A 5E E6 6D 7A F1 83 62 15 B4 5B E4 ED BC 9A A5 D9 72 96 CE
F2 FE 17 CD FF 13 00 00 FF FF 5C 51 5E 27 5E 00 00 00

Additional notes:

The test may be subtly flawed in this case. When the GZipStream "fails to detect corruption" (no Exception) then the data read from the StreamReader is "" (an empty string): In that case, why does ReadToEnd() not raise an Exception (IOException or otherwise)?

Is it thus not GZipStream but rather the StreamReader that is "quirky" here or is this still an issue with the GZipStream (for not throwing an Exception)? Is there some correct way to handle this use-case reliably? (Consider when the input stream from the current position really is empty.)

Community
  • 1
  • 1
  • You can detect all missing exceptions by checking for a zero return value from GZipStream.Read(). You can report the bug at the connect.microsoft.com portal. – Hans Passant Feb 28 '12 at 02:13

1 Answers1

14

Preface

.NET [4 and previous] users should not use the Microsoft-provided GZipStream or DeflateStream classes under any circumstances, unless Microsoft replaces them completely with something that works. Use the DotNetZip library instead.

Update to Preface

The .NET Framework 4.5 and later have fixed the compression problem, and GZipStream and DeflateStream use zlib in those versions. I do not know if the CRC problem referenced below has been fixed.

Another Update

The CRC bug is not only not fixed, but Microsoft has decided that they won't fix it!

 

My response in Why does my C# gzip produce a larger file than Fiddler or PHP? shows that this behavior does not reflect a proper implementation of gzip corruption detection. In all of the cases tested, the error would be detected by a proper implementation. (That response also notes why seven of the cases result in good data.)

Another question is: how does this "compression" method produce 174 bytes of output from a 94-byte string? Especially seeing as how the string is compressible -- gzip reduces it to 84-bytes, even with the overhead of an 18-byte header and trailer. Can you provide a hex dump of that 174 bytes?

Community
  • 1
  • 1
Mark Adler
  • 101,978
  • 13
  • 118
  • 158
  • I am glad you responded here, especially as your other answer showed that there is *no reason* why this corruption shouldn't be detected. The GZipStream in .NET is notoriously bad for small payloads and, while it produces *valid* output, it is not necessarily *efficient* (space) output, but that's a different issue :) I have updated my post to include the hex result. (I am also counting 178 bytes, not 174 :-/) –  Feb 27 '12 at 21:15
  • Oh, in .NET 3.5 it's 174 bytes, 178 in .NET 4. Going backwards :p –  Feb 27 '12 at 21:21
  • 6
    The output in both cases are indeed valid gzip streams. However in both cases the GZipStream method is using dynamic blocks when it should be using a static block. A static block in the deflate format uses a fixed set of Huffman tables, and so the block header is only a few bits. A dynamic block uses custom-generated Huffman codes for the data in that block, and so requires around an 80-byte header to convey those codes. A dynamic block should only be used when it is shorter than a static block would be on the same data, which is normally for much longer blocks. That explains the size. – Mark Adler Feb 28 '12 at 05:50
  • That is yet another bug that should be reported for this method. It does not compress short blocks of data properly. – Mark Adler Feb 28 '12 at 05:52
  • They should just be using zlib for this method, which does all this properly (compression and corruption detection). – Mark Adler Feb 28 '12 at 05:54
  • Thanks. I ran into your other answer the other week - seeing the [now?] bold text that I copied over finally made it sink in: The implementation is broken. Don't use it. Move on. –  Mar 10 '13 at 21:25