1

I came across the Q&A "Is there a way to know if the byte[] has been compressed by gzipstream?", where an answer states (correctly) that GZipStream writes the bytes {0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 4, 0} as a header, which can be used to tell whether a byte array holds compressed data.

My question is: is the GZipStream header reliable across .NET versions?

Matías Fidemraizer

3 Answers

5

With any GZip format stream you are guaranteed:

First two bytes: 1f, 8b

Next byte: 00 for store (no compression), 01 for the compress algorithm, 02 for pack, 03 for lzf and 08 for deflate. .NET has so far always used deflate, and many consumers expect only deflate (web clients expect only deflate-based gzip for a transfer or content encoding marked as gzip), so this is unlikely to change unless an option to specify the method is added.

The next is the file type, with 00 meaning "probably some sort of text file". Since GZipStream has no information on the file type, it always uses that.

The next four are the file-modification time in Unix format. Again, since the class has no information about the file (it receives a stream, not a file with metadata), these are always set to 0.

The next byte depends on the compression method. With deflate it could be 2 to indicate heavy compression or 4 to indicate light compression.

The next (last in your sequence) depends on the OS type in use. 0 means "FAT Filesystem" but has continued to be used by Windows as Windows has moved to use other file systems like NTFS. It could potentially have a different value if used with Mono on a non-Windows file system, though that situation could also potentially decide to match the .NET behaviour. (Update: At least some versions of Mono will set the file-system flag to something other than 0 on non-Windows systems).
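The byte positions walked through above can be pulled apart with a few lines of code. This is only a sketch in Python rather than .NET, and `GzipHeader` and `parse_gzip_header` are hypothetical names made up for the illustration. The field layout assumed here is the 10-byte fixed header of RFC 1952 (two magic bytes, method, one flag byte, a 4-byte little-endian modification time, one extra-flags byte, one OS byte).

```python
import struct
from typing import NamedTuple


class GzipHeader(NamedTuple):
    """Fixed 10-byte gzip header (layout assumed from RFC 1952)."""
    magic: bytes       # always 0x1f 0x8b
    method: int        # 8 = deflate
    flags: int         # the byte right after the method
    mtime: int         # modification time, Unix format, little-endian
    extra_flags: int   # depends on the compression method
    os_code: int       # OS / file-system type


def parse_gzip_header(data: bytes) -> GzipHeader:
    """Split the first 10 bytes into fields; does no validation beyond length."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes for a gzip header")
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    return GzipHeader(bytes([id1, id2]), cm, flg, mtime, xfl, os_code)


# The header quoted in the question.
header = parse_gzip_header(bytes([0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 4, 0]))
print(header)
```

Only the magic bytes and the method are stable enough to compare against; the remaining fields are the ones that can vary between implementations and platforms.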

Jon Hanna
  • According to [this link](http://forensicswiki.org/wiki/Gzip), the values 0 to 7 for the third byte (Compression method) are reserved and only the value 8 is actually valid. Is this link not showing current information? – Daniel Hilgarth Jul 28 '15 at 16:10
  • It seems like I should accept this answer, but I prefer to wait and see whether anyone provides more corrections – Matías Fidemraizer Jul 28 '15 at 16:15
  • @DanielHilgarth the GNU GZip utility allows for the other compression schemes I mention, and uses the flag as I describe, though I don't know of any standards document outdating RFC 1952 which, as you say, lists them as *reserved*. Still, those values are found if you use GNU GZip or anything compatible with it. – Jon Hanna Jul 28 '15 at 16:15
  • No, gzip neither accepts nor generates any values in the third byte but 8. gzip will decompress Unix compress and Unix pack files which have a different _first two_ bytes, `0x1f 0x9d` for compress and `0x1f 0x1e` for pack. When you say "lzf" I think you mean lzh. gzip will also decompress that, marked with `0x1f 0xa0` in the first two bytes. – Mark Adler Jul 28 '15 at 22:13
3

It should be reliable, because this header is from the GZip specification and therefore not .NET specific. See here for an explanation of these values.

However, according to the specification, only the two first bytes are actually always the same. The third byte is practically always the same, because currently only one valid value exists. The following bytes might change.

Daniel Hilgarth
2

A gzip stream is assured to start with 0x1f 0x8b 0x08. There is no other compression method supported than the 0x08 in the third byte.

So if you don't see 0x1f 0x8b 0x08, then it's not a gzip stream. However if you do see 0x1f 0x8b 0x08, then it may or may not be a gzip stream. It probably is, but you can't assume that.
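As a rough illustration of that asymmetry, here is a sketch in Python (not .NET); `might_be_gzip` is a hypothetical helper name. A `False` result rules gzip out, while a `True` result only means the data is a candidate.

```python
GZIP_PREFIX = b"\x1f\x8b\x08"  # two magic bytes plus the deflate method byte


def might_be_gzip(data: bytes) -> bool:
    """False means "definitely not gzip"; True only means "possibly gzip"."""
    return data[:3] == GZIP_PREFIX


print(might_be_gzip(b"plain text"))                   # ruled out
print(might_be_gzip(b"\x1f\x8b\x08 arbitrary data"))  # passes, yet is not valid gzip
```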

What you should do with a candidate gzip file is to simply start decompressing it as such. The decoder will immediately recognize if there is no gzip header, and will furthermore soon detect a problem in the compressed data if there is an accidental gzip header. You shouldn't have to check for the header, since the decoder already does, as well as check for valid compressed data after that.
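A minimal sketch of that approach, written in Python rather than .NET and using the standard `gzip` module; `try_gunzip` is a hypothetical helper name, and the set of exceptions caught is an assumption about what Python's decoder raises for a bad header, corrupt data, or truncated input. The .NET equivalent would wrap the `GZipStream` read in a try/catch.

```python
import gzip
import zlib
from typing import Optional


def try_gunzip(data: bytes) -> Optional[bytes]:
    """Return the decompressed bytes, or None if data is not a valid gzip stream."""
    try:
        return gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # OSError covers a missing or bad gzip header, EOFError a truncated
        # stream, zlib.error invalid compressed data after a plausible header.
        return None


print(try_gunzip(gzip.compress(b"hello")))         # round-trips the payload
print(try_gunzip(b"not gzip at all"))              # rejected at the header
print(try_gunzip(b"\x1f\x8b\x08" + b"\x00" * 20))  # header looks right, data does not
```

The last call is the "accidental gzip header" case: the prefix matches, and the decoder still rejects the input once it reaches the compressed data.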

Mark Adler
  • While you're right, checking the header avoids a predictable exception (so it's not that exceptional after all...). So going with your approach means a try/catch and mute the exception if the problem is that the data isn't valid... – Matías Fidemraizer Jul 29 '15 at 07:31
  • What's so terrible about an exception? Using that approach assures that it is kept up to date with any additions or changes to `GZipStream`. – Mark Adler Jul 29 '15 at 14:54
  • OHHHH NOOO!! The endless discussion! :D We can go crazy this way. There are many points of view about when to throw and handle exceptions. – Matías Fidemraizer Jul 29 '15 at 14:56