Is GZipStream header reliable across .NET versions?

Question

I came across the Q&A "Is there a way to know if the byte[] has been compressed by gzipstream?", where an author states (correctly) that GZipStream writes {0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 4, 0} as a header, which can be used to tell whether a byte array is a compressed string.

My question is: is the GZipStream header reliable across .NET versions?

Jon Hanna · Accepted Answer · 2015-12-09T11:00:04.620

5

With any GZip-format stream you are guaranteed the following:

First two bytes: 1f, 8b

Next byte: 00 for store (no compression), 01 for the compress algorithm, 02 for pack, 03 for lzf and 08 for deflate. So far .NET has always used deflate, and many consumers expect only deflate (web clients, for example, expect only deflate-based gzip when a transfer or content encoding is marked as gzip), so this is unlikely to change unless an option to choose the method is added.

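The next byte holds flags. 00 means none are set: the data is not marked as "probably text", and no optional fields (such as an original file name or a comment) follow the header. Since GZipStream has no information about the file, it always writes 00.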

The next four bytes are the file-modification time in Unix format. Again, since the class receives a stream rather than a file with metadata, it has no modification time to record, so these are always set to 0.

The next byte (extra flags) depends on the compression method. With deflate it can be 2 to indicate maximum (slowest) compression or 4 to indicate the fastest (lightest) compression.

The next byte (the last in your sequence) identifies the operating system. 0 means "FAT filesystem", but Windows has kept using it even after moving to other file systems such as NTFS. It could have a different value if used with Mono on a non-Windows file system, though such an implementation might also choose to match the .NET behaviour. (Update: at least some versions of Mono set this byte to something other than 0 on non-Windows systems.)
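To make the field layout above concrete, here is a minimal sketch in Python (standing in for C#, purely illustrative) that unpacks the fixed 10-byte header from the question with `struct`. The `GzipHeader` and `parse_gzip_header` names are hypothetical; the field meanings in the comments follow RFC 1952 rather than anything .NET-specific.

```python
import struct
from typing import NamedTuple


class GzipHeader(NamedTuple):
    method: int  # CM: 8 = deflate
    flags: int   # FLG: 0 = no text hint, no optional fields
    mtime: int   # MTIME: Unix time, little-endian; 0 = not available
    xfl: int     # XFL (deflate): 2 = maximum compression, 4 = fastest
    os: int      # OS: 0 = FAT (also written on Windows), 3 = Unix, 255 = unknown


def parse_gzip_header(data: bytes) -> GzipHeader:
    """Unpack the fixed 10-byte gzip member header (RFC 1952)."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    # '<' = little-endian, no padding: 4 single bytes, a 4-byte MTIME, 2 single bytes.
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("missing gzip magic bytes 1f 8b")
    return GzipHeader(cm, flg, mtime, xfl, os_code)


# The header quoted in the question.
header = bytes([0x1F, 0x8B, 8, 0, 0, 0, 0, 0, 4, 0])
print(parse_gzip_header(header))
# GzipHeader(method=8, flags=0, mtime=0, xfl=4, os=0)
```

Only the first two fields are checked here. Everything after them is reported as-is, which matches the point that those bytes may legitimately vary.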

edited Dec 09 '15 at 11:00

answered Jul 28 '15 at 16:09

Jon Hanna

According to [this link](http://forensicswiki.org/wiki/Gzip), the values 0 to 7 for the third byte (Compression method) are reserved and only the value 8 is actually valid. Is this link not showing current information? – Daniel Hilgarth Jul 28 '15 at 16:10
It seems like I should accept this answer, but I prefer to wait and see whether anyone provides more corrections – Matías Fidemraizer Jul 28 '15 at 16:15
@DanielHilgarth the GNU GZip utility allows for the other compression schemes I mention and uses the flag as I describe, though I don't know of any standards document superseding RFC 1952, which, as you say, lists them as *reserved*. Still, those values are found if you use GNU GZip or anything compatible with it. – Jon Hanna Jul 28 '15 at 16:15
No, gzip neither accepts nor generates any values in the third byte but 8. gzip will decompress Unix compress and Unix pack files which have a different _first two_ bytes, `0x1f 0x9d` for compress and `0x1f 0x1e` for pack. When you say "lzf" I think you mean lzh. gzip will also decompress that, marked with `0x1f 0xa0` in the first two bytes. – Mark Adler Jul 28 '15 at 22:13
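Following that comment, the first two bytes alone already tell these formats apart. Here is a rough Python sketch. The `sniff_format` name and the lookup table are illustrative, built only from the magic numbers quoted in the comment above.

```python
from typing import Optional

# Two-byte magic numbers, as listed in the comment above.
MAGIC_NUMBERS = {
    b"\x1f\x8b": "gzip",
    b"\x1f\x9d": "compress (LZW)",
    b"\x1f\x1e": "pack",
    b"\x1f\xa0": "LZH",
}


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the container from its first two bytes; None if unrecognised."""
    return MAGIC_NUMBERS.get(data[:2])


print(sniff_format(bytes([0x1F, 0x8B, 8, 0])))  # gzip
print(sniff_format(b"\x1f\x9d\x90"))            # compress (LZW)
print(sniff_format(b"hello"))                   # None
```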

score 3 · Answer 2 · answered Jul 28 '15 at 16:00

It should be reliable, because this header is from the GZip specification and therefore not .NET specific. See here for an explanation of these values.

However, according to the specification, only the first two bytes are actually always the same. The third byte is practically always the same too, because currently only one valid value exists. The following bytes might change.
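One quick way to see which bytes stay fixed is to compress the same payload twice with different settings. This sketch uses Python's `gzip` module (3.8+ for the `mtime` keyword and `bytes.hex` separator) rather than `GZipStream`, so its OS and extra-flags bytes need not match .NET's. The timestamps are arbitrary example values.

```python
import gzip
import struct

payload = b"hello, world"

# Same payload, different compression level and (arbitrary) timestamps.
a = gzip.compress(payload, compresslevel=9, mtime=1_000_000)
b = gzip.compress(payload, compresslevel=1, mtime=2_000_000)

print(a[:10].hex(" "))
print(b[:10].hex(" "))

# ID1, ID2 and CM are identical in both ...
assert a[:3] == b[:3] == b"\x1f\x8b\x08"
# ... but the MTIME field (bytes 4-7, little-endian) records each timestamp.
assert a[4:8] == struct.pack("<I", 1_000_000)
assert a[4:8] != b[4:8]
```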

Daniel Hilgarth

It might change if I change how I parametrize `GZipStream`, am I mistaken? – Matías Fidemraizer Jul 28 '15 at 16:07
Sorry, it might change if I use other class or lib to compress using gzip.... right? – Matías Fidemraizer Jul 28 '15 at 16:08
Not the first three bytes. But the rest might change, yes – Daniel Hilgarth Jul 28 '15 at 16:09
Thanks for your effort. I'll probably accept the other answer because it provides more background, but that doesn't mean your answer isn't correct. It's fine too – Matías Fidemraizer Jul 28 '15 at 16:15
@MatíasFidemraizer: Sure, go ahead. I would do the same :) – Daniel Hilgarth Jul 28 '15 at 16:16
@MatíasFidemraizer: the accepted answer is incorrect as currently written. It needs to at least be edited. The third byte can only be 8, and gzip neither generates nor accepts anything else. This answer is correct. – Mark Adler Jul 28 '15 at 22:34

Mark Adler · Answer 3 · 2015-07-28T22:31:54.963

2

A gzip stream is assured to start with 0x1f 0x8b 0x08. There is no other compression method supported than the 0x08 in the third byte.

So if you don't see 0x1f 0x8b 0x08, then it's not a gzip stream. However if you do see 0x1f 0x8b 0x08, then it may or may not be a gzip stream. It probably is, but you can't assume that.
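The prefix check described here is cheap to write. What matters is how little a positive result proves. A minimal Python sketch follows (`has_gzip_prefix` is a hypothetical helper):

```python
import gzip

GZIP_PREFIX = b"\x1f\x8b\x08"  # ID1, ID2, CM = 8 (deflate)


def has_gzip_prefix(data: bytes) -> bool:
    """Necessary but not sufficient: False rules gzip out, True only suggests it."""
    return data[:3] == GZIP_PREFIX


print(has_gzip_prefix(gzip.compress(b"abc")))          # True
print(has_gzip_prefix(b"plain text"))                  # False
print(has_gzip_prefix(b"\x1f\x8b\x08 not gzip data"))  # True, but not a valid gzip stream
```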

What you should do with a candidate gzip file is to simply start decompressing it as such. The decoder will immediately recognize if there is no gzip header, and will furthermore soon detect a problem in the compressed data if there is an accidental gzip header. You shouldn't have to check for the header, since the decoder already does, as well as check for valid compressed data after that.
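In Python terms, the approach in this answer is to let the decoder do the checking. Below is a minimal sketch, assuming Python 3.8+. `try_gunzip` is a hypothetical name, and the exception tuple is deliberately broad because the exact exception a given bad input raises can differ between Python versions.

```python
import gzip
import zlib
from typing import Optional


def try_gunzip(data: bytes) -> Optional[bytes]:
    """Return the decompressed bytes, or None if data is not valid gzip."""
    try:
        return gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad magic or method),
        # EOFError covers truncated input, zlib.error covers corrupt deflate data.
        return None


good = gzip.compress(b"some text")
print(try_gunzip(good))                      # b'some text'
print(try_gunzip(b"plain text"))             # None
print(try_gunzip(good[:-4]))                 # None (trailer cut off)
print(try_gunzip(b"\x1f\x8b\x08 not gzip"))  # None: header looks right, data does not
```

A cheap prefix check can still sit in front of this to skip obviously non-gzip input. The decoder remains the authority on whether the data is valid.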

edited Jul 28 '15 at 22:31

answered Jul 28 '15 at 22:19

Mark Adler

While you're right, checking the header avoids a predictable exception (so it's not that exceptional after all...). Going with your approach means wrapping the call in a try/catch and swallowing the exception when the data isn't valid... – Matías Fidemraizer Jul 29 '15 at 07:31
What's so terrible about an exception? Using that approach assures that it is kept up to date with any additions or changes to `GZipStream`. – Mark Adler Jul 29 '15 at 14:54
OHHHH NOOO!! The endless discussion! :D We can go crazy this way. There are many points of view on when to throw and handle exceptions. – Matías Fidemraizer Jul 29 '15 at 14:56
