
Some HTTP servers send a raw deflate body (without the zlib header) instead of the zlib-wrapped body that the deflate content encoding is supposed to carry. See the discussion at: Why do real-world servers prefer gzip over deflate encoding?

Is it possible to detect such responses and inflate them properly in Node.js? I mean something other than trying createInflate first, catching the error, and then retrying with createInflateRaw.

bitinn

2 Answers


If the low nybble of the first byte (written in hex) is 8, then it is a zlib stream. Otherwise it is a raw deflate stream. (This assumes that you know a priori that the only possible choices are a valid zlib stream or a valid deflate stream.) A raw deflate stream will never have an 8 in the low nybble of its first byte, but a zlib stream always will.

Background:

The zlib header format puts the compression method in the low nybble of the first byte. That compression method is always 8 for deflate.

The bit sequence in a raw deflate stream starts from the least significant bits of the bytes. If the first three bits are 000 (as they are for an 8), that signifies a stored (not compressed) block that is not the last block. Stored blocks put the bytes of the input on byte boundaries, so the next thing the compressor does after writing the 000 bits is fill out the rest of the byte with zero bits to reach the next byte boundary. Therefore the next bit will never be a 1, and it is not possible for a valid deflate stream to have 1000 as its first four bits, i.e. a first nybble of 8. (Note that the bits are read from the bottom up.)

The first (i.e. low) nybble of a valid deflate stream can only be 0..5 or a..d. If you see 6..9, e, or f, then it is not a valid deflate stream.
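The rules above can be sketched as a small sniffing function. This is only a minimal illustration, not part of node:zlib: the names `sniffDeflate` and `inflateAuto` are hypothetical, and the check assumes the input is either a zlib stream or a raw deflate stream whose compressor zero-fills the padding of stored blocks.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper: guess the container from the low nybble of the first byte.
// Returns 'zlib', 'raw', or 'invalid'.
function sniffDeflate(buf) {
  if (buf.length === 0) return 'invalid';
  const low = buf[0] & 0x0f;
  if (low === 8) return 'zlib';        // zlib compression method 8 (deflate)
  const btype = (low >> 1) & 0b11;     // raw deflate: bits 1-2 are the block type
  const bit3 = low >> 3;               // raw deflate: first bit after the block header
  if (btype === 0b11) return 'invalid';               // reserved type: 6, 7, e, f
  if (btype === 0b00 && bit3 === 1) return 'invalid'; // stored block, non-zero padding: 9
  return 'raw';                        // 0..5 or a..d
}

// Hypothetical wrapper: pick the matching synchronous inflater.
function inflateAuto(buf) {
  const kind = sniffDeflate(buf);
  if (kind === 'zlib') return zlib.inflateSync(buf);
  if (kind === 'raw') return zlib.inflateRawSync(buf);
  throw new Error('neither a zlib stream nor a raw deflate stream');
}

const data = Buffer.from('hello hello hello');
console.log(inflateAuto(zlib.deflateSync(data)).toString());    // hello hello hello
console.log(inflateAuto(zlib.deflateRawSync(data)).toString()); // hello hello hello
```

For streaming use, the same test could be applied to the first chunk of the response before choosing between createInflate and createInflateRaw.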

Mark Adler
  • RFC 1951 says that "Any bits of input up to the next byte boundary are ignored" for stored blocks (BTYPE=00). This means 0x78 could be the start of a valid raw deflate stream. For example, `require('node:zlib').inflateRawSync(Buffer.from([0x78, 1, 0, 0xfe, 0xff, 0x13, 0x79, 1, 0, 0xfe, 0xff, 0x37]))` returns ``. – Victor Jun 29 '23 at 02:33
  • @Victor You are entirely correct. However I assert that no deflate compressor will produce that. The skipped bits are always filled with zeros. – Mark Adler Jun 29 '23 at 03:49

Theoretically, it is impossible to distinguish between the two. In the example below, buf is both a valid raw deflate stream and a valid zlib stream.

const zlib = require('node:zlib')

const buf = Buffer.from([
    0x08, 0x1d, 0x79, 0xe2, 0x86, 0x1d, 0x79,
    ...Array(31003).fill(0),
    0x09, 0xc6, 0x0d, 0x39, 0xf2,
    ...Array(3522).fill(0),
    0x71, 0xa4, 0x02, 0x08,
])

console.log(zlib.inflateRawSync(buf))
// <Buffer 1d 79 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 34481 more bytes>

console.log(zlib.inflateSync(buf))
// <Buffer 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 34480 more bytes>

Practically, most compressor implementations would insert zero bits when aligning to a byte boundary, so it is possible to distinguish by checking the lower 4 bits of the first byte, as described in the other answer.
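One way to combine the two observations is to use the low-nybble check only to decide which decoder to try first, and to fall back to the other one if the first throws. The sketch below is an assumption-laden illustration rather than a definitive approach: `inflateEither` is a hypothetical name, and for a buffer that is valid in both formats (like `buf` above) it simply returns whatever the first decoder produces.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper, not part of node:zlib.
// The low nybble of the first byte picks the decoder to try first;
// the other decoder is used only if the first one throws.
function inflateEither(buf) {
  const zlibFirst = buf.length > 0 && (buf[0] & 0x0f) === 8;
  const primary = zlibFirst ? zlib.inflateSync : zlib.inflateRawSync;
  const secondary = zlibFirst ? zlib.inflateRawSync : zlib.inflateSync;
  try {
    return primary(buf);
  } catch (err) {
    // Rare case: the heuristic guessed wrong, e.g. a stored block whose
    // padding bits are not zero. An error from the retry propagates.
    return secondary(buf);
  }
}

// The raw deflate stream from the comment on the other answer:
// its first byte is 0x78, so the zlib decoder is tried first and rejects it.
const odd = Buffer.from([0x78, 1, 0, 0xfe, 0xff, 0x13, 0x79, 1, 0, 0xfe, 0xff, 0x37]);
console.log(inflateEither(odd));
```

For streams written by ordinary compressors the fallback should never trigger, so the cost of the extra attempt is paid only on unusual input.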

Victor