How do you manage chunked data with gzip encoding? I have a server which sends data in the following manner:
HTTP/1.1 200 OK\r\n
...
Transfer-Encoding: chunked\r\n
Content-Encoding: gzip\r\n
\r\n
1f50\r\n\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xec}\xebr\xdb\xb8\xd2\xe0\xef\xb8\xea\xbc\x03\xa2\xcc\x17\xd9\xc7\xba\xfa\x1e\xc9r*\x93\xcbL\xf6\xcc\x9c\xcc7\xf1\x9c\xf9\xb6r\xb2.H ... L\x9aFs\xe7d\xe3\xff\x01\x00\x00\xff\xff\x03\x00H\x9c\xf6\xe93\x00\x01\x00\r\n0\r\n\r\n
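To spell out what I think is on the wire: the gzip stream is nested inside the chunked framing, so the body breaks down roughly like this (values taken from the dump above):

# My reading of the body layout (annotated against the dump above):
# b'1f50\r\n'           chunk-size line: 0x1f50 bytes of payload follow
# b'\x1f\x8b\x08...'    those bytes are the gzip stream itself (\x1f\x8b is the gzip magic)
# b'\r\n'               CRLF terminating the chunk
# b'0\r\n\r\n'          zero-length chunk marking the end of the body
print(int('1f50', 16))  # 8016 bytes in the first chunk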
I've tried a few different approaches to this, but there's something I'm forgetting here.
data = b''
depleted = False
while not depleted:
    depleted = True
    for fd, event in poller.poll(2.0):
        depleted = False
        if event == select.EPOLLIN:
            tmp = sock.recv(8192)
            data += zlib.decompress(tmp, 15 + 32)
This gives (I also tried decompressing only the data after the \r\n\r\n, obviously):
zlib.error: Error -3 while decompressing data: incorrect header check
So I figured the data should be decompressed once it has been received in its entirety.
...
        if event == select.EPOLLIN:
            data += sock.recv(8192)

data = zlib.decompress(data.split(b'\r\n\r\n', 1)[1], 15 + 32)
Same error. I also tried decompressing data[:-7] (because of the terminating chunk marker at the very end of the data), as well as data[2:-7] and various other combinations, but always with the same error.
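Peeking at the first few bytes seems to point at the chunk framing rather than corrupt data (a quick check; body here is just a name I'm using for everything after the blank line):

body = data.split(b'\r\n\r\n', 1)[1]
print(body[:8])                      # b'1f50\r\n\x1f\x8b' - the chunk-size line comes before the gzip magic
print(body.startswith(b'\x1f\x8b'))  # False, which is what "incorrect header check" is complaining about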
I've also tried the gzip module via:
with gzip.GzipFile(fileobj=BytesIO(data), mode='rb') as fh:
    fh.read()
But that gives me "Not a gzipped file".
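Which fits the same picture: GzipFile looks for \x1f\x8b at the start of the stream and finds '1f50' instead. Once the framing is stripped, I assume the gzip layer itself should be straightforward; a minimal sketch using zlib's streaming interface (with unchunked standing for the de-chunked body I try to build further down):

import zlib

# Sketch only: `unchunked` is assumed to be the body with the chunked framing removed.
# 16 + zlib.MAX_WBITS tells zlib to expect a gzip header; a decompressobj also copes
# with the stream being fed in pieces, unlike a single zlib.decompress() call.
decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
payload = decomp.decompress(unchunked) + decomp.flush()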
Even after writing the data exactly as received from the server (headers + body) to a file, then creating a server socket on port 80 that serves that file (again, as is) to a browser, the page renders perfectly, so the data is intact.
I took this data, stripped off the headers (and nothing else), and tried gzip on the file.
Thanks to @mark-adler I produced the following code to un-chunk the chunked data:
import binascii
import gzip
from io import BytesIO

unchunked = b''
pos = 0
while pos <= len(data):
    chunkLen = int(binascii.hexlify(data[pos:pos+2]), 16)
    unchunked += data[pos+2:pos+2+chunkLen]
    pos += 2 + len('\r\n') + chunkLen
with gzip.GzipFile(fileobj=BytesIO(unchunked)) as fh:
    data = fh.read()
This produces OSError: CRC check failed 0x70a18ee9 != 0x5666e236, which is one step closer. In short, I clip the data according to these four parts:
<chunk length o' X bytes>
\r\n
<chunk>
\r\n
I'm probably getting there, but not close enough.
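My current suspicion (unverified) is that the loop above assumes the chunk-size field is always exactly two characters (and int(binascii.hexlify(...), 16) re-encodes the ASCII hex digits instead of parsing them), and it never skips the \r\n that trails each chunk, so the offsets drift and stray framing bytes end up inside the gzip stream, which would explain the CRC mismatch. A sketch of what I think the loop should look like instead, parsing the hex length up to its \r\n and stopping at the terminating zero-length chunk (my attempt, not yet tested against the real server):

import gzip

def unchunk(body):
    # De-chunk a Transfer-Encoding: chunked body (sketch, minimal error handling).
    unchunked = b''
    pos = 0
    while True:
        # The chunk-size line is a variable-length hex number terminated by \r\n
        # (it may also carry ";extension" parameters, so split those off).
        eol = body.index(b'\r\n', pos)
        chunk_len = int(body[pos:eol].split(b';')[0], 16)
        if chunk_len == 0:               # the 0\r\n\r\n trailer marks the end of the body
            break
        start = eol + 2
        unchunked += body[start:start + chunk_len]
        pos = start + chunk_len + 2      # skip the \r\n that trails every chunk
    return unchunked

# assuming `data` is everything after the header separator:
payload = gzip.decompress(unchunk(data))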
Footnote: Yes, the socket handling is far from optimal, but it looks this way because I thought I wasn't getting all the data from the socket, so I implemented a huge timeout and an attempt at a fail-safe with depleted :)