
Say I want to compress the following JSON: {"test": 1, "test2": 2}

I do that on the client side using the Pako JS library:

const test = { test: 1, test2: 2 }
const test_json = JSON.stringify(test)
const gz_str = pako.gzip(test_json, { to: 'string' })
// Returns a binary string that displays as ����������«V*I-.Q²2Ô�3��¬�j�¨äK�����

Decompressing with Pako works just fine:

const result = pako.ungzip(gz_str, { to: 'string' })
// Returns '{"test":1,"test2":2}'
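For comparison, here is a minimal Python sketch of the same round trip using the standard `gzip` module instead of Pako. It assumes that Pako's `{ to: 'string' }` output is a "binary string" in which each character's code point equals one byte value (0 to 255); decoding the gzip bytes as Latin-1 reproduces exactly that one-to-one mapping. The variable names are illustrative.

```python
import gzip
import json

# Same payload as the JavaScript example; separators mimic JSON.stringify's
# compact output (no spaces after ',' or ':').
payload = json.dumps({"test": 1, "test2": 2}, separators=(",", ":"))

# gzip operates on bytes, so the JSON text is encoded first.
gz_bytes = gzip.compress(payload.encode("utf-8"))

# Assumption: a Pako-style binary string has one character per byte with the
# same code point (0-255). Latin-1 decoding performs exactly that mapping.
binary_string = gz_bytes.decode("latin-1")

# The first two characters are the gzip magic number 0x1f 0x8b.
print([hex(ord(c)) for c in binary_string[:2]])  # ['0x1f', '0x8b']

# Decompressing the original bytes gives back the JSON text.
print(gzip.decompress(gz_bytes).decode("utf-8"))  # {"test":1,"test2":2}
```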

Now, if I try to decompress it on the server side with Python's zlib:

import zlib
gzipped_string = '����������«V*I-.Q²2Ô�3��¬�j�¨äK�����'
s = zlib.decompress(gzipped_string.encode('utf-8'), 31)

This raises `zlib.error: Error -3 while decompressing data: incorrect header check`. I get the same error when I ask zlib to detect the header automatically with `zlib.MAX_WBITS | 32`.
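The "incorrect header check" is consistent with the `.encode('utf-8')` call. A minimal sketch, assuming the server received the binary string intact (which a copy-pasted string of replacement characters is not), shows that UTF-8 turns every character at or above 0x80 into two bytes, so the gzip magic number `1f 8b` no longer sits at the start of the data:

```python
import gzip
import zlib

gz_bytes = gzip.compress(b'{"test":1,"test2":2}')
# Hypothetical reconstruction of the Pako binary string (one char per byte).
binary_string = gz_bytes.decode("latin-1")

# UTF-8 encodes every character >= 0x80 as two bytes,
# so the gzip magic 0x1f 0x8b becomes 0x1f 0xc2 0x8b.
mangled = binary_string.encode("utf-8")
print(mangled[:3])  # b'\x1f\xc2\x8b'

try:
    zlib.decompress(mangled, 31)
except zlib.error as exc:
    print(exc)  # Error -3 while decompressing data: incorrect header check
```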

I've found many similar questions (like "Compressed with pako(zlib in javascript), decompressing with zlib(python) not working" and "zlib.error: Error -3 while decompressing: incorrect header check"), but those were caused either by an encoding/decoding issue or by a wrong windowBits option in zlib's decompress method.
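For reference, here is a sketch of how the `wbits` argument of Python's `zlib.decompress` selects the expected header, using the sample JSON compressed with Python's `gzip` module rather than Pako: 31 (16 + 15) expects a gzip wrapper, `zlib.MAX_WBITS | 32` auto-detects gzip or zlib, and the default 15 expects a zlib header. Both values used in the question are therefore valid for gzip data.

```python
import gzip
import zlib

gz_bytes = gzip.compress(b'{"test":1,"test2":2}')

# wbits=31 (16 + 15): expect a gzip header and trailer.
assert zlib.decompress(gz_bytes, 31) == b'{"test":1,"test2":2}'

# wbits=zlib.MAX_WBITS | 32: auto-detect a gzip or zlib header.
assert zlib.decompress(gz_bytes, zlib.MAX_WBITS | 32) == b'{"test":1,"test2":2}'

# Default wbits (15) expects a zlib header, so gzip data is rejected.
try:
    zlib.decompress(gz_bytes)
except zlib.error as exc:
    print(exc)  # Error -3 while decompressing data: incorrect header check
```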

Some solutions rely on base64, but I want to keep the data as a string, not raw bytes. What am I missing?
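If the data must stay a string, one option is to undo the binary-string mapping with Latin-1 instead of UTF-8. This is a minimal sketch under a strong assumption: the string reaches Python with every code point unchanged by the transport. The helper name `decompress_binary_string` is hypothetical.

```python
import gzip
import zlib


def decompress_binary_string(binary_string: str) -> str:
    """Hypothetical helper: turn a Pako-style binary string back into JSON text.

    Assumes every character's code point is a byte value (0-255), which is
    exactly what Latin-1 encoding maps back to bytes. A character above 0xFF
    raises UnicodeEncodeError, a sign the string was altered in transit.
    """
    raw = binary_string.encode("latin-1")  # one byte per character
    return zlib.decompress(raw, zlib.MAX_WBITS | 32).decode("utf-8")


# Simulate the string the browser would send.
sent = gzip.compress(b'{"test":1,"test2":2}').decode("latin-1")
print(decompress_binary_string(sent))  # {"test":1,"test2":2}
```

The catch is transport: once such a string is itself sent as UTF-8 text, every character at or above 0x80 costs two bytes on the wire, which eats into the savings from compression.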

Beinje
  • Compressed data isn't valid Unicode, so it should never be stored as a string – mousetail Aug 23 '22 at 10:43
  • @mousetail Oh I see, so you're telling me Pako does not return a string but something else? I guess I must be missing something from the docs, then… – Beinje Aug 23 '22 at 12:47
  • Actually, it looks like the output of pako.gzip() is a binary string, not a string. – Beinje Aug 23 '22 at 13:23
  • Are you actually trying to copy and paste the displayed strings with the inverted question marks in hexagons?! Those characters represent binary data that could not be displayed, and the original information in those bytes is completely lost. – Mark Adler Aug 23 '22 at 14:46
  • No, but for the purposes of the question, I did not know how to display the content of the compressed string, which is indeed binary data. My confusion comes from the fact that I thought I could use the result of pako compression 'as-is' (i.e. as a String) to send to the backend. – Beinje Aug 23 '22 at 15:13
  • A binary string will display just fine in Python. E.g. `b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xcbH\xcd\xc9\xc9\xd7Q\xc8\x00Q\x8a\x0c\x00\x9d?l\xb5\x0e\x00\x00\x00'` is a short gzip file. – Mark Adler Aug 23 '22 at 19:33
  • How are you reading the compressed data into Python? You need to read it in as a binary string, _not_ a character string. If it's from a file, you would open it with `"rb"`, not `"r"`. – Mark Adler Aug 23 '22 at 19:36
  • I send the binary string through the body of an HTTP request to my Python API, so I read the content directly from a variable, not a file. But I did not manage to decompress the binary string. So far, the only way I have found to make it work is to add a base64 encoding step on the client side, then decode the base64 string on the server and finally decompress it. That feels like overkill to me, but I'm open to any better suggestion. – Beinje Aug 24 '22 at 11:16
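A sketch of the server side for both transports discussed in the comments. The helper names are hypothetical, and the raw-bytes variant assumes the client calls `pako.gzip` without `to: 'string'` (so it returns a `Uint8Array`) and the web framework exposes the request body as `bytes`. In that case no base64 step is needed, because the bytes are never reinterpreted as text.

```python
import base64
import gzip
import zlib


def decode_base64_gzip(body_text: str) -> str:
    """Hypothetical helper for the base64 workaround.

    body_text is assumed to be base64 text produced on the client from the
    gzip bytes (for example with btoa() over Pako's binary string).
    """
    gz_bytes = base64.b64decode(body_text)
    return zlib.decompress(gz_bytes, 31).decode("utf-8")


def decode_raw_gzip(body_bytes: bytes) -> str:
    """Hypothetical helper for a body sent as raw bytes
    (e.g. a Uint8Array posted with Content-Type: application/octet-stream)."""
    return zlib.decompress(body_bytes, 31).decode("utf-8")


gz = gzip.compress(b'{"test":1,"test2":2}')
print(decode_base64_gzip(base64.b64encode(gz).decode("ascii")))  # {"test":1,"test2":2}
print(decode_raw_gzip(gz))  # {"test":1,"test2":2}
```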
