24

I have a byte array containing data that is compressed by gzip. Now I need to uncompress this data. How can this be achieved?

bad_coder
Sylar
  • 1
    You are (a) running (b) reading the docs of which version(s) of Python? – John Machin May 25 '11 at 11:58
  • Hello, we're using Python 2.2.1. – Sylar May 25 '11 at 12:47
  • 1
    Hello, if you are using Python 2.2.1 then you don't have a `bytearray`. You must have a `str` object, or maybe an `array.array('b')`. To confirm, `print type(the_thing)` and edit your question to show the result. – John Machin May 25 '11 at 13:04

2 Answers

39

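zlib.decompress(data, 15 + 32) should autodetect whether you have gzip data or zlib data. The second argument is wbits: 15 is the maximum window size, and adding 32 turns on automatic header detection.

zlib.decompress(data, 15 + 16) should work if the data is gzip and barf if it is zlib, because adding 16 makes zlib accept only a gzip header.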

Here it is with Python 2.7.1, creating a little gz file, reading it back, and decompressing it:

>>> import gzip, zlib
>>> f = gzip.open('foo.gz', 'wb')
>>> f.write(b"hello world")
11
>>> f.close()
>>> c = open('foo.gz', 'rb').read()
>>> c
'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00'
>>> ba = bytearray(c)
>>> ba
bytearray(b'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00')
>>> zlib.decompress(ba, 15+32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be string or read-only buffer, not bytearray
>>> zlib.decompress(bytes(ba), 15+32)
'hello world'
>>>

Python 3.x usage would be very similar.
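
For comparison, here is a minimal Python 3 sketch of the same two wbits settings. It assumes Python 3.2 or later (for gzip.compress and gzip.decompress) and builds its own sample data in memory instead of reading foo.gz. The names payload, gz_bytes and zl_bytes are made up for illustration.

```python
import gzip
import zlib

payload = b"hello world"
gz_bytes = gzip.compress(payload)   # gzip container around the deflate data
zl_bytes = zlib.compress(payload)   # zlib container around the deflate data

# In Python 3, zlib.decompress accepts any bytes-like object,
# so a bytearray can be passed directly.
ba = bytearray(gz_bytes)

# wbits = 15 + 32: auto-detect a gzip or zlib header.
auto_gz = zlib.decompress(ba, 15 + 32)
auto_zl = zlib.decompress(zl_bytes, 15 + 32)

# wbits = 15 + 16: accept a gzip header only.
gz_only = zlib.decompress(ba, 15 + 16)
try:
    zlib.decompress(zl_bytes, 15 + 16)
    zlib_rejected = False
except zlib.error:
    zlib_rejected = True

# The gzip module offers a one-shot helper for in-memory data.
via_gzip_module = gzip.decompress(gz_bytes)

print(auto_gz, auto_zl, gz_only, zlib_rejected, via_gzip_module)
```

Unlike the 2.7 session above, no bytes(ba) conversion should be needed here, since Python 3's zlib works on bytes-like objects.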

Update based on comment that you are running Python 2.2.1.

Sigh. That's not even the last release of Python 2.2. Anyway, continuing with the foo.gz file created as above:

Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> strobj = open('foo.gz', 'rb').read()
>>> strobj
'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00'
>>> import zlib
>>> zlib.decompress(strobj, 15+32)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
zlib.error: Error -2 while preparing to decompress data
>>> zlib.decompress(strobj, 15+16)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
zlib.error: Error -2 while preparing to decompress data

# OK, we can't use the back door method. Plan B: use the 
# documented approach i.e. gzip.GzipFile with a file-like object.

>>> import gzip, cStringIO
>>> fileobj = cStringIO.StringIO(strobj)
>>> gzf = gzip.GzipFile('dummy-name', 'rb', 9, fileobj)
>>> gzf.read()
'hello world'

# Success. Now let's assume you have an array.array object-- which requires
# premeditation; they aren't created accidentally!
# The following code assumes subtype 'B' but should work for any subtype.

>>> import array, sys
>>> aaB = array.array('B')
>>> aaB.fromfile(open('foo.gz', 'rb'), sys.maxint)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EOFError: not enough items in file
#### Don't panic, just read the fine manual
>>> aaB
array('B', [31, 139, 8, 8, 20, 244, 220, 77, 2, 255, 102, 111, 111, 0, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0, 133, 17, 74, 13, 11, 0, 0, 0])
>>> strobj2 = aaB.tostring()
>>> strobj2 == strobj
1 #### means True 
# You can make a str object and use that as above.

# ... or you can plug it directly into StringIO:
>>> gzip.GzipFile('dummy-name', 'rb', 9, cStringIO.StringIO(aaB)).read()
'hello world'
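
For readers on Python 3, the file-like approach above might look roughly like the sketch below. It swaps cStringIO.StringIO for io.BytesIO and array.tostring() for array.tobytes(), and it creates the gzip data in memory instead of reading foo.gz. The variable names are illustrative only.

```python
import array
import gzip
import io

# Sample gzip data built in memory (stands in for the contents of foo.gz).
gz_bytes = gzip.compress(b"hello world")

# Wrap the bytes in a file-like object and let GzipFile read from it.
with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes), mode="rb") as gzf:
    from_bytes = gzf.read()

# Same data held in an array.array of unsigned bytes.
aaB = array.array("B", gz_bytes)

# Convert the array to bytes first, then decompress as before.
with gzip.GzipFile(fileobj=io.BytesIO(aaB.tobytes()), mode="rb") as gzf:
    from_array = gzf.read()

print(from_bytes, from_array)
```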
John Machin
6

Apparently you can do this:

import zlib
# ...
ungziped_str = zlib.decompressobj().decompress('x\x9c' + gziped_str)

Or this:

zlib.decompress( data ) # equivalent to gzdecompress()

For more info, look here: Python docs
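
Whether a plain zlib.decompress(data) call applies depends on which container the bytes are in, and a comment below disputes that it applies to gzip data. The Python 3 sketch below is one way to check that on sample data. It builds both containers from a made-up payload, looks at their leading bytes, and tries the default call on each. The 'x\x9c' prefix used above matches the two header bytes that zlib.compress emits at its default settings, while gzip data starts with the magic bytes 1f 8b.

```python
import gzip
import zlib

payload = b"hello world"
gz_bytes = gzip.compress(payload)   # gzip container
zl_bytes = zlib.compress(payload)   # zlib container, default settings

# Peek at the leading bytes to tell the two containers apart.
looks_like_gzip = gz_bytes[:2] == b"\x1f\x8b"
looks_like_zlib = zl_bytes[:2] == b"x\x9c"

# With default arguments, zlib.decompress expects a zlib header.
plain_on_zlib = zlib.decompress(zl_bytes)

try:
    zlib.decompress(gz_bytes)
    plain_on_gzip_ok = True
except zlib.error:
    # The gzip header does not pass zlib's header check.
    plain_on_gzip_ok = False

print(looks_like_gzip, looks_like_zlib, plain_on_zlib, plain_on_gzip_ok)
```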

evgeny
  • From what I've read these operate on strings, but the data i need to decompress is a byte array. Or can zlib.decompress be used on byte arrays? – Sylar May 25 '11 at 11:28
  • 1
    You can try converting the byte array to a string. Something like `reduce(lambda x, y: return str(x) + str(y), bytearray` could do - but i'm not a python programmer, so i wouldn't know – evgeny May 25 '11 at 11:37
  • 1
    oh, i think it's `bytes(bytearray)`. Try that? – evgeny May 25 '11 at 12:54
  • -1 @evgeny and 4 gadarene upvoters: Neither of your zlib-related suggestions work. `gzip` != `zlib`. A gzip stream INCLUDES a zlib stream. To decompress a gzip stream, either use the gzip module, or use the zlib module with arcane arguments derived from much googling or reading the C-library docs at zlib.net. – John Machin May 25 '11 at 22:41
  • 2
    gzip module only supports stream - sometimes the data doesn't come as a stream - and it seems wrong to push it into being stream just to decode it. – Maria Zverina Feb 02 '12 at 15:54
  • Uncaught exception: : module 'zlib' has no attribute 'decrompress' – Martin Thoma Sep 10 '19 at 12:15