I'm stuck using Jython 2.2.1 with zlib module version 1.1.3 for a project. I need to download a lot of gzipped data, process it, and write it to a database. I'd like to avoid having multiple copies of the data in memory, so I'm decompressing it as a stream.

Using Python 2.7.2, I have been able to decompress a gzip stream as:

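from zlib import decompressobj, MAX_WBITS

BLOCK_SIZE = 16384  # bytes per read; any reasonable size works

f = open('stream.gz', 'rb')  # in real life, this stream comes from urllib2
gunzipper = decompressobj(16 + MAX_WBITS)  # 16 + MAX_WBITS tells zlib to expect a gzip header
data = ''
for chunk in iter(lambda: f.read(BLOCK_SIZE), ''):
    data += gunzipper.decompress(chunk)
data += gunzipper.flush()  # collect any output still buffered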

Under Jython 2.2.1, however, the same code gets an error when creating the decompressobj:

.\jythonLib.jar\lib/zlib$py.class", line 89, in __init__
ValueError: Invalid initialization option

Apparently the header offset trick doesn't work with this old version of zlib.

I'm new to the Java side of Jython, and was wondering if there is a way to decompress a gzip stream using Java classes within Jython? Or perhaps there is a way to coax zlib 1.1.3 into accepting the gzip header?

Any other potential solutions are welcome.

Robbie Rosati

2 Answers

There is not a way to coax those calls to zlib 1.1.3 into decoding the gzip header. That capability was added in zlib 1.2.0.

Alternatively, you can decode the gzip wrapper yourself and run raw inflate on the compressed payload by passing -MAX_WBITS when creating the decompressobj. The gzip wrapper format is defined in RFC 1952.
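A minimal sketch of that approach, in case it helps. The names `gzip_header_length` and `gunzip_stream` and the default block size are made up for illustration, and the code assumes the whole gzip header fits in the first block read. It skips the header fields described in RFC 1952 and then feeds the rest to a raw inflater. It does not verify the CRC-32 and length in the 8-byte trailer, and it has not been tried on Jython 2.2.1, so treat it as a starting point rather than a drop-in fix.

```python
import struct
import zlib

# Byte constants built with struct so the same source works where
# b'' literals are unavailable.
MAGIC = struct.pack('BB', 0x1f, 0x8b)
NUL = struct.pack('B', 0)

# Header flag bits from RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 2, 4, 8, 16


def gzip_header_length(buf):
    """Return the offset in buf where the raw deflate payload starts."""
    if len(buf) < 10 or buf[0:2] != MAGIC:
        raise ValueError('not a gzip stream')
    if ord(buf[2:3]) != 8:
        raise ValueError('unsupported compression method')
    flags = ord(buf[3:4])
    pos = 10  # magic(2), CM, FLG, MTIME(4), XFL, OS
    if flags & FEXTRA:
        xlen, = struct.unpack('<H', buf[pos:pos + 2])
        pos += 2 + xlen
    if flags & FNAME:
        pos = buf.index(NUL, pos) + 1  # zero-terminated file name
    if flags & FCOMMENT:
        pos = buf.index(NUL, pos) + 1  # zero-terminated comment
    if flags & FHCRC:
        pos += 2
    if pos > len(buf):
        raise ValueError('gzip header is truncated')
    return pos


def gunzip_stream(f, block_size=16384):
    """Inflate a gzip stream read from file-like f, block by block."""
    first = f.read(block_size)
    empty = first[:0]  # empty string of the same type as the input
    start = gzip_header_length(first)  # assumes header fits in first block
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate, no wrapper
    pieces = [inflater.decompress(first[start:])]
    for chunk in iter(lambda: f.read(block_size), empty):
        pieces.append(inflater.decompress(chunk))
    pieces.append(inflater.flush())
    return empty.join(pieces)
```

To keep memory use down, you would process each decompressed piece as it arrives instead of collecting them in a list; the list is only there to keep the sketch short.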

Mark Adler

I was able to work around the old zlib module by using the Java libraries that are available from within Jython.

I also had to open the URL with Java classes, so that I could pass a Java InputStream to the gzip decoder.

For future reference:

from java.io import BufferedReader,InputStreamReader
from java.util.zip import GZIPInputStream
from java.net import URL

url = URL('http://data.com')
urlconn = url.openConnection()
urlconn.setRequestProperty('Accept-encoding', 'gzip, compress')
urlconn.connect()

reader = BufferedReader(InputStreamReader(GZIPInputStream(urlconn.getInputStream())))
data = ''
# readLine() returns None at end of stream and strips the line terminator
for chunk in iter(lambda: reader.readLine(), None):
    data += chunk
Robbie Rosati