I have created a decoder that parses, decompresses and extracts a single file from a zlib-encoded file downloaded through a urllib2 file-like object. The idea is to use as little memory and disk space as possible, so I am using a reader/writer pattern with the "decoder" in the middle: it decompresses the data coming from urllib2, feeds it into a cpio subprocess and finally writes the file data to disk:
```python
from contextlib import closing

with closing(builder.open()) as reader:
    with open(component, "w+b") as writer:
        decoder = Decoder()
        while True:
            data = reader.read(10240)
            if len(data) == 0:
                break
            writer.write(decoder.decode(data))
        final = decoder.flush()
        if final is not None:
            writer.write(final)
        writer.flush()
```
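To check the streaming loop on its own, without the network or cpio, here is a minimal sketch that swaps in `io.BytesIO` for both the urllib2 response and the output file. It assumes the input is a zlib-format stream (built here with `zlib.compress`); `stream_decompress` is a hypothetical name, not part of the program above.

```python
import io
import zlib

CHUNK_SIZE = 10240  # same chunk size as the read loop above


def stream_decompress(reader, writer, chunk_size=CHUNK_SIZE):
    """Decompress zlib-format data from reader into writer, chunk by chunk."""
    zcat = zlib.decompressobj()  # default wbits: expects a zlib (RFC 1950) header
    while True:
        data = reader.read(chunk_size)
        if not data:
            break
        writer.write(zcat.decompress(data))
    writer.write(zcat.flush())  # emit anything still buffered inside zlib


# Usage: an in-memory zlib stream stands in for the download.
payload = b"example payload " * 5000
source = io.BytesIO(zlib.compress(payload))
sink = io.BytesIO()
stream_decompress(source, sink)
print(len(sink.getvalue()))
```

Only one chunk of compressed input is held at a time, which is the point of the reader/writer arrangement; the decompressed output per chunk can still be larger than the chunk itself.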
The decoder is pretty simple too:
```python
import zlib


class Decoder(object):
    def __init__(self):
        self.__zcat = zlib.decompressobj()
        # cpio initialisation

    def decode(self, data_in):
        return self.__consume(self.__zcat.decompress(data_in))

    def __consume(self, zcat_data_in):
        # cpio operations (omitted here); placeholder passes data through
        data_out = zcat_data_in
        return data_out

    def flush(self):
        return self.__consume(self.__zcat.flush())
```
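`Decoder` builds its decompressor with `zlib.decompressobj()` and default arguments, which expects an RFC 1950 zlib header. A quick way to see which header a stream actually starts with is to look at the first two bytes of the first chunk read from `reader`. `sniff_compression` is a hypothetical helper sketched for illustration, not part of the program:

```python
import gzip
import io
import zlib


def sniff_compression(first_chunk):
    """Guess the container format from the first bytes of a compressed stream."""
    if first_chunk[:2] == b"\x1f\x8b":  # gzip magic number (RFC 1952)
        return "gzip"
    if len(first_chunk) >= 2:
        cmf, flg = bytearray(first_chunk[:2])
        # RFC 1950: the low nibble of CMF is 8 for deflate, and
        # CMF * 256 + FLG must be a multiple of 31 (the "header check").
        if cmf & 0x0F == 8 and (cmf * 256 + flg) % 31 == 0:
            return "zlib"
    return "unknown"


# Usage: build one gzip sample and one zlib sample in memory.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(b"hello")
gz_sample = buf.getvalue()

print(sniff_compression(gz_sample))               # gzip
print(sniff_compression(zlib.compress(b"hello")))  # zlib
```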
The error occurs before anything is passed to the cpio pipe, so I have omitted the cpio code here for clarity.
Interestingly, to verify that the data could in fact be decompressed by zlib, I wrote the raw data `data_in` being passed to `decode()` to stdout:
```python
def decode(self, data_in):
    sys.stdout.write(data_in)
    return self.__consume(self.__zcat.decompress(data_in))
```
I then ran:

```
$ bin/myprog.py 2>/dev/null | zcat - | file -
/dev/stdin: cpio archive
```
As you can see, zcat was quite happy with the data it was given on stdin, and the result is a cpio archive. But zlib's decompress method reports:
```
error: Error -3 while decompressing: incorrect header check
```
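One possible explanation, assuming the server sends a gzip stream: gzip and zlib both wrap DEFLATE data but use different headers. zcat decodes gzip-format input and does not accept a bare zlib stream, while `zlib.decompressobj()` with default arguments expects a zlib header. The sketch below reproduces the error on an in-memory gzip stream and then decodes it using the `wbits` values that Python's `zlib` documentation describes for gzip input. `PAYLOAD` and `decompress_all` are illustrative names.

```python
import gzip
import io
import zlib

PAYLOAD = b"cpio archive bytes would go here"

# Build an in-memory gzip stream, the kind of data zcat accepts.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(PAYLOAD)
gzip_bytes = buf.getvalue()


def decompress_all(data, wbits=zlib.MAX_WBITS):
    """Decompress data in one go with a decompressobj using the given wbits."""
    d = zlib.decompressobj(wbits)
    return d.decompress(data) + d.flush()


# Default wbits (MAX_WBITS) expects a zlib header, so gzip input is rejected.
try:
    decompress_all(gzip_bytes)
    default_error = None
except zlib.error as exc:
    default_error = str(exc)
print("default wbits:", default_error)

# 16 + MAX_WBITS: expect a gzip header and trailer.
# 32 + MAX_WBITS: detect zlib or gzip automatically from the header.
print(decompress_all(gzip_bytes, 16 + zlib.MAX_WBITS))
print(decompress_all(gzip_bytes, 32 + zlib.MAX_WBITS))
```

If this is the cause, the corresponding change in `Decoder.__init__` would be passing one of those `wbits` values to `zlib.decompressobj`.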