1

I am trying to unzip a gzipped file in Python using the gzip module. The constraint is that I receive 160 bytes of data at a time, and I need to decompress them before I request the next 160 bytes. Partial decompression is fine before requesting the next 160 bytes. The code I have is:

import gzip
import time
import StringIO

file = open('input_cp.gz', 'rb')
buf = file.read(160)
sio = StringIO.StringIO(buf)
f = gzip.GzipFile(fileobj=sio)
data = f.read()
print data

The error I am getting is IOError: CRC check failed. I assume this is because it expects the entire gzipped content to be present in buf, whereas I am reading in only 160 bytes at a time. Is there a workaround for this?

Thanks

Ayman Hourieh
user210126
  • Related question: http://stackoverflow.com/questions/339053/how-do-you-unzip-very-large-files-in-python – jfs Nov 14 '09 at 00:25

1 Answer

4

Create your own class with a read() method (and whatever else GzipFile needs from fileobj, like close and seek) and pass it to GzipFile. Something like:

class MyBuffer(object):
  def __init__(self, input_file):
    self.input_file = input_file

  def read(self, size=-1):
    if size < 0:
      size = 160
    return self.input_file.read(min(160, size))

Then use it like:

file = open('input_cp.gz', 'rb')
mybuf = MyBuffer(file)
f = gzip.GzipFile(fileobj=mybuf)
data = f.read()
fserb
  • No errors this time, but a blank line was output to the console. I am pretty sure 160 bytes are enough to unzip. I also tried with 2000 bytes:

class MyBuffer(object):
  def __init__(self, input_file):
    self.input_file = input_file

  def read(self, size=-1):
    if size < 0:
      size = 160
    return self.input_file.read(min(160, size))

  def tell(self):
    return

  def seek(self, start, end):
    return

  def close(self):
    return

file = open('input_cp.gz', 'rb')
mybuf = MyBuffer(file)
f = gzip.GzipFile(fileobj=mybuf)
data = f.read()
print data

– user210126 Nov 13 '09 at 02:42
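
For the 160-bytes-at-a-time constraint described in the question, another option that may fit better is to skip GzipFile and feed each chunk to a zlib decompression object, which returns whatever output it can produce so far. The following is only a minimal sketch, not a drop-in fix: the names iter_decompressed and CHUNK_SIZE are made up for illustration, an in-memory buffer stands in for 'input_cp.gz', and it assumes the input is a single-member gzip stream.

```python
import gzip
import io
import zlib

CHUNK_SIZE = 160  # bytes per request, as in the question (hypothetical name)


def iter_decompressed(fileobj, chunk_size=CHUNK_SIZE):
    """Yield decompressed pieces while reading chunk_size bytes at a time."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # decompress() returns the output available so far, possibly empty.
        piece = decompressor.decompress(chunk)
        if piece:
            yield piece
    # Emit anything still buffered inside the decompressor.
    tail = decompressor.flush()
    if tail:
        yield tail


# Demo data: an in-memory gzip stream standing in for 'input_cp.gz'.
original = b"".join(str(i).encode("ascii") for i in range(5000))
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
    gz.write(original)

compressed = io.BytesIO(raw.getvalue())
result = b"".join(iter_decompressed(compressed))
```

With a real file, open('input_cp.gz', 'rb') would take the place of the in-memory buffer, and each yielded piece could be processed before the next 160 bytes are requested. A piece may be empty for the first chunk or two if they only contain header bytes, which is why the generator skips empty results.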