
I'm using the Python gzip library to expand files, and a few of them are corrupted. The exact error is this:

Error -3 while decompressing: invalid block type

Is it possible to either read all the data before the broken point of the file, or to somehow skip over the broken point and read what's before and after? The compressed files are basically lines of text, and I would like to recover as much data as possible.

Thanks

vgoklani
    You can try `os.system('gunzip < corrupted.gz > out')`; it should extract the data before the bad sector. As for extracting the data after the bad sector, I don't know of any reliable approach. – serkos Nov 07 '14 at 05:24
    The author of `gzip` [has this to say](http://www.gzip.org/recover.txt). Recovering data before the damage is done as @serkos suggests above. Recovering data after the damage looks hard, and the suggested method involves editing the C source of gzip, which is not the Python solution you are looking for. – Tony Nov 07 '14 at 07:00

2 Answers


Hopefully someone finds this useful:

# http://stackoverflow.com/questions/2423866/python-decompressing-gzip-chunk-by-chunk
# http://stackoverflow.com/questions/3122145/zlib-error-error-3-while-decompressing-incorrect-header-check/22310760
import zlib

def read_corrupted_file(filename, CHUNKSIZE=1024):
    # MAX_WBITS | 32 makes zlib auto-detect a gzip or zlib header
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)
    chunks = []  # decompressed pieces are bytes, joined once at the end
    with open(filename, 'rb') as f:
        try:
            buffer = f.read(CHUNKSIZE)
            while buffer:
                chunks.append(d.decompress(buffer))
                buffer = f.read(CHUNKSIZE)
        except zlib.error as e:
            # Stop at the first bad block and keep everything decoded so far
            print('Error: %s -> %s' % (filename, e))
    return b''.join(chunks)
vgoklani

You can use the Python zlib interface to decompress a piece at a time, which will give you the decompressed data up to the bad block. Note that the corruption likely precedes the point where it is caught, so some amount at the end of the decompressed data you get may be corrupted.
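
Since the files in the question are lines of text, one way to apply this is to yield only complete lines and drop whatever partial line is pending when decompression stops. The following is a minimal sketch, not a definitive implementation: the name `iter_recovered_lines` and the `chunk_size` default are made up for illustration, and it assumes Python 3 and a single-member gzip file. It cannot tell whether the last few lines it returns are themselves garbage, so they still need checking by hand.

```python
import zlib


def iter_recovered_lines(path, chunk_size=4096):
    """Yield complete lines (as bytes) decoded before decompression stops."""
    # MAX_WBITS | 32 makes zlib auto-detect a gzip or zlib header.
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)
    pending = b""  # tail of the output that does not yet end in a newline
    with open(path, "rb") as f:
        while not d.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # input ran out (possibly a truncated file)
            try:
                data = d.decompress(chunk)
            except zlib.error:
                break  # damaged block detected; stop here
            pending += data
            # Everything before the last newline is a complete line.
            *lines, pending = pending.split(b"\n")
            for line in lines:
                yield line
    # Keep a final unterminated line only if the stream ended cleanly;
    # otherwise it is a fragment cut off by the damage and is dropped.
    if d.eof and pending:
        yield pending
```

Calling `list(iter_recovered_lines('corrupted.gz'))` on a damaged file returns the lines that were decoded before the error, without loading the whole output as one string.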

Recovering data after the error is pretty much impossible (see link in comment to question), unless the gzip file was specially prepared to have recovery points. The gzip utility itself doesn't do that.
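
To illustrate what "specially prepared" could mean, here is a rough sketch that assumes you control how the files are written. It uses zlib's `Z_FULL_FLUSH`, which byte-aligns the stream, emits an empty stored block (the bytes `00 00 FF FF`), and resets the compression history, so raw inflation can restart right after that marker. The function names and the `every` parameter are hypothetical, and the marker search is a heuristic: the same four bytes can occur by chance inside compressed data, and nothing here verifies the gzip CRC.

```python
import zlib

SYNC = b"\x00\x00\xff\xff"  # tail of the empty stored block written by a flush


def compress_with_recovery_points(lines, every=1000):
    """Return gzip bytes with a full flush after every `every` lines."""
    c = zlib.compressobj(6, zlib.DEFLATED, zlib.MAX_WBITS | 16)  # gzip wrapper
    out = []
    for i, line in enumerate(lines, 1):
        out.append(c.compress(line + b"\n"))
        if i % every == 0:
            # Full flush: later data never refers back past this point.
            out.append(c.flush(zlib.Z_FULL_FLUSH))
    out.append(c.flush())
    return b"".join(out)


def resume_after(data, offset):
    """Try to inflate from the first usable flush marker at or after offset."""
    pos = data.find(SYNC, offset)
    while pos != -1:
        d = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate, no header
        try:
            return d.decompress(data[pos + len(SYNC):])
        except zlib.error:
            # False positive or more damage ahead: try the next marker.
            pos = data.find(SYNC, pos + 1)
    return b""
```

Here `offset` would be roughly the position in the compressed file where the error was reported. Frequent flushes make recovery finer-grained but cost some compression ratio, and a file produced by the plain gzip utility contains no such points, so this does not help with files that already exist.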

Mark Adler