I can check the integrity of a gzip file with gzip -t file.gz and zcat file.gz > /dev/null as per previous answers.

Sometimes a job dies before it finishes compressing a large file. If I check the file from beginning to end, I get an error about an unexpected end of file. But is it possible to test only that the compressed file does not end unexpectedly, so I don't have to read through the entire file?

EDIT 2018 in accordance with answer from Mark Adler below (Python 3.2+ solution):

import gzip
import os
import string

# Write the 26 lowercase letters to a gzip file.
with gzip.open('test.gz', 'wt') as f:
    f.write(string.ascii_lowercase)

# The last four bytes of a gzip file hold the uncompressed length
# modulo 2**32, in little-endian order.
with open('test.gz', 'rb') as f:
    f.seek(-4, os.SEEK_END)
    length = int.from_bytes(f.read(4), byteorder='little')
    assert length == 26
    print('Thanks Mark Adler!')
    print('The English alphabet has {length} letters.'.format(length=length))
tommy.carstensen

1 Answer

No, there is not. You would need to decompress all the way through to see whether the deflate-compressed data terminates properly and is followed by a 32-bit CRC and the uncompressed data length modulo 2^32.
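A minimal sketch of such a full check in Python, assuming the standard `gzip` module is acceptable (it verifies the CRC and length when it reaches the end of the stream). The function name `gzip_is_complete` and the 1 MiB chunk size are hypothetical choices for illustration. It is roughly what `gzip -t` does, so it still reads the whole file:

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to the end.

    Hypothetical helper: the name and the chunk size are illustrative.
    """
    with open(path, 'rb') as raw:  # a missing file still raises here
        try:
            with gzip.GzipFile(fileobj=raw) as f:
                # Read and discard the output in chunks so that memory use
                # stays bounded regardless of the file size.
                while f.read(chunk_size):
                    pass
        except (EOFError, OSError, zlib.error):
            # EOFError: the stream ended before the end-of-stream marker.
            # OSError: bad header, CRC mismatch or length mismatch.
            # zlib.error: corrupt deflate data.
            return False
    return True
```

Calling `gzip_is_complete('file.gz')` on a file left behind by a job that died mid-compression should return False, since the truncated stream raises `EOFError`.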

If you happen to know the length of the uncompressed data, or know some constraints on the length, then you can check whether the last four bytes of the gzip file match that length or meet the constraint. If they do not agree, then you know that the gzip file didn't finish. If they do agree, then you can only conclude that it is probably ok. (There is some possibility that the stream happened to terminate early with the last four bytes meeting the constraint by accident.)
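The quick check described above could be sketched as follows. The function names `gzip_trailer_length` and `probably_complete` are hypothetical, and the sketch assumes a single-member gzip file (for concatenated members, the last four bytes describe only the final member). A match means "probably ok", not a guarantee:

```python
import os


def gzip_trailer_length(path):
    """Return the last four bytes of `path` as a little-endian integer.

    For a complete gzip file this is the uncompressed length modulo 2**32.
    """
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        # A gzip file has at least a 10-byte header and an 8-byte trailer.
        if f.tell() < 18:
            raise ValueError('file too short to be a complete gzip file')
        f.seek(-4, os.SEEK_END)
        return int.from_bytes(f.read(4), byteorder='little')


def probably_complete(path, expected_length):
    """Compare the stored length with the known uncompressed length.

    False means the file is definitely incomplete or wrong; True only
    means it is probably complete, since a truncated file could end in
    matching bytes by accident.
    """
    return gzip_trailer_length(path) == expected_length % 2**32
```

This reads only the last four bytes, so it runs in constant time regardless of file size, at the cost of the false positives noted above.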

Mark Adler
  • Just returning to this a few years later. If I *do* know the length of the uncompressed data, what are the last four bytes supposed to be? Thanks for pointing out that there could be false positives. – tommy.carstensen Dec 04 '18 at 13:35
  • The last four bytes are the uncompressed length, modulo 2^32, in little-endian order. – Mark Adler Dec 04 '18 at 19:46