
To retrieve the uncompressed size of a gzip-compressed file, you can read its last four bytes (the ISIZE field of the gzip trailer). I am doing this to check whether any files are not the size they are supposed to be. If a file is smaller than it should be, I use this code to append to it:

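```java
import java.io.FileOutputStream;
import java.util.zip.GZIPOutputStream;

// bs is the InputStream supplying the data to append; file is the existing .gz file
byte[] buf = new byte[8192];
int len;
// append=true: the new GZIPOutputStream writes a fresh gzip header after the existing trailer
try (GZIPOutputStream gzipoutput =
         new GZIPOutputStream(new FileOutputStream(file, true))) {
    while ((len = bs.read(buf)) != -1) {
        gzipoutput.write(buf, 0, len);
    }
    gzipoutput.finish(); // writes this stream's CRC-32 and size trailer; close() would also do it
}
```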

Of course, this appends to the end of the gzip file as expected. However, after the append, reading the last four bytes of the gzip file (to get the uncompressed file size) does not give the expected result. I suspect this is because `GZIPOutputStream` does not correctly append the size bytes to the end of the file.

How can I modify my code so that the correct size bytes are appended?

EDIT

I am reading the bytes in little-endian order, like so:

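```java
// gzipReader is assumed to be a RandomAccessFile opened on the .gz file
gzipReader.seek(gzipReader.length() - 4);
int byteFour = gzipReader.read();   // least significant byte is stored first
int byteThree = gzipReader.read();
int byteTwo = gzipReader.read();
int byteOne = gzipReader.read();    // most significant byte is stored last
// Now combine them in little endian
long size = ((long) byteOne << 24) | ((long) byteTwo << 16)
          | ((long) byteThree << 8) | ((long) byteFour);
```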

I was thinking that, since I was appending to a gzip file, it only wrote the size of the appended bytes instead of the total file size. Is that plausible?
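For comparison, here is a minimal sketch of the same little-endian read using `java.nio.ByteBuffer`. The class and method names (`GzipIsize`, `readIsize`, `decodeIsize`) are hypothetical. It converts the result to an unsigned 32-bit value because RFC 1952 defines this trailer field (ISIZE) as the uncompressed size modulo 2^32, and the last four bytes only ever describe the last gzip member in the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class GzipIsize {

    /** Reads the last four bytes of a gzip file: the ISIZE field of its last member. */
    public static long readIsize(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            byte[] trailer = new byte[4];
            raf.seek(raf.length() - 4);
            raf.readFully(trailer); // throws EOFException rather than returning -1
            return decodeIsize(trailer);
        }
    }

    /** Decodes four little-endian bytes as an unsigned 32-bit value. */
    public static long decodeIsize(byte[] fourBytes) {
        int raw = ByteBuffer.wrap(fourBytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return Integer.toUnsignedLong(raw); // avoid sign extension for values >= 2^31
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GzipIsize <file.gz>");
            return;
        }
        System.out.println(readIsize(args[0]));
    }
}
```

`ByteBuffer` with an explicit byte order avoids hand-written shifts; the result is the same as the manual combination above.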

user207421
  • There's certainly code in `GZIPOutputStream` to write the uncompressed file size as the last four bytes. I don't think that's your problem. How are you reading the bytes? Maybe you're reading them in the wrong order. – Mark Peters Aug 21 '14 at 22:55
  • I am reading the bytes in little-endian order, like so: `gzipReader.seek(gzipReader.length() - 4); int byteFour = gzipReader.read(); int byteThree = gzipReader.read(); int byteTwo = gzipReader.read(); int byteOne = gzipReader.read(); // Now combine them in little endian long size = ((long)byteOne << 24) | ((long)byteTwo << 16) | ((long)byteThree << 8) | ((long)byteFour);` I was thinking that since I was appending to a gzip file, it only wrote the bytes appended instead of the total file size. Is that plausible? – brandonio21 Aug 21 '14 at 23:05

1 Answer


> since I was appending to a gzip file, it only wrote the bytes appended instead of the total file size. Is that plausible?

Not only plausible but inevitable. Have a look at your code. How exactly is the appending `GZIPOutputStream` going to know the uncompressed size of the data already in the file? All it can see is the incoming data and the outgoing `OutputStream`.
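A small in-memory experiment makes this concrete. The sketch below (class and helper names are hypothetical) writes two independent gzip members back to back, which is effectively what wrapping an append-mode `FileOutputStream` in a new `GZIPOutputStream` produces, then compares the last four bytes with the number of bytes `GZIPInputStream` actually decompresses. It assumes Java 7 or later, where `GZIPInputStream` continues into concatenated members.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class AppendDemo {

    /** Compresses data as one complete gzip member: header, deflate data, CRC-32, ISIZE. */
    public static byte[] gzipMember(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    /** Simulates the append: a second, independent member written after the first. */
    public static byte[] appended(byte[] first, byte[] second) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(gzipMember(first));
        file.write(gzipMember(second)); // its trailer knows nothing about `first`
        return file.toByteArray();
    }

    /** Reads the last four bytes as an unsigned little-endian ISIZE value. */
    public static long lastIsize(byte[] gzipBytes) {
        int raw = ByteBuffer.wrap(gzipBytes, gzipBytes.length - 4, 4)
                .order(ByteOrder.LITTLE_ENDIAN).getInt();
        return Integer.toUnsignedLong(raw);
    }

    /** Decompresses every member and counts the resulting bytes. */
    public static long decompressedLength(byte[] gzipBytes) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipBytes))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] first = "hello, ".getBytes(StandardCharsets.UTF_8); // 7 bytes
        byte[] second = "world".getBytes(StandardCharsets.UTF_8);  // 5 bytes
        byte[] file = appended(first, second);
        System.out.println("ISIZE in last 4 bytes: " + lastIsize(file));            // 5
        System.out.println("bytes after decompressing: " + decompressedLength(file)); // 12
    }
}
```

The trailer reports only the second member's 5 bytes, while a full decompression yields all 12.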

user207421
  • +1; it's an assumption of `GZIPOutputStream` that the data it outputs forms a valid GZIP file containing exactly the bytes that were written to the stream. You can't just append to a pre-existing GZIP file and expect the resulting total to be legitimate. As you say, computers aren't magic. – Mark Peters Aug 22 '14 at 01:55
  • @MarkPeters Actually, and to my surprise, the result of the append is a valid GZIP stream apart from the trailing length word: at least, `GZIPInputStream` can read it in a single pass. Evidently it can cope with another stream header. – user207421 Aug 22 '14 at 02:00
  • So when I append to the gzip file, I am actually appending after the previous length word as well? I once heard that reading the length word is only reliable if the gzip file was created using a single stream; I guess a file created using multiple streams is exactly the unreliable case. Thank you! – brandonio21 Aug 22 '14 at 16:55
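If the goal is for the last four bytes to describe the whole file, one option is to stop appending a second member and instead rewrite the file as a single member: decompress the existing content, write it and the new data through one `GZIPOutputStream`, then replace the original. This is a minimal sketch under stated assumptions: a full re-compression on every append is acceptable, a temporary file may be created next to the original, and Java 7 or later is used. `GzipRewrite` and `appendAsSingleMember` are hypothetical names. Even with a single member, ISIZE is the size modulo 2^32, so it cannot represent files of 4 GiB or more.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRewrite {

    /**
     * Rewrites the gzip file at gz as ONE member holding its old uncompressed
     * content followed by everything read from extra, so the trailing ISIZE
     * covers all of it (modulo 2^32). Hypothetical helper, not a library API.
     */
    public static void appendAsSingleMember(Path gz, InputStream extra) throws IOException {
        // Temp file in the same directory so the final move stays on one file system
        Path tmp = Files.createTempFile(gz.toAbsolutePath().getParent(), "rewrite", ".gz");
        try {
            try (InputStream old = new GZIPInputStream(Files.newInputStream(gz));
                 OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
                copy(old, out);   // existing content (all members, if already concatenated)
                copy(extra, out); // new data goes into the same member
            } // closing `out` writes the single CRC-32 + ISIZE trailer
            Files.move(tmp, gz, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // don't leave a half-written temp file behind
            throw e;
        }
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
    }
}
```

The trade-off is cost: every append re-reads and re-compresses the whole file, whereas the original approach only writes the new data but leaves a trailer that describes the last member alone.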