Is there a way to discover the total uncompressed size of all files inside a tar.gz without iterating through all the TarArchiveEntries in a TarArchiveInputStream like below?

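import java.io.FileInputStream;
import java.util.zip.GZIPInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
long size = 0;
try (TarArchiveInputStream tin = new TarArchiveInputStream(
        new GZIPInputStream(new FileInputStream("/path/to/my.tar.gz")))) { // closed automatically
    TarArchiveEntry ten;
    while ((ten = tin.getNextTarEntry()) != null) {
        size += ten.getSize(); // uncompressed size of this entry, in bytes
    }
}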
Upio
  • tar doesn't compress the files, so I suppose that if you read only the size of the gzipped content (the tar itself) you will get the right information, or pretty close (the tar will probably include the file paths/names in its size). – Alexandre Lavoie Dec 04 '14 at 22:19
  • I don't think so. `.tar` files don't have any kind of header that describes the entire archive. They have a header for each file, followed by the contents of that file; to get the info for the second file, you have to skip over the contents of the first file, and so on, so basically you have to look at the header of every file. There's no shortcut. If you look at the size of the decompressed `.tar`, it will include the sizes of all the headers plus the filler bytes between files for alignment, which probably isn't what you want. – ajb Dec 04 '14 at 22:26
  • I don't need the exact size, just close enough. It's for a tar.gz input format in Hadoop, and I want to report progress through the file as total bytes read out of the total uncompressed bytes of all files. Does tar do any compression on its own? Because if not, maybe I can look at the GZIP header? – Upio Dec 04 '14 at 22:30
  • Never mind, it looks like it is possible. You can get the uncompressed size of a gzip file by reading its last 4 bytes: http://stackoverflow.com/questions/7317243/gets-the-uncompressed-size-of-this-gzipinputstream – Upio Dec 04 '14 at 22:38
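
The last comment's idea can be sketched roughly as follows. This is only a minimal illustration, not a tested Hadoop component: the class name `GzipTrailerSize` and the method `readIsize` are made up for this example. It reads the gzip trailer field ISIZE, which is stored little-endian in the last 4 bytes of the file. Some caveats apply: ISIZE is the uncompressed length modulo 2^32, so it wraps for inputs of 4 GiB or more; for a file made of several concatenated gzip members it describes only the last member; and for a tar.gz it gives the size of the decompressed `.tar`, which includes the per-entry headers and padding mentioned above, so it is an approximation of the total file size rather than an exact figure.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class GzipTrailerSize {

    /**
     * Reads ISIZE, the last 4 bytes of a gzip file (little-endian).
     * This is the uncompressed length modulo 2^32, not necessarily the true length.
     */
    public static long readIsize(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            // A gzip file has at least a 10-byte header and an 8-byte trailer.
            if (raf.length() < 18) {
                throw new IOException("Too short to be a gzip file: " + path);
            }
            raf.seek(raf.length() - 4);
            byte[] b = new byte[4];
            raf.readFully(b);
            // Assemble the unsigned 32-bit little-endian value into a long.
            return (b[0] & 0xFFL)
                 | (b[1] & 0xFFL) << 8
                 | (b[2] & 0xFFL) << 16
                 | (b[3] & 0xFFL) << 24;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: GzipTrailerSize <file.gz>");
            return;
        }
        System.out.println(readIsize(args[0]));
    }
}
```

For progress reporting, the returned value could serve as the denominator while counting the bytes read from the `GZIPInputStream`, on the assumption that the archive is a single gzip member smaller than 4 GiB.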

0 Answers