
Can anyone show me the correct way to compress and decompress tar.gz files in Java? I've been searching, but the most I can find is either zip or gzip (alone).

kdgwill
  • tgz files aren't anything special -- you un-gzip it first, then un-tar it. – Chris Eberle Aug 19 '11 at 22:47
  • related: [How to print the content of a tar.gz file with Java?](http://stackoverflow.com/questions/5094074/how-to-print-the-content-of-a-tar-gz-file-with-java) – David Cary Aug 24 '11 at 15:50
  • See also http://stackoverflow.com/questions/315618/how-do-i-extract-a-tar-file-in-java – Vadzim Oct 28 '16 at 16:19
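
The two-step view from the first comment (un-gzip, then un-tar) can be illustrated with the JDK alone. This is a minimal sketch of the first step only: `java.util.zip.GZIPInputStream` strips the gzip layer and leaves a plain tar stream. The JDK has no tar reader, so the second step still needs a library such as Apache Commons Compress. The names `GunzipStep` and `gunzip` are illustrative and do not come from any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

/** Illustrative helper (hypothetical name): strips the gzip layer from a .tar.gz file. */
final class GunzipStep {
    private GunzipStep() {}

    /** Decompresses gzFile into tarFile and returns the number of bytes written. */
    static long gunzip(Path gzFile, Path tarFile) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile));
             OutputStream out = Files.newOutputStream(tarFile)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            // Copy decompressed bytes until the gzip stream is exhausted.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```

In practice you would usually skip the intermediate file and wrap the `GZIPInputStream` directly in the library's tar reader, but the intermediate file makes the layering visible.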

6 Answers


I've written a wrapper for commons-compress called jarchivelib that makes it easy to extract or compress from and into File objects.

Example code would look like this:

File archive = new File("/home/thrau/archive.tar.gz");
File destination = new File("/home/thrau/archive/");

Archiver archiver = ArchiverFactory.createArchiver("tar", "gz");
archiver.extract(archive, destination);
thrau

My favorite is plexus-archiver - see sources on GitHub.

Another option is Apache commons-compress - (see mvnrepository).

With plexus-archiver, the code for unarchiving looks like this:

final TarGZipUnArchiver ua = new TarGZipUnArchiver();
// Logging: as @Akom noted, newer versions require a logger, so you can configure one like this:
ConsoleLoggerManager manager = new ConsoleLoggerManager();
manager.initialize();
ua.enableLogging(manager.getLoggerForComponent("bla"));
// -- end of logging part
ua.setSourceFile(sourceFile);
destDir.mkdirs();
ua.setDestDirectory(destDir);
ua.extract();

Similar *Archiver classes are available for creating archives.

With Maven, you can use this dependency:

<dependency>
  <groupId>org.codehaus.plexus</groupId>
  <artifactId>plexus-archiver</artifactId>
  <version>2.2</version>
</dependency>
PHPirate
Petr Kozelka

To extract the contents of a .tar.gz archive, I have successfully used Apache Commons Compress ('org.apache.commons:commons-compress:1.12'). Take a look at this example method:

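import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

// Assumed value; the original snippet left BUFFER_SIZE undefined.
private static final int BUFFER_SIZE = 8192;

// Extracts into the current working directory. Entry names are used as-is,
// so validate them first when the archive comes from an untrusted source.
public void extractTarGZ(InputStream in) throws IOException {
    GzipCompressorInputStream gzipIn = new GzipCompressorInputStream(in);
    try (TarArchiveInputStream tarIn = new TarArchiveInputStream(gzipIn)) {
        TarArchiveEntry entry;

        while ((entry = (TarArchiveEntry) tarIn.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                // Create the directory, including any missing parents.
                File f = new File(entry.getName());
                if (!f.isDirectory() && !f.mkdirs()) {
                    System.out.printf("Unable to create directory '%s' during extraction of archive contents.%n",
                            f.getAbsolutePath());
                }
            } else {
                // Copy the entry's bytes to a file of the same name.
                int count;
                byte[] data = new byte[BUFFER_SIZE];
                FileOutputStream fos = new FileOutputStream(entry.getName(), false);
                try (BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER_SIZE)) {
                    while ((count = tarIn.read(data, 0, BUFFER_SIZE)) != -1) {
                        dest.write(data, 0, count);
                    }
                }
            }
        }

        System.out.println("Untar completed successfully!");
    }
}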
RemusS
  • Since you are using the try-with-resources syntax, you shouldn't need `dest.close();` and `tarIn.close();` – FGreg Mar 29 '17 at 20:57
  • Warning: This is unsafe due to ZipSlip, do not use this code in production software. Specifically f.mkdir() is not safe to call in the blind: https://snyk.io/research/zip-slip-vulnerability – sichinumi Mar 30 '22 at 22:23
  • Useful to additionally note that the ZipSlip vulnerability mentioned by sichinumi can be avoided by skipping (or rewriting) any file names (or the whole archive) which contain directory traversal portions "../" or "./" as they are usually a pretty clear-cut sign of something malicious. – parabolah Nov 05 '22 at 20:49
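
The Zip Slip check discussed in these comments can be sketched with `java.nio.file` alone. Instead of searching entry names for "../", this variant resolves each name against the destination directory, normalizes the result, and rejects anything that ends up outside it. That is one common approach, not the only one. `SafeExtractPath` and `resolveEntry` are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;

/** Illustrative guard (hypothetical name) against archive entries that escape the target directory. */
final class SafeExtractPath {
    private SafeExtractPath() {}

    /**
     * Resolves entryName inside destDir and returns the normalized target path.
     * Throws IOException if the entry would land outside destDir.
     */
    static Path resolveEntry(Path destDir, String entryName) throws IOException {
        Path base = destDir.toAbsolutePath().normalize();
        // resolve() returns entryName itself when it is absolute, so absolute
        // entry names fail the startsWith check below as well.
        Path target = base.resolve(entryName).normalize();
        if (!target.startsWith(base)) {
            throw new IOException("Archive entry escapes destination: " + entryName);
        }
        return target;
    }
}
```

An extraction loop would call `resolveEntry(destDir, entry.getName())` before creating any file or directory and write only to the returned path.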

In my experience, Apache Commons Compress is much more mature than Plexus Archiver, specifically because of issues like http://jira.codehaus.org/browse/PLXCOMP-131.

I believe Apache Commons Compress is more actively developed as well.

Gili
  • Apache Compress cannot extract some tar.gz archives because of a lack of support. This bug has never been resolved: https://www.jfrog.com/jira/browse/HAP-651 – didil Oct 28 '16 at 13:42
  • @didil how do you expect this to get fixed if the bug was reported to jfrog instead of apache compress? – Gili Oct 28 '16 at 13:47
  • It has also been reported to the Apache issue tracker. – didil Oct 28 '16 at 13:58
  • @didil please provide a link. – Gili Oct 28 '16 at 14:19
  • @didil I don't see any bug reported to https://issues.apache.org/jira/browse/COMPRESS that would match HAP-651. It would be great if you could open one and attach a tar where Compress fails. – Stefan Bodewig Nov 22 '16 at 19:45

If you are planning to compress/decompress on Linux, you can call the shell command line to do that for you:

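// tarGzPathLocation and target are Strings: the archive path and the destination directory.
Files.createDirectories(Paths.get(target));
// Pass the arguments directly instead of through "sh -c", so paths with spaces or shell metacharacters are safe.
ProcessBuilder builder = new ProcessBuilder("tar", "xzf", tarGzPathLocation, "-C", target);
builder.directory(new File("/tmp"));
builder.inheritIO(); // forward tar's output so the process cannot block on a full pipe
Process process = builder.start();
int exitCode = process.waitFor();
// assert is disabled by default at runtime, so fail explicitly instead.
if (exitCode != 0) {
    throw new IOException("tar exited with code " + exitCode);
}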
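
The snippet above only covers extraction; the same approach extends to creating an archive. This is a hedged sketch that assumes a `tar` binary on the PATH which accepts the common `-c`, `-z`, `-f`, and `-C` options. `ShellTar`, `createCommand`, and `run` are illustrative names. The arguments are handed to `ProcessBuilder` as a list, so paths containing spaces need no shell quoting.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Illustrative helper (hypothetical name) for creating a .tar.gz with the system tar. */
final class ShellTar {
    private ShellTar() {}

    /** Builds: tar -czf <archive> -C <sourceDir> .  (archives the contents of sourceDir). */
    static List<String> createCommand(Path archive, Path sourceDir) {
        return Arrays.asList("tar", "-czf", archive.toString(), "-C", sourceDir.toString(), ".");
    }

    /** Runs the command and throws IOException on a non-zero exit code. */
    static void run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(command.get(0) + " exited with code " + exitCode);
        }
    }
}
```

A call such as `ShellTar.run(ShellTar.createCommand(archivePath, sourceDir))` would then produce the archive. Writing the archive to a location outside `sourceDir` avoids tar trying to include the file it is currently writing.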
Dherik

With TrueVFS, extracting a tar.gz archive is a one-liner:

new TFile("archive.tar.gz").cp_rp(new File("dest/folder"));

But beware of dependency issues.

Vadzim