69

How do I extract a tar (or tar.gz, or tar.bz2) file in Java?

skiphoppy
  • skiphoppy, after 2008 when I originally answered, the Apache Commons Compress project was released. You should probably accept [this answer](http://stackoverflow.com/a/7556307/3474) so that it's highlighted more. – erickson Oct 29 '15 at 17:58

8 Answers

77

You can do this with the Apache Commons Compress library. You can download the 1.2 version from http://mvnrepository.com/artifact/org.apache.commons/commons-compress/1.2.

Here are two methods: one that gunzips a file and another that untars it. So, for a file <fileName>.tar.gz, you first need to gunzip it and then untar it. Note that the tar archive may contain folders as well, in which case they need to be created on the local filesystem.

Enjoy.

/**
 * Untar an input file into an output directory.
 * <p>
 * Each entry is written into the output folder under the name stored
 * in the archive; directory entries are created as needed.
 *
 * @param inputFile     the input .tar file
 * @param outputDir     the output directory
 * @return  The {@link List} of {@link File}s with the untarred content.
 * @throws IOException
 * @throws FileNotFoundException
 * @throws ArchiveException
 */
private static List<File> unTar(final File inputFile, final File outputDir) throws FileNotFoundException, IOException, ArchiveException {

    LOG.info(String.format("Untaring %s to dir %s.", inputFile.getAbsolutePath(), outputDir.getAbsolutePath()));

    final List<File> untaredFiles = new LinkedList<File>();
    final InputStream is = new FileInputStream(inputFile); 
    final TarArchiveInputStream debInputStream = (TarArchiveInputStream) new ArchiveStreamFactory().createArchiveInputStream("tar", is);
    TarArchiveEntry entry = null; 
    while ((entry = (TarArchiveEntry)debInputStream.getNextEntry()) != null) {
        final File outputFile = new File(outputDir, entry.getName());
        if (entry.isDirectory()) {
            LOG.info(String.format("Attempting to write output directory %s.", outputFile.getAbsolutePath()));
            if (!outputFile.exists()) {
                LOG.info(String.format("Attempting to create output directory %s.", outputFile.getAbsolutePath()));
                if (!outputFile.mkdirs()) {
                    throw new IllegalStateException(String.format("Couldn't create directory %s.", outputFile.getAbsolutePath()));
                }
            }
        } else {
            LOG.info(String.format("Creating output file %s.", outputFile.getAbsolutePath()));
            final OutputStream outputFileStream = new FileOutputStream(outputFile); 
            IOUtils.copy(debInputStream, outputFileStream);
            outputFileStream.close();
        }
        untaredFiles.add(outputFile);
    }
    debInputStream.close(); 

    return untaredFiles;
}

/**
 * Ungzip an input file into an output file.
 * <p>
 * The output file is created in the output folder, having the same name
 * as the input file, minus the '.gz' extension. 
 * 
 * @param inputFile     the input .gz file
 * @param outputDir     the output directory file. 
 * @throws IOException 
 * @throws FileNotFoundException
 *  
 * @return  The {@link File} with the ungzipped content.
 */
private static File unGzip(final File inputFile, final File outputDir) throws FileNotFoundException, IOException {

    LOG.info(String.format("Ungzipping %s to dir %s.", inputFile.getAbsolutePath(), outputDir.getAbsolutePath()));

    final File outputFile = new File(outputDir, inputFile.getName().substring(0, inputFile.getName().length() - 3));

    final GZIPInputStream in = new GZIPInputStream(new FileInputStream(inputFile));
    final FileOutputStream out = new FileOutputStream(outputFile);

    IOUtils.copy(in, out);

    in.close();
    out.close();

    return outputFile;
}
aristotll
Dan Borza
  • 1
    Your example is a great start, but I seem to have a problem with: while ((entry = (TarArchiveEntry)debInputStream.getNextEntry()) != null). The problem is that when I process the first file through an external framework (e.g. SAXBuilder), the input stream debInputStream is closed, and the second call of debInputStream.getNextEntry() throws an exception: "input buffer is closed" – adranale Nov 21 '11 at 15:20
  • Related, with similar implementation: [How to untar a TAR file using Apache Commons](http://stackoverflow.com/a/14211580/320399) – blong Nov 16 '13 at 23:39
  • Thanks for sharing. Would have been nice if they put an unTar method in the apache compress library. Seems like a fundamental operation. – Andrew Aug 01 '14 at 16:30
  • 3
    I hit 'The system cannot find the path specified' at OutputStream outputFileStream = new FileOutputStream(outputFile); To fix it, just add File parent = outputFile.getParentFile(); if (!parent.exists()) parent.mkdirs(); before that line. – Georgy Gobozov Aug 19 '15 at 10:27
  • I guess you need to do `is.close()` – Slow Harry Sep 01 '16 at 14:25
  • That kills permissions. If you want to untar a binary distribution, the executable permissions aren't set. – Kalle Richter Dec 11 '16 at 16:34
  • @SlowHarry , debInputStream.close(); will handle the is.close(). – vianna77 Oct 09 '17 at 14:09
  • 1
    WARNING! The code above has a security vulnerability (the zip file could include a relative path that will cause files outside the target directory to get overwritten). See https://snyk.io/research/zip-slip-vulnerability#what-action-should-you-take for how to fix it. – Lak Jun 07 '18 at 17:34
  • 1
    Instead of `inputFile.getName().length() - 3)` it would be better to use `inputFile.getName().lastIndexOf("."))` to avoid a hard-coded extension length. – martin_wun Nov 24 '21 at 07:42
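Several of the comments above (missing parent directories, the path traversal warning, the hard-coded extension length) can be addressed with a few small JDK-only helpers. The following is a minimal sketch, not part of Commons Compress: the class name `ExtractionPaths` and its method names are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper class; none of these names come from Commons Compress. */
public final class ExtractionPaths {

    private ExtractionPaths() {}

    /**
     * Resolves an archive entry name against the target directory and rejects
     * names (for example "../evil.txt") that would end up outside of it.
     */
    public static Path resolveInside(Path targetDir, String entryName) throws IOException {
        Path base = targetDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(entryName).normalize();
        if (!resolved.startsWith(base)) {
            throw new IOException("Entry escapes target directory: " + entryName);
        }
        return resolved;
    }

    /** Creates any missing parent directories before a file entry is written. */
    public static Path prepareParent(Path outputFile) throws IOException {
        Path parent = outputFile.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return outputFile;
    }

    /** Drops the last extension only; names without a usable dot are returned unchanged. */
    public static String stripLastExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }
}
```

In an extraction loop, one would presumably call `resolveInside(outputDir, entry.getName())` for every entry and `prepareParent(...)` before opening the output stream for a file entry. Restoring executable permissions, as raised in another comment, is not covered by this sketch.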
22

Note: This functionality was later published through a separate project, Apache Commons Compress, as described in another answer. This answer is out of date.


I haven't used a tar API directly, but tar and bzip2 are implemented in Ant; you could borrow their implementation, or possibly use Ant to do what you need.

Gzip is part of Java SE (and I'm guessing the Ant implementation follows the same model).

GZIPInputStream is just an InputStream decorator. You can wrap, for example, a FileInputStream in a GZIPInputStream and use it in the same way you'd use any InputStream:

InputStream is = new GZIPInputStream(new FileInputStream(file));

(Note that the GZIPInputStream has its own, internal buffer, so wrapping the FileInputStream in a BufferedInputStream would probably decrease performance.)
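As a minimal sketch of the decorator usage described above, using only the JDK: the class name `GunzipExample` and the method `gunzip` are illustrative names, not an existing API. It only removes the gzip layer, so the result of gunzipping a .tar.gz is still a .tar file that needs a tar reader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

/** Hypothetical example class showing GZIPInputStream used as a plain InputStream. */
public final class GunzipExample {

    private GunzipExample() {}

    /** Decompresses gzFile into outputFile, replacing outputFile if it already exists. */
    public static void gunzip(Path gzFile, Path outputFile) throws IOException {
        // try-with-resources closes the GZIPInputStream and the wrapped file stream
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            Files.copy(in, outputFile, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```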

Community
erickson
  • 2
    I was about to tell him about GZIPInputStream. But it won't help him, since he still needs to read the contained .tar file :) – Johannes Schaub - litb Nov 24 '08 at 22:31
  • 1
    Truth is I already know about GZIPInputStream, thanks to another question I asked here. But I don't know anything about tar APIs, and I was hoping there might be something that handles gzip in an integrated manner, so I didn't want to limit answers by saying what all I already knew. – skiphoppy Nov 24 '08 at 23:29
  • 3
    Apache classes bundled in 'ant' work fine. I use this every day: org.apache.tools.tar.TarEntry and org.apache.tools.tar.TarInputStream; the code is very similar to what you would use to unzip zip files. If you want to do Bzip2, use jaxlib. – tucuxi Nov 25 '08 at 10:57
  • 1
    There is (oddly) an excellent example of the Ant / TarInputStream variety here. https://code.google.com/p/jtar/ +1 for using ant libs btw – jsh May 08 '13 at 14:31
  • another for BZIP2 -- http://stackoverflow.com/questions/2322944/uncompress-bzip2-archive – jsh May 08 '13 at 14:52
15
Archiver archiver = ArchiverFactory.createArchiver("tar", "gz");
archiver.extract(archiveFile, destDir);

Dependency:

<dependency>
    <groupId>org.rauschig</groupId>
    <artifactId>jarchivelib</artifactId>
    <version>0.5.0</version>
</dependency>
D3iv
13

Apache Commons VFS supports tar as a virtual file system, with URLs like this one: tar:gz:http://anyhost/dir/mytar.tar.gz!/mytar.tar!/path/in/tar/README.txt

TrueZip or its successor TrueVFS does the same ... it's also available from Maven Central.

Jörg
8

I just tried a bunch of the suggested libs (TrueZip, Apache Compress), but no luck.

Here is an example with Apache Commons VFS:

FileSystemManager fsManager = VFS.getManager();
FileObject archive = fsManager.resolveFile("tgz:file://" + fileName);

// List the children of the archive file
FileObject[] children = archive.getChildren();
System.out.println("Children of " + archive.getName().getURI()+" are ");
for (int i = 0; i < children.length; i++) {
    FileObject fo = children[i];
    System.out.println(fo.getName().getBaseName());
    if (fo.isReadable() && fo.getType() == FileType.FILE
        && fo.getName().getExtension().equals("nxml")) {
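        FileContent fc = fo.getContent();
        // Open the entry's content; try-with-resources closes the stream afterwards
        try (InputStream is = fc.getInputStream()) {
            // read the entry's bytes from 'is' here
        }
    }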
}

And the maven dependency:

    <dependency>
      <groupId>commons-vfs</groupId>
      <artifactId>commons-vfs</artifactId>
      <version>1.0</version>
    </dependency>
Renaud
6

In addition to gzip and bzip2, the Apache Commons Compress API also has tar support, originally based on the ICE Engineering Java Tar Package, which is both an API and a standalone tool.

fglez
Jörg
  • 1
    Apache Commons Compress API has tar support and is originally based on above ICE tar package I believe: http://commons.apache.org/compress/ – Jörg Nov 12 '10 at 13:20
  • 2
    My tests show ICE tar to be the fastest among five contenders (ice, compress, ant, xeus + vfs), whereas Commons Compress comes in second ... however, ICE tar seems a tad less reliable WRT completeness of unpacking all entries and WRT keeping archive entries' original file names. – Jörg Jan 26 '11 at 14:39
5

Here's a version based on this earlier answer by Dan Borza that uses Apache Commons Compress and Java NIO (i.e. Path instead of File). It also does the decompression and untarring in one stream, so no intermediate file is created.

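// Needs java.io.BufferedInputStream, java.io.IOException, java.nio.file.Files, java.nio.file.Path,
// and ArchiveEntry, TarArchiveInputStream, GzipCompressorInputStream from Apache Commons Compress.
public static void unTarGz( Path pathInput, Path pathOutput ) throws IOException {
    Path pathBase = pathOutput.toAbsolutePath().normalize();

    // try-with-resources closes the whole stream chain even if extraction fails
    try( TarArchiveInputStream tararchiveinputstream =
            new TarArchiveInputStream(
                new GzipCompressorInputStream(
                    new BufferedInputStream( Files.newInputStream( pathInput ) ) ) ) ) {

        ArchiveEntry archiveentry = null;
        while( (archiveentry = tararchiveinputstream.getNextEntry()) != null ) {
            Path pathEntryOutput = pathBase.resolve( archiveentry.getName() ).normalize();
            // Reject entries such as "../x" that would be written outside the output directory
            if( !pathEntryOutput.startsWith( pathBase ) )
                throw new IOException( "Entry outside target directory: " + archiveentry.getName() );

            if( archiveentry.isDirectory() ) {
                // createDirectories also creates missing parents and accepts existing directories
                Files.createDirectories( pathEntryOutput );
            }
            else {
                // The archive may list a file before (or without) its parent directory
                Files.createDirectories( pathEntryOutput.getParent() );
                // Copies only the current entry; the stream stops at the entry boundary
                Files.copy( tararchiveinputstream, pathEntryOutput );
            }
        }
    }
}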
Wade Walker
  • Doesn't the `Files.copy` copy the entire archive to just one file? – Yann Aug 31 '22 at 09:07
  • OK, it works like this because `TarArchiveInputStream` is ["aware of the boundaries of the current entry in the archive"](https://commons.apache.org/proper/commons-compress/apidocs/org/apache/commons/compress/archivers/tar/TarArchiveInputStream.html#read-byte:A-int-int-) – Yann Aug 31 '22 at 09:25
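To illustrate why a copy from the tar stream stops at the end of the current entry: a tar archive is a sequence of 512-byte headers, each followed by the entry's data padded to a multiple of 512 bytes, and the header records the entry's size. The following JDK-only sketch walks that layout. It is not how `TarArchiveInputStream` is implemented and not a replacement for it; the class name `TinyTarReader` is hypothetical, and the sketch assumes plain headers with names of at most 100 bytes and octal size fields (no long-name, PAX or base-256 extensions, no checksum validation).

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, deliberately minimal tar reader for illustration only. */
public final class TinyTarReader {
    private static final int BLOCK = 512;

    private TinyTarReader() {}

    /** Reads all regular-file entries into memory, keyed by entry name. */
    public static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        DataInputStream data = new DataInputStream(in);
        byte[] header = new byte[BLOCK];
        while (true) {
            try {
                data.readFully(header);
            } catch (EOFException e) {
                break; // stream ended without an end-of-archive marker
            }
            if (isAllZero(header)) {
                break; // an all-zero block marks the end of the archive
            }
            String name = field(header, 0, 100);
            String sizeText = field(header, 124, 12).trim();
            int size = sizeText.isEmpty() ? 0 : Integer.parseInt(sizeText, 8); // octal ASCII
            byte type = header[156];

            // Exactly 'size' bytes belong to this entry; this is the entry boundary.
            byte[] content = new byte[size];
            data.readFully(content);
            // Skip the padding that rounds the data up to a whole block.
            int padding = (BLOCK - size % BLOCK) % BLOCK;
            data.readFully(new byte[padding]);

            if (type == '0' || type == 0) { // regular file (new and old style type flag)
                entries.put(name, content);
            }
        }
        return entries;
    }

    /** Returns true if the block contains only zero bytes. */
    private static boolean isAllZero(byte[] block) {
        for (byte b : block) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    /** Extracts a NUL-terminated ASCII header field. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }
}
```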
4

What about using this API for tar files, this other one included inside Ant for BZIP2 and the standard one for GZIP?

Fernando Miguélez