
This question is related to "How do I extract a tar file in Java?", but I want to extract the tar file directly from a URL. (Let's take this file as an example: https://download.freedict.org/dictionaries/deu-eng/1.9-fd1/freedict-deu-eng-1.9-fd1.dictd.tar.xz.)

I tried opening a URLConnection in combination with a GZIPInputStream:

URL url = new URL(zipFileUrl);
HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
InputStream inputStream = connection.getInputStream();
GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);

but that just resulted in `java.util.zip.ZipException: Not in GZIP format`.

It is of course possible to work around the problem by first downloading the file, then extracting it, and then deleting the original file. But I don't think that results in a good user experience, and there has to be a better way: maybe something similar to a normal ZipInputStream, where you iterate over the ZipEntry objects and handle each one.

Abra
Lasnik
  • Well, do you actually have a gzip file or a zip file? Very different things. – luk2302 Mar 07 '23 at 10:59
  • You might be able to use Commons Compress to do this, but it would have to support xz compression of the archive. Alternatively, you could use `ProcessBuilder` and do something like `bash -c 'wget -O - https://download.freedict.org/dictionaries/deu-eng/1.9-fd1/freedict-deu-eng-1.9-fd1.dictd.tar.xz | tar xJ'`. Then, of course, that archive contains, as its principal file, a file in gzip format. – g00se Mar 07 '23 at 11:19
  • @luk2302 I am talking about this file https://download.freedict.org/dictionaries/deu-eng/1.9-fd1/freedict-deu-eng-1.9-fd1.dictd.tar.xz – Lasnik Mar 07 '23 at 11:55
  • If you take a structured approach, you can solve the problem entirely without the help of others. 1. What kind of file do you have? tar is a Unix archiver that combines files but does not compress them. xz is a compressor similar to gzip. So a *.tar.xz file is a tar archive compressed with xz, and you can't use gzip to decompress it. 2. Are tar and xz supported by the Java standard library? A quick Google search will confirm that they are not, so you will have to use a library. – vanje Mar 07 '23 at 13:05
  • 3. Via a Google search you will find appropriate Java libraries to decompress xz and unpack tar. Study their documentation and especially the examples. Write a sample program to understand how it works. 4. You can also find out from the library's documentation whether you can unpack directly in memory without temporary files. If necessary, a specific Google search will help. – vanje Mar 07 '23 at 13:06
  • Update: Commons Compress *does* support xz, but they've made it less useful by not providing (afaics) utility methods to extract into a filesystem. Personally I'd use `bash` unless there's another library that makes this easy. – g00se Mar 07 '23 at 13:49
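The diagnosis in these comments can also be checked in code: each of these formats begins with a fixed signature ("magic bytes"). gzip data starts with 1F 8B, an xz stream with FD 37 7A 58 5A 00, and a zip local file header with 50 4B 03 04 ("PK\3\4"). An xz file therefore fails GZIPInputStream's header check, which is what "Not in GZIP format" reports. The following is a minimal sketch, assuming Java 9+ and a stream that supports mark/reset (for example `new BufferedInputStream(url.openStream())`); `CompressionSniffer` and `detect` are hypothetical names, not part of any library.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class CompressionSniffer {
    // Fixed signatures found at the very start of each format.
    private static final byte[] XZ = {(byte) 0xFD, '7', 'z', 'X', 'Z', 0x00};
    private static final byte[] GZIP = {(byte) 0x1F, (byte) 0x8B};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    /** Peeks at the first bytes without consuming them, so the stream can still be decoded afterwards. */
    public static String detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        byte[] head = new byte[XZ.length];
        in.mark(head.length);
        int n = in.readNBytes(head, 0, head.length);
        in.reset(); // rewind to the first byte
        if (startsWith(head, n, XZ)) return "xz";
        if (startsWith(head, n, GZIP)) return "gzip";
        if (startsWith(head, n, ZIP)) return "zip";
        return "unknown";
    }

    private static boolean startsWith(byte[] head, int n, byte[] magic) {
        return n >= magic.length && Arrays.equals(head, 0, magic.length, magic, 0, magic.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] xzHeader = {(byte) 0xFD, '7', 'z', 'X', 'Z', 0x00, 0x00, 0x04};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(xzHeader));
        System.out.println(detect(in)); // prints "xz"
    }
}
```

Because `detect` resets the stream, the same `BufferedInputStream` can be handed to the matching decompressor afterwards.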

2 Answers


Here is a complete pure Java example using Apache Commons Compress and XZ for Java. Most of the code is copied from the Apache Commons Compress documentation and slightly rearranged.

The archive file is processed directly without storing it as a temporary file.
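Streaming works because tar is a sequential format: each entry is a 512-byte header (name at offset 0, size as octal ASCII at offset 124, type flag at offset 156) followed by the entry's data padded to a multiple of 512 bytes, and the archive ends with zero-filled blocks. To illustrate that loop, here is a minimal pure-JDK sketch for an already decompressed tar stream, assuming Java 12+. `MiniTarReader` and `EntryHandler` are hypothetical names; this is not how Commons Compress implements `TarArchiveInputStream`, and it ignores long names, PAX/GNU extension headers and checksum validation, so the library remains the right choice for real archives.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MiniTarReader {
    private static final int BLOCK = 512;

    /** Callback invoked once per regular file, in archive order. */
    @FunctionalInterface
    public interface EntryHandler {
        void onFile(String name, byte[] data) throws IOException;
    }

    /** Walks a plain (already decompressed) tar stream; only one entry's data is in memory at a time. */
    public static void forEachFile(InputStream in, EntryHandler handler) throws IOException {
        byte[] header = new byte[BLOCK];
        while (in.readNBytes(header, 0, BLOCK) == BLOCK) {
            if (isAllZero(header)) {
                break; // a zero-filled block marks the end of the archive
            }
            String name = field(header, 0, 100);                          // NUL-padded file name
            long size = Long.parseLong(field(header, 124, 12).trim(), 8); // size in octal ASCII
            byte type = header[156];                                      // '0' or NUL = regular file
            byte[] data = in.readNBytes((int) size);
            in.skipNBytes((BLOCK - size % BLOCK) % BLOCK);                // skip padding to the next block
            if (type == '0' || type == 0) {
                handler.onFile(name, data);
            }
        }
    }

    /** Reads an ASCII header field up to its first NUL byte. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII);
    }

    private static boolean isAllZero(byte[] block) {
        for (byte b : block) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }
}
```

In a pipeline like the one below, such a reader would sit where `TarArchiveInputStream` does, on top of the XZ stream, e.g. `MiniTarReader.forEachFile(xzi, (name, data) -> ...)`.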

package org.example;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.ArchiveInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.xz.XZCompressorInputStream;
import org.apache.commons.compress.utils.IOUtils;

import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;

public class DecompressionExample {
  private static final String INPUT_URL = "https://download.freedict.org/dictionaries/deu-eng/1.9-fd1/freedict-deu-eng-1.9-fd1.dictd.tar.xz";
  private static final String TARGET_DIR_NAME = "/home/barnabas/dev/tmp/out"; // adjust to a directory on your machine

  public static void main( String[] args ) throws MalformedURLException {
    File targetDir = new File(TARGET_DIR_NAME);
    URL url = new URL(INPUT_URL);

    try (
      InputStream fi = url.openStream();
      InputStream bi = new BufferedInputStream(fi);
      InputStream xzi = new XZCompressorInputStream(bi);
      ArchiveInputStream i = new TarArchiveInputStream(xzi)
    ) {
      ArchiveEntry entry = null;
      while ((entry = i.getNextEntry()) != null) {
        if (!i.canReadEntryData(entry)) {
          // log something?
          continue;
        }
        File f = fileName(targetDir, entry);
        if (entry.isDirectory()) {
          if (!f.isDirectory() && !f.mkdirs()) {
            throw new IOException("failed to create directory " + f);
          }
        } else {
          File parent = f.getParentFile();
          if (!parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("failed to create directory " + parent);
          }
          try (OutputStream o = Files.newOutputStream(f.toPath())) {
            IOUtils.copy(i, o);
          }
        }
      }
    } catch(Exception e) {
      e.printStackTrace();
    }
  }

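  private static File fileName(File targetDir, ArchiveEntry entry) throws IOException {
    File f = new File(targetDir, entry.getName());
    // reject entries such as "../../evil" that would be written outside the target directory
    if (!f.toPath().normalize().startsWith(targetDir.toPath().normalize())) {
      throw new IOException("entry is outside of the target directory: " + entry.getName());
    }
    return f;
  }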
}

These are the Maven dependencies:

<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-compress</artifactId>
  <version>1.22</version>
</dependency>
<dependency>
  <groupId>org.tukaani</groupId>
  <artifactId>xz</artifactId>
  <version>1.9</version>
</dependency>
vanje
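String[] cmd = {"sh", "-c", "curl -s '" + url_ + "' | unxz | tar -xvf -"};
ProcessBuilder processBuilder = new ProcessBuilder(cmd);
processBuilder.directory(new File("."));
processBuilder.inheritIO(); // forward tar's verbose listing and any errors to this console
Process process = processBuilder.start();
int exitCode = process.waitFor(); // wait until the whole pipeline has finished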

OR

public static void main(String[] args) throws IOException {
    final String url_ = "https://download.freedict.org/dictionaries/deu-eng/1.9-fd1/freedict-deu-eng-1.9-fd1.dictd.tar.xz";

    URL url = new URL(url_);
    HttpsURLConnection connection = (HttpsURLConnection) url.openConnection();
    // Only downloads the archive to a temporary file; it still has to be extracted afterwards
    File tempFile = File.createTempFile("urFile", ".tar.xz");
    // try-with-resources closes both streams even if the copy fails
    try (InputStream inputStream = connection.getInputStream();
         OutputStream outputStream = new FileOutputStream(tempFile)) {
        byte[] buffer = new byte[1024];
        int bytesRead;
        while ((bytesRead = inputStream.read(buffer)) != -1) {
            outputStream.write(buffer, 0, bytesRead);
        }
    }
}