
I have a fairly large BZ2 file with several text files in it. Is it possible to use Java to uncompress certain files inside the BZ2 file and uncompress/parse the data on the fly? Let's say that a 300 MB BZ2 file contains 1 GB of text. Ideally, I'd like my Java program to read, say, 1 MB of the BZ2 file, uncompress it on the fly, act on it, and keep reading the BZ2 file for more data. Is that possible?

Thanks

user587363
Please note that bzip2/bz2 files are single compressed files. They are not archives that can contain more than one file (or directories) like zip or other formats. – Sean Anderson Aug 21 '18 at 19:42

3 Answers


The commons-compress library from Apache is pretty good. Here's their examples page: http://commons.apache.org/proper/commons-compress/examples.html

Here's the Maven snippet (1.10 was the latest version when this was written):

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.10</version>
</dependency>

And here's my util method:

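import java.io.*;
import java.nio.charset.StandardCharsets;
import org.apache.commons.compress.compressors.*;

public static BufferedReader getBufferedReaderForCompressedFile(String fileIn) throws FileNotFoundException, CompressorException {
    FileInputStream fin = new FileInputStream(fileIn);
    // Format auto-detection peeks at the file signature, which needs mark/reset support
    BufferedInputStream bis = new BufferedInputStream(fin);
    CompressorInputStream input = new CompressorStreamFactory().createCompressorInputStream(bis);
    // Decode explicitly (UTF-8 assumed here) instead of using the platform default charset
    return new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8));
}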
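The reader returned by this method decompresses lazily: data is only decompressed as the reader asks for it, so a 1 GB text can be processed line by line without holding it in memory. A minimal consumer sketch follows; `LineScanner` and `countLinesContaining` are hypothetical names, and any `BufferedReader` (including the one from the method above) can be passed in.

```java
import java.io.BufferedReader;
import java.io.IOException;

class LineScanner {
    // Hypothetical helper: streams through the reader one line at a time,
    // so memory use stays bounded regardless of the uncompressed size.
    static long countLinesContaining(BufferedReader reader, String needle) throws IOException {
        long count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.contains(needle)) {
                count++;
            }
        }
        return count;
    }
}
```

In real use, wrap the call in try-with-resources, e.g. `try (BufferedReader br = getBufferedReaderForCompressedFile("data.txt.bz2")) { ... }` (the file name is hypothetical), so the underlying FileInputStream is closed when you are done.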
Chilly
    Note: the accepted formats are gzip, bzip2, xz, lzma, Pack200, DEFLATE and Z. As the linked examples show, the correct one is detected automatically. – Danielson Aug 15 '15 at 10:01
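The detection works by peeking at the first bytes of the stream, which is why the util method wraps the file in a BufferedInputStream: CompressorStreamFactory requires a stream that supports mark/reset. The sketch below illustrates the idea with plain JDK code for two formats only; it is not Commons Compress's actual implementation, and `MagicSniffer`/`sniff` are hypothetical names. It relies on bzip2 streams starting with the ASCII bytes "BZh" and gzip streams starting with 0x1f 0x8b.

```java
import java.io.IOException;
import java.io.InputStream;

class MagicSniffer {
    // Hypothetical helper: returns "bzip2", "gzip" or "unknown" and leaves the
    // stream positioned at its first byte again.
    static String sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] sig = new byte[3];
        in.mark(sig.length);
        int n = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF
        while (n < sig.length) {
            int r = in.read(sig, n, sig.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        in.reset(); // rewind so the real decompressor sees the stream from byte 0
        if (n >= 3 && sig[0] == 'B' && sig[1] == 'Z' && sig[2] == 'h') {
            return "bzip2";
        }
        if (n >= 2 && (sig[0] & 0xff) == 0x1f && (sig[1] & 0xff) == 0x8b) {
            return "gzip";
        }
        return "unknown";
    }
}
```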

The Ant project contains a bzip2 library, which has an org.apache.tools.bzip2.CBZip2InputStream class. You can use this class to decompress the bzip2 file on the fly, since it just extends the standard Java InputStream class.
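One catch with the Ant class, as far as I know from its Javadoc: CBZip2InputStream expects the stream to be positioned just after the two-byte "BZ" magic, so the caller has to consume those bytes first (Commons Compress's BZip2CompressorInputStream reads the header itself). A minimal JDK-only sketch of that preparation step; `AntBzip2Prep` and `skipBzMagic` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;

class AntBzip2Prep {
    // Hypothetical helper: consumes and checks the "BZ" magic so that the next
    // byte is the 'h' that Ant's CBZip2InputStream expects to read first.
    static InputStream skipBzMagic(InputStream in) throws IOException {
        int first = in.read();
        int second = in.read();
        if (first != 'B' || second != 'Z') {
            throw new IOException("not a bzip2 stream: missing \"BZ\" magic");
        }
        return in;
    }
}
```

The prepared stream can then be wrapped, e.g. `new CBZip2InputStream(AntBzip2Prep.skipBzMagic(new BufferedInputStream(new FileInputStream(path))))`, and read like any other InputStream.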

martineno

You can use org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream from Apache commons-compress:

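// true = decompressConcatenated: keep decoding past the end of the first bzip2 stream (needed for multi-stream files, e.g. from pbzip2)
InputStream inputStream = new BZip2CompressorInputStream(new BufferedInputStream(new FileInputStream(xmlBz2File)), true);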

and then read it in chunks with org.apache.commons.compress.utils.IOUtils:

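    long pos = 0; // total bytes decompressed so far
    int step = 1024 * 32;
    byte[] buffer = new byte[step];
    int actualLength;
    // The offset argument of readFully is a position in `buffer`, not in the file,
    // so it must stay 0; passing a growing `pos` runs past the buffer after the first chunk
    while ((actualLength = IOUtils.readFully(inputStream, buffer, 0, step)) > 0) {
        pos += actualLength;
        // Caution: a chunk boundary can fall inside a multi-byte UTF-8 character
        String str = new String(buffer, 0, actualLength, StandardCharsets.UTF_8);
        // do whatever you need with str
    }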
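Because a fixed-size byte chunk can end in the middle of a multi-byte UTF-8 character, decoding each chunk separately with new String(...) can garble text at the boundaries. A sketch of an alternative, assuming the text is UTF-8: wrap the decompressing stream in an InputStreamReader, whose decoder carries incomplete byte sequences over to the next read, and process fixed-size char chunks. `ChunkedText` and `forEachChunk` are hypothetical names; in real use you would pass the BZip2CompressorInputStream from above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

class ChunkedText {
    // Hypothetical helper: hands decoded text to `handler` in chunks of at most
    // chunkChars chars. The Reader keeps partial UTF-8 sequences between reads,
    // so no encoded character is split (a surrogate pair for a character outside
    // the BMP can still straddle two chunks). The caller owns and closes `in`.
    static void forEachChunk(InputStream in, int chunkChars, Consumer<String> handler) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        char[] buffer = new char[chunkChars];
        int n;
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            handler.accept(new String(buffer, 0, n));
        }
    }
}
```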

But it may be hard to deal with backpressure (the consumer may be faster than the producer, or vice versa). So I tried using Akka Streams with BZip2CompressorInputStream.
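A plain InputStream is already pull-based: the decompressor only runs when the consumer calls read, so a single-threaded loop never gets ahead of itself. Backpressure becomes a concern once decompression and processing run on different threads. A JDK-only sketch of that case, using a bounded ArrayBlockingQueue instead of Akka Streams: put() blocks when the queue is full, so the decompressing thread can never get more than `capacity` chunks ahead of the consumer. `BoundedPipeline`, the chunk size and the end-of-stream sentinel are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

class BoundedPipeline {
    // Sentinel that marks end of input; compared by identity, never by content.
    private static final String EOF = new String("<eof>");

    // Hypothetical helper: a producer thread decodes `in` into text chunks, and
    // the calling thread passes them to `handler`. The caller owns and closes `in`.
    static void run(InputStream in, int chunkChars, int capacity, Consumer<String> handler)
            throws IOException, InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        IOException[] failure = new IOException[1];
        Thread producer = new Thread(() -> {
            try {
                Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
                char[] buffer = new char[chunkChars];
                int n;
                while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
                    queue.put(new String(buffer, 0, n)); // blocks while the consumer is behind
                }
            } catch (IOException e) {
                failure[0] = e; // reported to the caller after join()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                try {
                    queue.put(EOF);
                } catch (InterruptedException ignored) {
                    // only happens after the consumer has already stopped
                }
            }
        });
        producer.start();
        try {
            String chunk;
            // Identity check on purpose: only the sentinel object ends the loop
            while ((chunk = queue.take()) != EOF) {
                handler.accept(chunk);
            }
        } finally {
            producer.interrupt(); // unblocks the producer if the handler failed early
            producer.join();
        }
        if (failure[0] != null) {
            throw failure[0];
        }
    }
}
```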

Mikhail Ionkin