How to gunzip data on the fly as I'm reading it from an InputStream to an OutputStream?

Question

I have a large InputStream containing gzipped data.

I cannot modify the data in the InputStream directly. Code that uses this InputStream later on expects unmodified, compressed data. I could swap out the InputStream with a new InputStream if needed, but the data must remain compressed.

I need to print out the uncompressed contents of the InputStream as I go for debugging purposes.

What is the simplest way to print the uncompressed data in my InputStream to a PrintStream, without irrevocably uncompressing the InputStream itself and without reading the whole thing into memory?

Where is the underlying data coming from? If [`markSupported`](http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#markSupported()) is true then `reset` can help. If not, you will have to copy the content of the compressed stream somewhere and decompress that. You will *have to* go to the end of the stream to print it uncompressed — Miserable Variable, Jan 31 '13 at 23:50
I don't see why I'd have to. Seems like I could tee the stream, leave one unmodified, then decompress the other on the fly... — emmby, Feb 01 '13 at 00:12
That's not too different from what I am saying. The point is that in some cases you will *not* be able to rewind to a mark on the stream; so you need to read once and write it to two buffers -- one uncompressed and the other unmodified from the original. — Miserable Variable, Feb 01 '13 at 01:02
Due to memory constraints, I can't read the entire stream into a buffer — emmby, Feb 01 '13 at 01:18

emmby · Answer 1 · 2013-02-06T19:45:41.940

Here's how I did it.

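// http://stackoverflow.com/a/12107486/82156
// Needs java.io.*, java.util.zip.GZIPInputStream, and Apache Commons IO
// (org.apache.commons.io.IOUtils, org.apache.commons.io.input.TeeInputStream).
// Log and TAG are logging helpers that are not shown here.
public static InputStream wrapInputStreamAndCopyToOutputStream(InputStream in, final boolean gzipped, final OutputStream out) throws IOException {
    // Create a tee-splitter for the other reader.
    final PipedInputStream inCopy = new PipedInputStream();
    // closeBranch = true: closing the wrapper also closes the pipe, so the log thread sees end-of-stream.
    final TeeInputStream inWrapper = new TeeInputStream(in, new PipedOutputStream(inCopy), true);

    new Thread(Thread.currentThread().getName() + "-log-writer") {
        @Override
        public void run() {
            try {
                final OutputStream bufferedOut = new BufferedOutputStream(out);
                IOUtils.copy(gzipped ? new GZIPInputStream(inCopy) : inCopy, bufferedOut);
                bufferedOut.flush(); // otherwise the last buffered chunk may never be written
            } catch (IOException e) {
                Log.e(TAG, e);
            }
        }
    }.start();
    return inWrapper;
}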

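This method wraps the original InputStream and returns the wrapper, which you'll need to use from now on (don't use the original InputStream). The wrapper is an Apache Commons IO TeeInputStream: every byte the consumer reads through it is also written to a PipedOutputStream. A background thread reads the other end of the pipe, optionally decompresses the data along the way, and writes the result to the OutputStream you passed in.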

To use, simply do something like the following:

InputStream inputStream = ...; // your original InputStream
inputStream = wrapInputStreamAndCopyToOutputStream(inputStream, true, System.out); // wrap your InputStream and copy the data to System.out

doSomethingWithInputStream(inputStream); // Consume the wrapped InputStream like you were already going to do

The background thread will stick around until the foreground thread consumes the entire input stream, buffering the output in chunks and periodically writing it to System.out until it's all done.
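If you don't want the Commons IO dependency, the same idea can be sketched with only the JDK. This is a rough, self-contained illustration rather than the code I actually ran: `GunzipTap`, `open`, `stream` and `worker` are made-up names, the anonymous `FilterInputStream` is a minimal stand-in for TeeInputStream, and the sketch assumes the input is always gzipped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical JDK-only sketch of the tee + pipe + background-gunzip idea. */
final class GunzipTap {
    final InputStream stream; // still-compressed bytes for the real consumer
    final Thread worker;      // decompresses the teed copy into debugOut

    private GunzipTap(InputStream stream, Thread worker) {
        this.stream = stream;
        this.worker = worker;
    }

    static GunzipTap open(InputStream compressed, OutputStream debugOut) throws IOException {
        PipedInputStream pipeIn = new PipedInputStream();
        PipedOutputStream pipeOut = new PipedOutputStream(pipeIn);

        // Minimal stand-in for TeeInputStream: mirror every byte read into the pipe.
        InputStream tee = new FilterInputStream(compressed) {
            @Override
            public int read() throws IOException {
                int b = in.read();
                if (b >= 0) pipeOut.write(b); else pipeOut.close();
                return b;
            }

            @Override
            public int read(byte[] buf, int off, int len) throws IOException {
                int n = in.read(buf, off, len);
                if (n > 0) pipeOut.write(buf, off, n);
                else if (n < 0) pipeOut.close(); // EOF: lets the worker finish
                return n;
            }

            @Override
            public void close() throws IOException {
                pipeOut.close(); // also end the copy if the consumer stops early
                super.close();
            }
        };

        // The worker drains the pipe concurrently, so the small pipe buffer never has to hold everything.
        Thread worker = new Thread(() -> {
            try (InputStream plain = new GZIPInputStream(pipeIn)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = plain.read(buf)) != -1) {
                    debugOut.write(buf, 0, n);
                }
                debugOut.flush();
            } catch (IOException e) {
                e.printStackTrace(); // debugging aid only, so just report it
            }
        }, "gunzip-tap");
        worker.start();
        return new GunzipTap(tee, worker);
    }

    public static void main(String[] args) throws Exception {
        // Build a small gzipped payload in memory (demo data only).
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(gz)) {
            out.write("hello, gzip\n".getBytes(StandardCharsets.UTF_8));
        }

        GunzipTap tap = GunzipTap.open(new ByteArrayInputStream(gz.toByteArray()), System.out);
        byte[] buf = new byte[256];
        try (InputStream in = tap.stream) {
            while (in.read(buf) != -1) {
                // the real consumer would process the compressed bytes here
            }
        }
        tap.worker.join(); // wait until the decompressed copy has been written
    }
}
```

Two caveats with this sketch: bytes passed over with `skip()` or replayed via `mark`/`reset` are not mirrored correctly, and if the worker dies (for example because the data is not valid gzip) later writes to the pipe will throw in the consumer's thread, so real code may want to catch and ignore failures on the debug branch.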

I'm confused why you need the thread? It seems like pulling from the returned InputStream (`inWrapper`) should itself cause the writing to the OutputStream. Are you concerned that the writing out to the log will block? — Adam Gent, Feb 01 '13 at 02:37
Yes, the thread is necessary because the pipe buffer is not large enough to hold the contents of the entire InputStream in memory at one time, so without a concurrent reader draining the pipe, writes to it would block — emmby, Feb 01 '13 at 02:43
