I published some code that does exactly what you are looking for. It has always frustrated me that Java doesn't automatically pipeline calls like this across multiple threads, in order to overlap computation, compression, and disk I/O:
https://github.com/lukehutch/PipelinedOutputStream
This class splits writing to an OutputStream into separate producer and consumer threads (actually, starts a new thread for the consumer), and inserts a blocking bounded buffer between them. There is some data copying between buffers, but this is done as efficiently as possible.
You can even layer this twice to do the disk writing in a separate thread from the gzip compression, as shown in README.md.
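To make the idea concrete, here is a minimal sketch of the general technique using only the JDK. It is not the code from the repository: `QueueBackedOutputStream` and `PipelineDemo` are hypothetical names invented for this illustration, the chunk limit of 16 is arbitrary, and a `ByteArrayOutputStream` stands in for the file on disk. The caller's thread is the producer, a new thread is the consumer, and an `ArrayBlockingQueue` is the blocking bounded buffer between them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: hands every write to a consumer thread via a bounded queue. */
class QueueBackedOutputStream extends OutputStream {
    private static final byte[] END = new byte[0]; // sentinel, compared by identity
    private final BlockingQueue<byte[]> queue;
    private final Thread consumer;
    private volatile IOException failure; // first error seen on the consumer side
    private boolean closed;               // touched only by the producer thread

    QueueBackedOutputStream(OutputStream out, int maxPendingChunks) {
        queue = new ArrayBlockingQueue<>(maxPendingChunks);
        consumer = new Thread(() -> consume(out), "stream-consumer");
        consumer.setDaemon(true);
        consumer.start();
    }

    /** Consumer thread: write chunks to the wrapped stream until END arrives. */
    private void consume(OutputStream out) {
        try {
            for (byte[] chunk = queue.take(); chunk != END; chunk = queue.take()) {
                if (failure != null) continue; // keep draining so the producer cannot block forever
                try {
                    out.write(chunk);
                } catch (IOException e) {
                    failure = e;
                }
            }
            try {
                out.close();
            } catch (IOException e) {
                if (failure == null) failure = e;
            }
        } catch (InterruptedException e) {
            failure = new IOException("consumer thread interrupted", e);
        }
    }

    /** Producer side: blocks while the queue is full, which bounds memory use. */
    private void enqueue(byte[] chunk) throws IOException {
        if (closed) throw new IOException("stream closed");
        if (failure != null) throw failure;
        try {
            queue.put(chunk);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for buffer space", e);
        }
    }

    @Override
    public void write(int b) throws IOException {
        enqueue(new byte[] {(byte) b});
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (len == 0) return;
        enqueue(Arrays.copyOfRange(b, off, off + len)); // copy: the caller may reuse b
    }

    @Override
    public void close() throws IOException {
        if (closed) return;
        closed = true;
        try {
            queue.put(END);
            consumer.join(); // wait until everything has been written and closed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while closing", e);
        }
        if (failure != null) throw failure;
    }
}

class PipelineDemo {
    /** Layered twice: caller -> [queue] -> gzip thread -> [queue] -> "disk" thread. */
    static byte[] compressThroughPipeline(byte[] data) throws IOException {
        ByteArrayOutputStream disk = new ByteArrayOutputStream(); // stand-in for a FileOutputStream
        try (OutputStream out = new QueueBackedOutputStream(
                new GZIPOutputStream(new QueueBackedOutputStream(disk, 16)), 16)) {
            for (int off = 0; off < data.length; off += 8192) {
                out.write(data, off, Math.min(8192, data.length - off));
            }
        }
        return disk.toByteArray();
    }
}
```

In this sketch `flush()` is left as the inherited no-op, so data is only guaranteed to have reached the wrapped stream once `close()` returns. A real implementation would also want to reuse buffers rather than allocate a new array per write.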