7

The gzip input/output streams don't operate on Java direct buffers.

Is there any compression algorithm implementation out there that operates directly on direct buffers?

This way there would be no overhead of copying a direct buffer to a Java byte array for compression.

pdeva
  • Compression without overhead is impossible. Direct buffers are, by definition, _"a container for a fixed amount of data of a specific primitive type"_. A transformation such as compression or encryption must be done outside of the buffer. – Stephen P Jan 07 '12 at 01:08
  • I understand. I just want to do the compression without the added penalty of first copying the entire direct buffer to a Java byte array. – pdeva Jan 07 '12 at 01:10
  • GZIPInputStream doesn't create a copy - it streams right out of the file (based on checking the source). So I imagine it is probably faster than creating your own direct buffer and mapping a file to it. If you really want to use a direct buffer, you could write your own InputStream that streams from your buffer... – Russell Zahniser Jan 07 '12 at 01:55
  • GZIP compression is so much slower than just copying the data that it's unlikely to make much difference. – Peter Lawrey Jan 07 '12 at 07:58
  • @Russell: my direct buffer is not created from a file. I am creating it in my code to avoid GC. – pdeva Jan 08 '12 at 00:50
  • Unfortunately the JDK team didn't add direct buffer support to the Inflater/Deflater; a few lines of code, and there would be no locking/copying of the byte[]. Alas. Take a look at jzlib; it can be modified (can't post the whole modified version) to use ByteBuffer instead of byte[]. – bestsss Jan 15 '12 at 00:58
  • @RussellZahniser, it **does create a copy** to load from the file: the default GZIPInputStream uses a 512-byte buffer to read small chunks and pass to native code. Using a mapped buffer and passing it directly to the native zlib would be many times better. – bestsss Jan 15 '12 at 01:00
  • @Peter, technically you can specify the compression level and the max bits. The memory allocation/deallocation due to high max bits (15) and memory level (8 out of 9) doesn't help with small chunks of compression. Reducing that and reusing the deflater greatly improves the speed for small parts and brings 2x compression; see the sketch after these comments. It ain't so bad (surely I do not use plain gzip, though). – bestsss Jan 15 '12 at 01:02
  • In the past, when I have wanted reasonably efficient compression on small messages, I have written my own strategy. Given the Deflater has to learn every time, having domain knowledge of the data format can yield as good or better results. – Peter Lawrey Jan 15 '12 at 08:53
  • Perhaps you could describe the format of the data you want to compress with examples, and we could discuss how to compress it the most efficiently (perhaps in another question) – Peter Lawrey Jan 15 '12 at 08:58
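
A minimal sketch of the deflater-reuse idea from these comments, using plain java.util.zip (which does not expose zlib's max-bits/memory-level knobs; that is what the modified jzlib was for). The class name and scratch-buffer size are illustrative assumptions:

import java.util.Arrays;
import java.util.zip.Deflater;

class ReusableDeflater {

    // one Deflater reused across messages; reset() avoids rebuilding
    // zlib's internal state for every small payload
    private final Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    private final byte[] out = new byte[8192]; // assumes compressed output fits

    synchronized byte[] compress(byte[] msg) {
        def.reset();
        def.setInput(msg);
        def.finish();
        int n = def.deflate(out);
        return Arrays.copyOf(out, n);
    }
}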

3 Answers

2

Wow, old question, but I stumbled upon this today.

Probably some libs like zip4j can handle this, but you can get the job done with no external dependencies since Java 11:

If you are interested only in compressing data, you can just do:

void compress(ByteBuffer src, ByteBuffer dst) {
    // nowrap = true: raw deflate, no zlib header/trailer
    var def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
        def.setInput(src);
        def.finish();
        def.deflate(dst, Deflater.SYNC_FLUSH);

        // finished() also catches the case where all input was consumed
        // but compressed output is still pending
        if (!def.finished()) {
            throw new RuntimeException("dst too small");
        }
    } finally {
        def.end();
    }
}

Both src and dst will change positions, so you might have to flip them after compress returns.

In order to recover compressed data:

void decompress(ByteBuffer src, ByteBuffer dst) throws DataFormatException {
    var inf = new Inflater(true); // nowrap = true, matching compress above
    try {
        inf.setInput(src);
        inf.inflate(dst);

        if (!inf.finished()) {
            throw new RuntimeException("dst too small");
        }

    } finally {
        inf.end();
    }
}
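
A minimal round trip with these two helpers might look like this (the buffer sizes are arbitrary assumptions; note the flips mentioned above):

var src = ByteBuffer.wrap("hello, deflate".getBytes(StandardCharsets.UTF_8));
var compressed = ByteBuffer.allocateDirect(128);
var restored = ByteBuffer.allocateDirect(128);

compress(src, compressed);
compressed.flip();   // switch compressed from write mode to read mode

decompress(compressed, restored);
restored.flip();
System.out.println(StandardCharsets.UTF_8.decode(restored)); // hello, deflate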

Note that both methods expect (de)compression to happen in a single pass; however, we could use slightly modified versions in order to stream it:

void compress(ByteBuffer src, ByteBuffer dst, Consumer<ByteBuffer> sink) {
    var def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
        def.setInput(src);
        def.finish();
        int cmp;
        do {
            cmp = def.deflate(dst, Deflater.SYNC_FLUSH);
            if (cmp > 0) {
                sink.accept(dst.flip());
                dst.clear();
            }
        } while (cmp > 0);
    } finally {
        def.end();
    }
}

void decompress(ByteBuffer src, ByteBuffer dst, Consumer<ByteBuffer> sink) throws DataFormatException {
    var inf = new Inflater(true);
    try {
        inf.setInput(src);
        int dec;
        do {
            dec = inf.inflate(dst);

            if (dec > 0) {
                sink.accept(dst.flip());
                dst.clear();
            }

        } while (dec > 0);
    } finally {
        inf.end();
    }
}

Example:

void compressLargeFile() throws IOException {
    var temp = ByteBuffer.allocateDirect(1024 * 1024);

    try (var in = FileChannel.open(Paths.get("large"));
         var out = FileChannel.open(Paths.get("large.zip"),
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {

        var start = 0L;
        var rem = in.size();
        while (rem > 0) {
            // map up to 16 MiB at a time; each window becomes its own
            // deflate stream, since compress() calls finish() per window
            var mapped = Math.min(16 * 1024 * 1024, rem);
            var src = in.map(MapMode.READ_ONLY, start, mapped);

            compress(src, temp, (bb) -> {
                try {
                    out.write(bb);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });

            start += mapped;
            rem -= mapped;
        }
    }
}

If you want fully gzip-compliant data:

void zip(ByteBuffer src, ByteBuffer dst) {
    var u = src.remaining();     // uncompressed size, needed for the trailer
    var crc = new CRC32();
    crc.update(src.duplicate()); // duplicate() leaves src's position untouched
    writeHeader(dst);

    compress(src, dst);

    writeTrailer(crc, u, dst);
}

Where:

void writeHeader(ByteBuffer dst) {
    // 10-byte gzip header: magic 0x1f 0x8b, CM = 8 (deflate), then
    // flags, mtime (4 bytes), extra flags and OS, all zeroed
    var header = new byte[] { (byte) 0x8b1f, (byte) (0x8b1f >> 8), Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, 0 };
    dst.put(header);
}

And:

void writeTrailer(CRC32 crc, int uncompressed, ByteBuffer dst) {
    // gzip trailer: CRC-32 of the uncompressed data, then its size,
    // both little-endian
    if (dst.order() == ByteOrder.LITTLE_ENDIAN) {
        dst.putInt((int) crc.getValue());
        dst.putInt(uncompressed);
    } else {
        dst.putInt(Integer.reverseBytes((int) crc.getValue()));
        dst.putInt(Integer.reverseBytes(uncompressed));
    }
}

So, gzip framing imposes 10 + 8 bytes of overhead.

In order to unzip a direct buffer into another, you can wrap the src buffer into an InputStream:

class ByteBufferInputStream extends InputStream {

    final ByteBuffer bb;

    public ByteBufferInputStream(ByteBuffer bb) {
        this.bb = bb;
    }

    @Override
    public int available() throws IOException {
        return bb.remaining();
    }

    @Override
    public int read() throws IOException {
        return bb.hasRemaining() ? bb.get() & 0xFF : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        var rem = bb.remaining();

        if (rem == 0) {
            return -1;
        }

        len = Math.min(rem, len);

        bb.get(b, off, len);

        return len;
    }

    @Override
    public long skip(long n) throws IOException {
        var rem = bb.remaining();

        if (n > rem) {
            bb.position(bb.limit());
            n = rem;
        } else {
            bb.position((int) (bb.position() + n));
        }

        return n;
    }
}

and use:

void unzip(ByteBuffer src, ByteBuffer dst) throws IOException {
    try (var is = new ByteBufferInputStream(src); var gis = new GZIPInputStream(is)) {
        var tmp = new byte[1024];

        var r = gis.read(tmp);

        if (r > 0) {
            do {
                dst.put(tmp, 0, r);
                r = gis.read(tmp);
            } while (r > 0);
        }

    }
}

Of course, this is not cool since we are copying data to a temporary array, but nevertheless it is a sort of round-trip check proving that the NIO-based gzip encoding writes valid data that can be read by standard IO-based consumers.

So, if we just ignore the CRC consistency check, we can drop the header/trailer:

void unzipNoCheck(ByteBuffer src, ByteBuffer dst) throws DataFormatException {
    src.position(src.position() + 10).limit(src.limit() - 8);

    decompress(src, dst);
}
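
And a hypothetical round trip through the gzip-framed variant (sizes again arbitrary; the little-endian order just takes the cheap path in writeTrailer):

var src = ByteBuffer.wrap("stackexchange".getBytes(StandardCharsets.UTF_8));
var zipped = ByteBuffer.allocateDirect(128).order(ByteOrder.LITTLE_ENDIAN);
var restored = ByteBuffer.allocateDirect(128);

zip(src, zipped);
zipped.flip();

unzipNoCheck(zipped, restored); // skips the 10-byte header and 8-byte trailer
restored.flip();
System.out.println(StandardCharsets.UTF_8.decode(restored)); // stackexchange
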
2

I don't mean to detract from your question, but is this really a good optimization point in your program? Have you verified with a profiler that you indeed have a problem? Your question as stated implies you have not done any research, but are merely guessing that you will have a performance or memory problem by allocating a byte[]. Since all the answers in this thread are likely to be hacks of some sort, you should really verify that you actually have a problem before fixing it.

Back to the question: if you want to compress the data "in place" on a ByteBuffer, the answer is no, there is no capability to do that built into Java.

If you allocated your buffer like the following:

byte[] bytes = getMyData();
ByteBuffer buf = ByteBuffer.wrap(bytes);

You can filter your byte[] through a ByteBufferInputStream as the previous answer suggested.
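
For instance, a sketch of that filtering (assuming the ByteBufferInputStream from the other answers and Java 9+ for InputStream.transferTo; getMyData is the placeholder from above):

byte[] bytes = getMyData();
ByteBuffer buf = ByteBuffer.wrap(bytes);

// gzip the wrapped buffer by streaming it through the wrapper
ByteArrayOutputStream compressed = new ByteArrayOutputStream();
try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
    new ByteBufferInputStream(buf).transferTo(gzip);
}
byte[] gzipped = compressed.toByteArray();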

Jonathan S. Fisher
  • I am accepting this as the answer, but am still waiting for one that provides a solution, say in the form of a library that operates on byte buffers using JNI. – pdeva Jan 20 '12 at 21:09
  • I was curious about this question because I wanted to find a way to convert a folder to a zip file in name only in place for rapid deletion of large folders. – Erik Reppen Feb 15 '13 at 00:24
  • Avoiding copying data is almost always a significant boost to performance. However, data that is already in a direct buffer cannot be compressed without being copied unless done by the OS itself. – gregw Dec 05 '13 at 08:05
0

If you are using ByteBuffers you can use some simple Input/OutputStream wrappers such as these:

public class ByteBufferInputStream extends InputStream {

    private ByteBuffer buffer = null;

    public ByteBufferInputStream( ByteBuffer b) {
        this.buffer = b;
    }

    @Override
    public int read() throws IOException {
        // return -1 at end of buffer, per the InputStream contract
        return buffer.hasRemaining() ? (buffer.get() & 0xFF) : -1;
    }
}

public class ByteBufferOutputStream extends OutputStream {

    private ByteBuffer buffer = null;

    public ByteBufferOutputStream( ByteBuffer b) {
        this.buffer = b;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.put( (byte)(b & 0xFF) );
    }

}

Test:

ByteBuffer buffer = ByteBuffer.allocate( 1000 );
ByteBufferOutputStream bufferOutput = new ByteBufferOutputStream( buffer );
GZIPOutputStream output = new GZIPOutputStream( bufferOutput );
output.write("stackexchange".getBytes());
output.close();

buffer.flip(); // limit = bytes written, position = 0

byte[] result = new byte[ 1000 ];

ByteBufferInputStream bufferInput = new ByteBufferInputStream( buffer );
GZIPInputStream input = new GZIPInputStream( bufferInput );
int read = input.read( result );

System.out.println( new String( result, 0, read ) );
Guillaume Serre
  • Even wrapping the ByteBuffer into a stream doesn't help, as it's copied internally (sometimes twice); sorta defeats the purpose of the ByteBuffer. – bestsss Jan 15 '12 at 01:04
  • Sorry, but I don't get it: when would that copy occur? I double-checked the code for InputStream, OutputStream and even the GZIP classes and cannot find any copy. – Guillaume Serre Jan 17 '12 at 10:33
  • That's how it works; check InflaterInputStream: the native impl has to copy (or pin, depending on the JVM/GC) the byte[] to pass it to zlib. – bestsss Jan 17 '12 at 11:17