
Similar to "Byte array of unknown length in java", I need to be able to write an unknown number of bytes from a data source into a byte[] array. However, I also need to be able to read back bytes that were stored earlier (for a compression algorithm), so ByteArrayOutputStream doesn't work for me.

Right now I have a scheme where I allocate ByteBuffers of a fixed size N, adding a new one as I reach N, 2N, 3N bytes, etc. After the data is exhausted, I dump all the buffers into an array of now-known size.

Is there a better way to do this? Having fixed-size buffers reduces the flexibility of the compression algorithm.

Ian Durkan

7 Answers


What about using a circular byte buffer? It can grow dynamically and is efficient.

There's an implementation here: http://ostermiller.org/utils/CircularByteBuffer.java.html

Chris Dennett
  • See also [`com.Ostermiller.util.CircularByteBuffer`](http://ostermiller.org/utils/doc/com/Ostermiller/util/CircularByteBuffer.html). – trashgod Jun 26 '11 at 02:39

Why don't you subclass ByteArrayOutputStream? That way your subclass has access to the protected buf and count fields, and you can add methods to your class to manipulate them directly.
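A minimal sketch of that idea (the class name and accessor methods here are hypothetical, not part of any library):

```java
import java.io.ByteArrayOutputStream;

// Sketch: a subclass that exposes the protected buf/count fields of
// ByteArrayOutputStream so earlier bytes can be read back without
// copying the whole buffer.
class ReadableByteArrayOutputStream extends ByteArrayOutputStream {

    // Read back a byte written earlier (index must be < count).
    byte byteAt(int index) {
        if (index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buf[index];
    }

    // Number of valid bytes so far (same as size(), shown for clarity).
    int length() {
        return count;
    }
}
```

All writes still go through the inherited `write` methods, and `toByteArray()` still produces the final array once the size is known.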

vanza

As Chris answered, the CircularByteBuffer API is the way to go. Luckily, it is now in the central Maven repository. Quoting a snippet from this link, it is as simple as:

Single Threaded Example of a Circular Buffer

// buffer all data in a circular buffer of infinite size
CircularByteBuffer cbb = new CircularByteBuffer(CircularByteBuffer.INFINITE_SIZE);
class1.putDataOnOutputStream(cbb.getOutputStream());
class2.processDataFromInputStream(cbb.getInputStream());

Advantages are:

  • One CircularBuffer class rather than two pipe classes.
  • It is easier to convert between the "buffer all data" and "extra threads" approaches.
  • You can change the buffer size rather than relying on the hard-coded 1k of buffer in the pipes.

Finally, we are free of memory concerns and the pipes API.

Sym-Sym

The expense of ByteArrayOutputStream is the resizing of the underlying array. Your fixed-block routine eliminates much of that. If the resizing isn't expensive enough to matter (i.e. in your testing ByteArrayOutputStream is "fast enough" and doesn't create undue memory pressure), then perhaps subclassing ByteArrayOutputStream, as suggested by vanza, would work for you.

I don't know your compression algorithm, so I can't say why your list of blocks is making it less flexible, or why the compression algorithm would even know about the blocks. But since the blocks can be dynamic, you may be able to tune the block size to better suit the compression algorithm you're using.

If the compression algorithm can work on a "stream" (i.e. fixed-size chunks of data), then the block size shouldn't matter, since you can hide all of those details from the implementation. The ideal case is when the compression algorithm wants its data in chunks that match the size of the blocks you're allocating; that way you wouldn't have to copy data to feed the compressor.
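For illustration, the asker's block-list scheme might be sketched like this (all names here are hypothetical); the block size is a constructor parameter, so it can be tuned to whatever chunk size the compressor prefers:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the block-list scheme: append bytes into fixed-size blocks,
// then flatten into one array once the total size is known.
class BlockBuffer {
    private final int blockSize;
    private final List<byte[]> blocks = new ArrayList<>();
    private int posInBlock;

    BlockBuffer(int blockSize) {
        this.blockSize = blockSize;
    }

    void write(byte b) {
        if (blocks.isEmpty() || posInBlock == blockSize) {
            blocks.add(new byte[blockSize]); // start a new block
            posInBlock = 0;
        }
        blocks.get(blocks.size() - 1)[posInBlock++] = b;
    }

    // Random access to earlier bytes without flattening.
    byte get(int index) {
        return blocks.get(index / blockSize)[index % blockSize];
    }

    // Flatten into a single array of the now-known size.
    byte[] toByteArray() {
        int size = blocks.isEmpty() ? 0
                : (blocks.size() - 1) * blockSize + posInBlock;
        byte[] out = new byte[size];
        for (int i = 0; i < blocks.size(); i++) {
            int len = (i == blocks.size() - 1) ? posInBlock : blockSize;
            System.arraycopy(blocks.get(i), 0, out, i * blockSize, len);
        }
        return out;
    }
}
```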

Will Hartung

While you can certainly use an ArrayList for this, you are looking at a memory overhead of 4-8 times - assuming that the Byte objects aren't newly allocated but share one global instance (since this is true for small Integers, I assume it works for Bytes as well) - and you lose all cache locality.

You could subclass ByteArrayOutputStream, but even there you get overhead (the methods are synchronized) that you don't need. So personally I would just roll my own class that grows dynamically as you write to it. Less efficient than your current method, but simple, and we all know how amortized costs work; otherwise you can obviously use your solution as well. As long as you wrap the solution in a clean interface, you'll hide the complexity and still get good performance.

Or, put another way: no, you pretty much can't do this more efficiently than what you're already doing, and every built-in Java collection should perform worse for one reason or another.
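As a sketch of that "roll your own" suggestion (the class and its names are hypothetical): a flat array that doubles when full gives amortized O(1) appends while keeping earlier bytes directly readable.

```java
import java.util.Arrays;

// Sketch: a growable flat byte array with amortized-doubling growth,
// avoiding the boxing overhead of ArrayList<Byte>.
class GrowableByteArray {
    private byte[] data = new byte[16];
    private int count;

    void append(byte b) {
        if (count == data.length) {
            // doubling keeps the amortized cost per append constant
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[count++] = b;
    }

    // Earlier bytes stay addressable in O(1).
    byte get(int index) {
        return data[index];
    }

    byte[] toByteArray() {
        return Arrays.copyOf(data, count);
    }
}
```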

Voo
  • I'm not sure I understand how "the methods are synchronized" in [`ByteArrayOutputStream`](http://download.oracle.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html). Can you elaborate? – trashgod Jun 26 '11 at 03:14
  • Well, for whatever reason the methods in the class are defined synchronized - although I think you can override synchronized methods with unsynchronized ones? Not sure. If not, well, the synchronization overhead is quite useless for your situation. – Voo Jun 26 '11 at 03:23
  • Ah, I [see](http://javasourcecode.org/html/open-source/jdk/jdk-6u23/java/io/ByteArrayOutputStream.java.html) it now; thanks. – trashgod Jun 26 '11 at 03:29

For simplicity, you might consider using java.util.ArrayList:

ArrayList<Byte> a = new ArrayList<Byte>();
a.add(value1);
a.add(value2);
...
byte value = a.get(0);

Java 1.5 and higher provide automatic boxing and unboxing between the byte and Byte types. Performance may be slightly worse than ByteArrayOutputStream, but it is easy to read and understand.

Calvin
  • Well, it takes 4-8 times the memory and you lose the cache locality. Worse yet, you eliminate any chance of the JIT vectorizing the code - or at least working on larger values at once (i.e. you can't work on word-sized values and do some bit tricks). "Slightly" worse may be a bit optimistic. – Voo Jun 26 '11 at 01:12
  • True enough, thanks for the info. This would be one of the worse solutions if performance is a concern. – Calvin Jun 26 '11 at 01:28
  • This was my first idea as well, but it led to an 'OutOfMemoryError' for rather small files. I would not recommend this solution. – Michael Kern Dec 13 '22 at 17:15

I ended up writing my own method, which uses a temporary fixed-size buffer array and appends it to the final byte array each time the fixed buffer is filled. It keeps overwriting the fixed buffer and appending to the final array until all bytes have been read and copied. At the end, if the temporary array is only partially filled, it copies the bytes read so far from that array into the final array. My code was written for Android, so you may need to use a method similar to ArrayUtils.concatByteArrays (com.google.gms.common.util.ArrayUtils).

My code has the temporary array size set to 100 (growBufferSize), but it might be better to set it to 500 or even 1000, or whatever performs best in your environment. The final result is stored in the bytesfinal array.

This method should reduce memory usage and help prevent OutOfMemoryErrors. Since it mainly uses primitives, memory usage should be lower.

final int growBufferSize = 100;
byte[] fixedBuffer = new byte[growBufferSize];
byte[] bytesfinal = new byte[0];

int fixedBufferIndex = 0;
int nextByte;
// read() returns -1 at end of stream; available() is only an estimate
// and is not a reliable end-of-stream test
while ((nextByte = zin.read()) != -1) {
    fixedBuffer[fixedBufferIndex++] = (byte) nextByte;
    if (fixedBufferIndex == growBufferSize) {
        bytesfinal = ArrayUtils.concatByteArrays(bytesfinal, fixedBuffer);
        fixedBufferIndex = 0;
    }
}

if (fixedBufferIndex != 0) {
    // add the last, partially filled buffer
    byte[] lastBytes = Arrays.copyOf(fixedBuffer, fixedBufferIndex);
    bytesfinal = ArrayUtils.concatByteArrays(bytesfinal, lastBytes);
}

import java.util.Arrays;

public class ArrayUtils {

    static byte[] concatByteArrays(byte[] first, byte[] second) {
        byte[] result = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, result, first.length, second.length);
        return result;
    }
}
Michael Kern