5

I'm trying to find a class for storing a vector of bytes in Java, which supports: random access (so I can get or set a byte anywhere), resizing (so I can either append stuff to the end or else manually change the size), reasonable efficiency (I might be storing megabytes of data in these things), all in memory (I don't have a file system). Any suggestions?

So far the candidates are:

  • byte[]. Not resizable.
  • java.util.Vector<Byte>. Evil. Also painfully inefficient.
  • java.io.ByteArrayOutputStream. Not random-access.
  • java.nio.ByteBuffer. Not resizable.
  • org.apache.commons.collections.primitives.ArrayByteList. Not resizable. Which is weird, because it'll automatically resize if you add stuff to it, you just can't change the size explicitly!
  • pure RAM implementation of java.nio.channels.FileChannel. Can't find one. (Remember I don't have a file system.)
  • Apache Commons VFS and the RAM filesystem. If I must I will, but I really like something lighter weight...

This seems like such an odd omission that I'm sure I must have missed something somewhere. I just figure out what. What am I missing?

David Given
  • 13,277
  • 9
  • 76
  • 123
  • What is evil and inefficient about `Vector`? – Miserable Variable Oct 01 '11 at 17:58
  • 1
    Are you also running with limited memory? Keep in mind that most of the resizable candidates do a full array copy when changing sizes, so you need at least 2x the memory of the largest array. – AngerClown Oct 01 '11 at 18:14
  • So Vector is on the Axis of Evil. Vector... Axis... Vector ... Voldemort, AHA! – Caffeinated Oct 01 '11 at 18:25
  • Wow, I forgot I posted this. The problem with `Vector` is that it's storing an array of `Byte` objects, rather than just bytes. Each `Byte` is huge (24 bytes, according to http://stackoverflow.com/questions/9037468/what-is-the-storage-cost-for-a-boxed-primitive-in-java) and they're also immutable, which means modifying the contents of the array involves creating new objects, and then garbage collecting the old ones. – David Given Aug 07 '14 at 07:52

3 Answers3

3

I would consider a class that wraps lots of chunks of byte[] arrays as elements of an ArrayList or Vector.

Make each chunk a multiple of e.g. 1024 bytes, so you accessor functions can take index >> 10 to access the right element of the ArrayList, and then index & 0x3ff to access the specific byte element of that array.

This will avoid the wastage of treating each byte as a Byte object, at the expensive of wastage of whatever's left over at the end of the last chunk.

Alnitak
  • 334,560
  • 70
  • 407
  • 495
  • 1
    Sure; but what I really want to know is whether such a thing already exist in a standard library. (In the interests of not reinventing the wheel, since it's liable to be square.) – David Given Oct 01 '11 at 19:04
  • I'm not aware of any implementation, but it really shouldn't take more than half an hour or so to write one. – Alnitak Oct 01 '11 at 19:56
  • Sure, but that's not the point. I don't want to use something I hack up myself if there's a better standard part available. This seems such an obvious thing to want that I'm sure there _is_ such a standard part available... somewhere. – David Given Oct 01 '11 at 21:15
1

In those cases, I simply initialize a reference to an array of reasonable length, and when it gets too small, create and copy it to a larger one by calling Arrays.copyOf(). e.g.:

byte[] a = new byte[16];
if (condition) {
    a = Arrays.copyOf(a, a.length*2);
}

And we may wrap that in a class if needed...

Samuel Audet
  • 4,964
  • 1
  • 26
  • 33
0

ArrayList is more efficient than Vector because it's not synchronized.

To improve efficiency you can start with a decent initial capacity. See the constructor:

ArrayList(int initialCapacity) 

I think the default is only 16, while you probably may need much bigger size.

Despite what it seems, ArrayList is very efficient even with a million record! I wrote a small test program that adds a million record to an ArrayList, without declaring an initial capacity and on my Linux, 5 year old laptop (Core 2 T5200 cpu), it takes around 100 milliseconds only to fill the whole list. If I declare a 1 million bytes initial space it takes around 60-80 ms, but If I declare 10,000 items it can take around 130-160ms, so maybe is better to not declare anything unless you can make a really good guess of the space needed.

About the concern for memory usage, it take around 8 Mb of memory, which I consider totally reasonable, unless you're writing phone software.

import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

import org.obliquid.util.StopWatch;

public class ArrayListTest {

        public static void main(String args[]) {
                System.out.println("tot=" + Runtime.getRuntime().totalMemory() 
                     + " free=" + Runtime.getRuntime().freeMemory());
                StopWatch watch = new StopWatch();
                List<Byte> bytes = new ArrayList<Byte>();
                for (int i = 0; i < 1000000; i++) {
                        bytes.add((byte) (i % 256 - 128));
                }
                System.out.println("Elapsed: " 
                     + watch.computeElapsedMillisSeconds());
                System.out.println("tot=" + Runtime.getRuntime().totalMemory() 
                     + " free=" + Runtime.getRuntime().freeMemory());
        }
}

As expected, Vector performs a little worse in the range of 160-200ms.

Example output, without specifying a start size and with ArrayList implementation.

tot=31522816 free=31023176
Elapsed: 74
tot=31522816 free=22537648
stivlo
  • 83,644
  • 31
  • 142
  • 199
  • 3
    Will this box the bytes, as it's a generic? If so, that may lead to huge space inefficency. In the worst case, it would be 1 byte versus 4 for the reference + 8 for object header (OpenJDK docs say it's two words) + 1 for the actual byte value = 13x memory consumption. In practice, the bytes may be padded to 4 bytes and the `Byte` objects may be cached (quick calculation suggests caching all of them would require a mere 2.2 kB, so it's entirely reasonable), so the actual impact depends and should be measured. –  Oct 01 '11 at 18:02
  • Interesting, thank you, maybe @DavidGiven can you give us an idea of the order of magnitude of those bytes? – stivlo Oct 01 '11 at 18:10
  • As a rough ballpark, think a megabyte for worst case. – David Given Oct 01 '11 at 18:15
  • @delnan as you suppose the memory usage is actually more than a mere array of bytes. In David's worst case it takes around 8Mb and the speed is really fast. It seems very reasonable amount of memory to me, considering the convenience, I'd not worry about. – stivlo Oct 02 '11 at 05:10
  • Just to throw a couple cents in here. I ran this same test on a phone (HTC One X+) I'm using for development. Best case creating and populating a List with 1000000 elements took 339ms. Best case creating and populating a byte[1000000] took 7.75ms. Yes, you read that correctly. It took 43 times longer to use List. Just something to think about. If speed isn't a concern, then List is easy and convenient. If you are handling lots of data and would benefit from speed, perhaps byte[], but profile your code first! – Craig B Jul 31 '13 at 00:33
  • This is terribly inefficient in space, and ultimately in time (since you can't waste tons of space in Java without eventually paying a cost in time - due to cache misses, extra gc activity etc). Every single element you add will be its own object, and space waste will probably be something like 32x the size of wrapping a byte array. You don't have to measure it, it's a fact. – BeeOnRope Jun 15 '16 at 22:14