
I have a ByteArrayOutputStream object that I'm getting the following error for:

java.lang.ArrayIndexOutOfBoundsException at 
java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:113)

I am trying to load a file that is several gigabytes in size by writing byte[] chunks of 250 MB into it one at a time.

I can watch the byte count grow, and as soon as it hits length 2147483647 (Integer.MAX_VALUE, the upper limit of int), it blows up on the following line:

stream.write(buf); 

stream is the ByteArrayOutputStream, buf is what I'm writing to the stream in 250mb chunks.

I was planning to do

byte result[] = stream.toByteArray();

At the end. Is there some other method I can try that will support byte array sizes greater than the int upper limit?

Brian
    Just a word of advice: don't store an array that large in memory. Do you really need to have all those gigs in the memory at once? – Pieter Bos Feb 22 '12 at 15:16
    Do you really need more than 640K? – Andy Thomas Feb 22 '12 at 15:21
  • Note: a proposal for large arrays was considered but did not make it into Java 7. Perhaps we'll see it in Java 8? – Andy Thomas Feb 22 '12 at 15:22
    If you have a 64G memory machine that you're using to run scientific experiments on, where your data is 30G, it makes a lot of sense to load everything into memory -- it will almost always result in a large time savings. – Malcolm Oct 06 '12 at 20:13
  • possible duplicate of [Java array with more than 4gb elements](http://stackoverflow.com/questions/878309/java-array-with-more-than-4gb-elements) – Ciro Santilli OurBigBook.com Feb 06 '15 at 10:57
  • @Malcolm If you're needing that kind of time savings, Java is not the language you should be using. – Cdaragorn Jun 28 '17 at 20:20
  • @Cdaragorn not really, the JVM is incredibly fast. It is an absolute memory hog and requires some small, constant-ish amount of time to warm up and JIT things. As a GC language & runtime, one obviously expects it to be a bit slower than a non-GC platform. But please, chime in with a non-constructive, snarky, immature, off-topic comment that is inaccurate. – Malcolm Jun 29 '17 at 01:41
  • @Malcolm Your choice to insult me speaks volumes on its own. I didn't say Java was a terrible language. It just isn't a good language to choose when you need every bit of performance you can get. It is several times slower than any native language even at its best. If you feel that you get other benefits from it like faster dev time that outweigh that, great! – Cdaragorn Jul 03 '17 at 16:25

3 Answers


Arrays in Java simply can't exceed the bounds of int.

From the JLS section 15.10:

The type of each dimension expression within a DimExpr must be a type that is convertible (§5.1.8) to an integral type, or a compile-time error occurs. Each expression undergoes unary numeric promotion (§). The promoted type must be int, or a compile-time error occurs; this means, specifically, that the type of a dimension expression must not be long.

Likewise in the JVM spec for arraylength:

The arrayref must be of type reference and must refer to an array. It is popped from the operand stack. The length of the array it references is determined. That length is pushed onto the operand stack as an int.

That basically enforces the maximum size of arrays.

It's not really clear what you were going to do with the data after loading it, but I'd attempt not to need to load it all into memory to start with.
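As a sketch of that streaming approach (the method name and buffer size here are illustrative, not from the original answer), you can process the file one chunk at a time and never hold more than one buffer in memory:

```java
import java.io.IOException;
import java.io.InputStream;

public class StreamingRead {
    // Process the stream chunk-by-chunk instead of buffering everything.
    // Here we just count bytes; real work would happen on each chunk.
    static long countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[64 * 1024]; // arbitrary chunk size
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n; // replace with real per-chunk processing
        }
        return total;
    }
}
```

Note that the running total is a long, so it can exceed Integer.MAX_VALUE even though each individual buffer cannot.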

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
2

Use more than one array. When you reach the limit, call ByteArrayOutputStream.toByteArray() to capture the data so far, then call ByteArrayOutputStream.reset() and keep writing.
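One way to sketch that idea (the class name and the per-chunk limit are made up for illustration; this is not from the original answer):

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class ChunkedBuffer {
    // Keep each backing array well under Integer.MAX_VALUE.
    private static final int CHUNK_LIMIT = 1 << 30; // 1 GiB per chunk, illustrative
    private final List<byte[]> chunks = new ArrayList<>();
    private final ByteArrayOutputStream current = new ByteArrayOutputStream();

    public void write(byte[] buf) {
        if (current.size() + buf.length > CHUNK_LIMIT) {
            chunks.add(current.toByteArray()); // snapshot the full buffer
            current.reset();                   // reuse the stream for the next chunk
        }
        current.write(buf, 0, buf.length);     // this overload throws no IOException
    }

    public List<byte[]> toChunks() {
        if (current.size() > 0) {
            chunks.add(current.toByteArray());
            current.reset();
        }
        return chunks;                         // total size can exceed 2 GiB
    }
}
```

The total amount of data across the list can exceed 2 GiB, since the int limit applies per array, not to the collection as a whole.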

Dev

Using a ByteArrayOutputStream for writing several GiB of data is not a good idea, as everything has to be held in the computer's memory. As you have noticed, a byte array is limited to 2^31 − 1 bytes (just under 2 GiB).

Additionally, the internal buffer used to store that data cannot grow in place: whenever it fills up, a new buffer (usually twice the size) has to be allocated and all existing data copied from the old buffer into the new one.

My advice would be to use RandomAccessFile and save the data you get to a file. Via RandomAccessFile you can operate on data files larger than 2GiB.
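A minimal sketch of that approach (the helper name is made up; the key point is that RandomAccessFile positions are long values, so offsets past 2 GiB work):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class LargeFileAppend {
    // Append a chunk to the end of the file instead of an in-memory array.
    // raf.length() and raf.seek() use long offsets, so the file may exceed 2 GiB.
    static void appendChunk(RandomAccessFile raf, byte[] chunk) throws IOException {
        raf.seek(raf.length()); // position at end of file (a long, not an int)
        raf.write(chunk);
    }
}
```

Later you can read back any region with raf.seek(offset), where offset is a long, instead of indexing into one giant array with an int.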

Robert