Short answer:

Without doing anything, you can push the current limit by a factor of 1.5. That means that if you are able to process 800 MB, you can process 1200 MB. It also means that if by some trick with `java -Xmx...` you can move to a point where your current code can process 7 GB, your problem is solved, because the 1.5 factor will take you to 10.5 GB, assuming you have that space available on your system and that the JVM can get it.
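For example, assuming your machine actually has the RAM or swap to back it, you could raise the maximum heap like this (`12g` is just an illustrative value, and `MyApp` a hypothetical main class):

```
java -Xmx12g MyApp
```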
Long answer:
The error is pretty self-descriptive. You hit the practical memory limit of your configuration. There is a lot of speculation about the limit you can reach with the JVM; I do not know enough about that, since I cannot find any official information. However, you will somehow be limited by constraints like the available swap, kernel address space usage, memory fragmentation, etc.
What is happening now is that `ByteArrayOutputStream` objects are created with a default buffer of size 32 if you do not supply any size (which is your case). Whenever you call the `write` method on the object, an internal machinery kicks in. The OpenJDK implementation, release 7u40-b43, which seems to match the output of your error perfectly, uses an internal method `ensureCapacity` to check that the buffer has enough room for the bytes you want to write. If there is not enough room, another internal method, `grow`, is called to increase the size of the buffer. The method `grow` determines the appropriate size and calls the method `copyOf` from the class `Arrays` to do the job.
The appropriate size of the buffer is the maximum between twice the current size and the size required to hold all the content (the existing content plus the new content to be written).
The method `copyOf` from the class `Arrays` (follow the link) allocates the space for the new buffer, copies the content of the old buffer to the new one, and returns it to `grow`.
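To make that mechanism concrete, here is a simplified sketch of the growth logic, paraphrased from the OpenJDK 7 source (not the verbatim code; the integer-overflow handling is trimmed):

```java
import java.util.Arrays;

// Simplified sketch of ByteArrayOutputStream's internal growth, paraphrased
// from OpenJDK 7 (overflow handling omitted for brevity).
class GrowSketch {
    private byte[] buf = new byte[32]; // default capacity when no size is given

    private void ensureCapacity(int minCapacity) {
        if (minCapacity - buf.length > 0)
            grow(minCapacity);
    }

    private void grow(int minCapacity) {
        int oldCapacity = buf.length;
        // Double the buffer, unless even doubling is not enough for the request.
        int newCapacity = oldCapacity << 1;
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        // Allocate the new buffer and copy the old content into it;
        // this allocation is where the OutOfMemoryError surfaces.
        buf = Arrays.copyOf(buf, newCapacity);
    }
}
```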
Your problem occurs at the allocation of the space for the new buffer. After some `write` call, you reach a point where the available memory is exhausted: `java.lang.OutOfMemoryError: Java heap space`.
If we look into the details, you are reading by chunks of 2048 bytes. So:

- your first write grows the size of the buffer from 32 to 2048
- your second call doubles it to 2*2048
- your third call takes it to 2^2*2048; you have time to write two more times before the next allocation is needed
- then 2^3*2048, leaving you time for 4 more writes before allocating again
- at some point, your buffer will be of size 2^18*2048, which is 2^19*1024 or 2^9*2^20 (512 MB)
- then 2^19*2048, which is 1024 MB or 1 GB (the sketch after this list reproduces this doubling)
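If you want to see that doubling in action, a standalone trace like the one below (hypothetical class name, chunk size taken from your loop) reproduces the same arithmetic as `grow`:

```java
// Hypothetical standalone trace of the buffer-doubling arithmetic above:
// 2048-byte chunks written against a buffer that doubles when full.
public class GrowthTrace {
    public static void main(String[] args) {
        final int chunk = 2048;
        long capacity = 32;   // default ByteArrayOutputStream capacity
        long written = 0;
        while (capacity < (1L << 30)) {    // stop once we reach 1 GB
            written += chunk;
            if (written > capacity) {
                // Same rule as grow(): double, unless doubling is not enough.
                capacity = Math.max(capacity * 2, written);
                System.out.printf("grew to %d bytes (%.1f MB)%n",
                        capacity, capacity / (1024.0 * 1024.0));
            }
        }
    }
}
```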
Something that is unclear in your description is that you can somehow read up to 800 MB but cannot go beyond. You have to explain that to me.
I expect your limit to be exactly a power of 2 (or close, if we use power-of-10 units somewhere). In that regard, I expect you to start having trouble immediately above one of these: 256 MB, 512 MB, 1 GB, 2 GB, etc.
When you hit that limit, it does not mean that you are out of memory; it simply means that it is not possible to allocate another buffer of twice the size of the buffer you already have. This observation opens room for improvement in your work: find the maximum size of buffer that you can allocate and reserve it upfront by calling the appropriate constructor:
```java
ByteArrayOutputStream bArrStream = new ByteArrayOutputStream(myMaxSize);
```
It has the advantage of reducing the overhead of the background memory allocation that happens under the hood to keep you happy. By doing this, you will be able to go to 1.5 times the limit you have right now. This is simply because, the last time the buffer was increased, it went from half the current size to the current size, and at some point you had both the current buffer and the old one together in memory. But you will not be able to go beyond 3 times the limit you are having now. The explanation is exactly the same.
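Putting that together, a minimal sketch along these lines (the file name, the `myMaxSize` computation and the class name are my assumptions, not your code) would size the stream from the input file up front:

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Minimal sketch, assuming the file fits in a single byte[] (<= Integer.MAX_VALUE)
// and that the heap can back a buffer of that size.
public class PreallocatedRead {
    public static void main(String[] args) throws IOException {
        File file = new File("big.dat");        // hypothetical input file
        long length = file.length();
        if (length > Integer.MAX_VALUE) {
            throw new IllegalStateException("too large for one byte[]: " + length);
        }
        int myMaxSize = (int) length;           // reserve everything upfront
        ByteArrayOutputStream bArrStream = new ByteArrayOutputStream(myMaxSize);
        try (FileInputStream in = new FileInputStream(file)) {
            byte[] chunk = new byte[2048];      // same chunk size as your loop
            int n;
            while ((n = in.read(chunk)) != -1) {
                bArrStream.write(chunk, 0, n);  // grow() never needs to run
            }
        }
    }
}
```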
That being said, I do not have any magic suggestion to solve the problem, apart from processing your data in chunks of a given size, one chunk at a time. Another good approach would be to follow the suggestion of Takahiko Kawasaki and use `MappedByteBuffer`. Keep in mind that in any case you will need at least 10 GB of physical memory or swap to be able to load a file of 10 GB.
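As a rough illustration of that approach (the window size and the `processChunk` callback are my assumptions, and note that a single `MappedByteBuffer` cannot exceed `Integer.MAX_VALUE` bytes, so a 10 GB file has to be mapped in several windows):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Sketch: walk a large file through read-only memory-mapped windows
// instead of copying everything into one giant heap array.
public class MappedWalk {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(
                Paths.get("big.dat"), StandardOpenOption.READ)) {  // hypothetical file
            long pos = 0, size = ch.size();
            while (pos < size) {
                long window = Math.min(size - pos, 1L << 30);  // map at most 1 GB
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, window);
                processChunk(buf);                              // hypothetical consumer
                pos += window;
            }
        }
    }

    // Hypothetical placeholder: consume the mapped bytes without heap copies.
    private static void processChunk(MappedByteBuffer buf) {
        while (buf.hasRemaining()) {
            buf.get();  // replace with real processing
        }
    }
}
```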