-1

I am getting a very large file(>2.5GB) data in a ByteArrayInputStream format from other method. I have to pass this data to another method as an InputStream. I have written the following code, which works fine for smaller files but fails for files larger than 2GB.

ByteArrayInputStream bais = null;
bais = method_Returns_FIle_In_ByteArrayInputStream_Format();
InputStream is = bais;
method_Where_To_send_Data_In_InputStream_Format(is);

But my code breaks on the second line itself, with the following error:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3236)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)

I have already tried increasing the Java heap size (both -Xms and -Xmx).

Any suggestion is appreciated.

Mark Rotteveel
suchit
  • 1
    Why do you want to load the whole file into memory? Is it an option to just use a `FileInputStream`? – Progman Aug 22 '21 at 16:35
  • 4
    Find a solution so you don't have to load everything in memory before you can process it (e.g. use a different type of input stream that streams from disk, or network). – Mark Rotteveel Aug 22 '21 at 16:36
  • @Progman I don't see how I can use a FileInputStream here. I'm new to Java. – suchit Aug 22 '21 at 16:41
  • @MarkRotteveel I will try to find example related to those. – suchit Aug 22 '21 at 16:42
  • @suchit A `FileInputStream` gives you access to a file without reading it completely into memory. And since it extends from `InputStream`, you can use it in your `method_Where_To_send_Data_In_InputStream_Format()` method. – Progman Aug 22 '21 at 16:47
  • @Progman I will give it a try and will let you know if that worked. Thanks anyway!! – suchit Aug 22 '21 at 16:49
  • The max size of the underlying arrays will be `new byte[Integer.MAX_VALUE - 2]` so you can give up trying with 2.5G in memory with ByteArrayOutputStream. See also [https://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size/8381338] – DuncG Aug 22 '21 at 17:06
  • @DuncG Can you please point me to any example ? Thanks anyway – suchit Aug 22 '21 at 17:08

2 Answers

3

I am getting a very large file(>2.5GB) data in a ByteArrayInputStream format from other method.

The war was lost in the other method. If you can't change 'the other method', you're out of luck. There is absolutely nothing you can do here. ByteArrayInputStream is by definition an entirely in-memory affair, and if 2.5GB worth of data comprises the total contents of that stream, then that BAIS takes at least 2.5GB worth of memory. Nothing you can do about it.

The fix is to go to that method and fix it. It has absolutely no business sending that in BAIS form. The 'point' of InputStream is in the name: It's to stream that data.

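If you can't change it, and -Xmx8g on a 64-bit VM doesn't fix it, there is nothing left to do. Note that a Java array is capped at roughly Integer.MAX_VALUE elements (about 2 GiB for a byte[]), so a single byte array can't hold 2.5GB no matter how large the heap is.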

rzwitserloot
1

You should not try to load the entire file and then read it from memory.

try (InputStream is = method_Returns_FIle_In_InputStream_Format()) {
    method_Where_To_send_Data_In_InputStream_Format(is);
}

The above uses try-with-resources, which automatically closes `is`, even on an exception, break, or return.

The file-reading method could do:

Path path = Paths.get("phantasies.log");
return Files.newInputStream(path);
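
As a rough sketch of why this helps: a consumer that reads the stream through a small fixed buffer uses constant memory, whatever the file size. The class name `StreamingCopy`, the methods `open` and `countBytes`, and the 8 KiB buffer size are made up for illustration and are not part of the question's code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamingCopy {
    // Hypothetical producer: opens the file lazily instead of loading it into a byte array.
    static InputStream open(Path path) throws IOException {
        return Files.newInputStream(path);
    }

    // Hypothetical consumer: reads in 8 KiB chunks and returns the total number of bytes seen.
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n; // a real consumer would process buffer[0..n) here
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file standing in for the large input.
        Path path = Files.createTempFile("demo", ".bin");
        try {
            Files.write(path, new byte[100_000]);
            try (InputStream is = open(path)) {
                System.out.println(countBytes(is));
            }
        } finally {
            Files.delete(path);
        }
    }
}
```

Only the 8 KiB buffer lives on the heap at any time, so the same code should work unchanged for a multi-gigabyte file.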

To do some additional processing on the file input, one could wrap the InputStream in a custom subclass of FilterInputStream.
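
A minimal sketch of such a wrapper, assuming the "additional processing" is simply counting the bytes that pass through. The class name `CountingInputStream` and its `getCount` method are hypothetical; only FilterInputStream itself comes from the JDK.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: counts bytes as they are read, without buffering them.
class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    // Bytes delivered to the caller so far (skipped bytes are not counted in this sketch).
    long getCount() {
        return count;
    }
}
```

Because FilterInputStream routes `read(byte[])` through `read(byte[], int, int)`, overriding these two methods is enough to see every byte handed to the caller.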

This also shows it is better to program against an interface (InputStream) rather than an actual implementation (ByteArrayInputStream).

Should the ByteArrayInputStream stem from collecting an OutputStream, one should use Java's piped I/O: PipedInputStream and PipedOutputStream, with an extra thread.
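
A sketch of the piped approach, assuming the producer is code that writes to an OutputStream. The names `PipedDemo`, `produce`, and `pipeAndCount` and the 64 KiB pipe size are illustrative choices, not anything from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class PipedDemo {
    // Hypothetical producer: stands in for whatever code currently fills the ByteArrayOutputStream.
    static void produce(OutputStream out, int chunks) throws IOException {
        byte[] chunk = new byte[1024];
        for (int i = 0; i < chunks; i++) {
            out.write(chunk);
        }
    }

    // Connects producer and consumer through a bounded pipe and returns the bytes consumed.
    static long pipeAndCount(int chunks) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream(64 * 1024); // at most 64 KiB buffered
        PipedOutputStream out = new PipedOutputStream(in);

        // The writer must run on its own thread, otherwise it blocks once the pipe is full.
        Thread writer = new Thread(() -> {
            try (OutputStream o = out) {
                produce(o, chunks);
            } catch (IOException e) {
                e.printStackTrace(); // a real application would report this to the reader side
            }
        });
        writer.start();

        long total = 0;
        try (InputStream is = in) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = is.read(buf)) != -1) {
                total += n; // the consuming method would process the data here
            }
        }
        writer.join();
        return total;
    }
}
```

Closing the PipedOutputStream is what signals end of stream to the reader, which is why the writer wraps it in try-with-resources.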

Alternatively, compression could be used to reduce the size in memory, with a GZIPInputStream/GZIPOutputStream.
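
A sketch of the compression idea, assuming the data compresses well enough that the compressed form fits in a single byte array. That is not guaranteed, since the ratio depends entirely on the content. The class `GzipBuffer` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBuffer {
    // Reads the source in small chunks and keeps only the gzip-compressed bytes in memory.
    static byte[] compress(InputStream source) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = source.read(buf)) != -1) {
                gz.write(buf, 0, n);
            }
        }
        return compressed.toByteArray();
    }

    // Returns a stream that decompresses on the fly; the consumer sees the original bytes.
    static InputStream open(byte[] compressed) throws IOException {
        return new GZIPInputStream(new ByteArrayInputStream(compressed));
    }
}
```

The consuming method still receives a plain InputStream, so it does not need to know the data was held compressed.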

Joop Eggen