
I am trying to write a reader which reads files bit by bit, but I have a problem with large files. I tried to read a 100 MB file; it took over 3 minutes, but it worked.

However, when I tried a 500 MB file, it didn't even start, because of this line:

byte[] fileBits = new byte[len];

Now I am searching for solutions and can't find any. Maybe someone has solved this and could share some code, tips, or ideas.

if (file.length() > Integer.MAX_VALUE) {
    throw new IllegalArgumentException("File is too large: " + file.length());
}

int len = (int) file.length();
FileInputStream inputStream = new FileInputStream(file);

try {
    byte[] fileBits = new byte[len];
    for (int pos = 0; pos < len;) {
        int n = inputStream.read(fileBits, pos, len - pos);
        if (n < 0) {
            throw new EOFException();
        }
        pos += n;
    }
} finally {
    inputStream.close();
}
Streetboy
  • Do you really need the entire file at once? What prevents you from streaming it? – josh.trow Feb 29 '12 at 19:02
  • Did you try [searching](https://www.google.com/search?q=OutOfMemoryError) for `OutOfMemoryError`? See `java -mx` – Miserable Variable Feb 29 '12 at 19:04
  • Each index you create is 4 bytes. 500 million of them will try to allocate 2 GB of space. You can't solve it this way without more memory or, as josh.trow suggested, buffering/streaming. – John Vint Feb 29 '12 at 19:05
  • The solution that worked for us was to use a machine with 96 GB RAM (with an appropriate -Xmx option) instead of 24 GB when our app ran out of memory.. :D.. Seriously though, do what Josh said. – Kashyap Feb 29 '12 at 19:06
  • @JohnVint how would creating a `byte` array use more than one byte per index? – mgibsonbr Feb 29 '12 at 19:12
  • The retained heap of an array of 1 element is 4 bytes (even if the index is not initialized). – John Vint Feb 29 '12 at 19:20
  • Of course, there's a fixed overhead for each array, but that doesn't mean every `byte` in the array will actually use 4 bytes, right? If so, what's the point of all the different integer types? (byte, short, int, long) – mgibsonbr Feb 29 '12 at 19:28
  • @mgibsonbr When I say it costs 4 bytes I only refer to the overhead incurred when creating an array, not its type. If he is running on a 32 bit windows OS and he simply tries to create an array of 500 million elements, it will initially try to reserve 2G of heap and fail doing that. – John Vint Feb 29 '12 at 19:34
  • Thank you guys, the answers came quickly. I had already tried to give the VM more memory by typing -Xmx512m in my NetBeans VM options, but that still didn't help; I think it's because I have just 1 GB of RAM, not 96 :D Just joking. I think I will try MappedByteBuffer and see how it goes. One answer also suggested reading the file in chunks; I thought of that too, but the time it would take prompted me to ask smarter people. I don't just read these bits, I convert them to chars and do some work with them :) – Streetboy Feb 29 '12 at 19:36

4 Answers


I suggest you try memory mapping.

FileChannel fc = new FileInputStream(file).getChannel();
MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());

This will make the whole file available almost immediately (in about 10 ms) and uses next to no heap. BTW, the file has to be less than 2 GB, since a single MappedByteBuffer cannot cover more than Integer.MAX_VALUE bytes.
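
A minimal usage sketch, under the same less-than-2 GB assumption; the per-byte processing is just a placeholder:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Minimal sketch: map the file and walk it byte by byte.
// The mapped region lives outside the Java heap, so no byte[len] allocation is needed.
public static void scan(File file) throws IOException {
    FileChannel fc = new FileInputStream(file).getChannel();
    try {
        MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        while (mbb.hasRemaining()) {
            byte b = mbb.get();   // next byte of the file
            // ... process b here (e.g. convert to chars) ...
        }
    } finally {
        fc.close();
    }
}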

Peter Lawrey

If you really need to load the entire file into memory at once, I can only suggest increasing the memory available to Java. Try invoking your program with the -Xmx (max heap size) and/or -Xms (initial heap size) arguments (use the latter if you know beforehand how much memory you'll need; otherwise the former might be best).

java -Xms512m -Xmx1g BigApp

As an alternative, you can use NIO's Memory-mapped files.

mgibsonbr

You shouldn't load the whole file into memory. Create a byte-array buffer with a fixed size and read the file in chunks of that size, as in the sketch below.
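
A minimal sketch of that approach, assuming an arbitrary 8 KB buffer size (tune it to your workload):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Minimal sketch: read the file through a fixed-size buffer instead of one huge array.
public static void readInChunks(File file) throws IOException {
    byte[] buffer = new byte[8192];              // fixed-size chunk
    InputStream in = new FileInputStream(file);
    try {
        int n;
        while ((n = in.read(buffer)) != -1) {
            // ... process buffer[0 .. n-1] here ...
        }
    } finally {
        in.close();
    }
}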

Tiago Pasqualini

I suggest you use a RandomAccessFile; it lets you seek to an offset and read just the portion of the file you need, as in the sketch below.
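
A minimal sketch, assuming you read the file through a caller-supplied fixed-size window (the helper name and signature are just for illustration):

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Minimal sketch: seek to an offset and read one bounded window of the file,
// so only a small window has to be in memory at a time.
public static int readWindow(File file, long offset, byte[] window) throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    try {
        raf.seek(offset);                        // jump to the part you need
        return raf.read(window);                 // fills up to window.length bytes, returns count or -1
    } finally {
        raf.close();
    }
}

Calling it repeatedly with increasing offsets walks the file window by window.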

aioobe