
When given a buffer of MAX_BUFFER_SIZE, and a file that far exceeds it, how can one:

  1. Read the file in blocks of MAX_BUFFER_SIZE?
  2. Do it as fast as possible?

I tried using NIO:

    RandomAccessFile aFile = new RandomAccessFile(fileName, "r");
    FileChannel inChannel = aFile.getChannel();

    ByteBuffer buffer = ByteBuffer.allocate(CAPACITY);

    int bytesRead = inChannel.read(buffer);

    while (bytesRead != -1) {
        buffer.flip();

        while (buffer.hasRemaining()) {
            buffer.get();
        }

        buffer.clear();
        bytesRead = inChannel.read(buffer);
    }

    aFile.close();

And regular IO:

    InputStream in = new FileInputStream(fileName);

    long length = new File(fileName).length();

    if (length > Integer.MAX_VALUE) {
        throw new IOException("File is too large!");
    }

    byte[] bytes = new byte[(int) length];

    int offset = 0;

    int numRead = 0;

    while (offset < bytes.length
            && (numRead = in.read(bytes, offset, bytes.length - offset)) >= 0) {
        offset += numRead;
    }

    if (offset < bytes.length) {
        throw new IOException("Could not completely read file " + fileName);
    }

    in.close();

Turns out that regular IO is about 100 times faster at doing the same thing as NIO. Am I missing something? Is this expected? Is there a faster way to read the file in buffer chunks?

Ultimately, I am working with a large file that I don't have enough memory to read all at once. Instead, I'd like to read it incrementally in blocks that would then be used for processing.

Peter Lawrey
James Raitsev
  • NIO isn't necessarily faster, it's just different. If `java.io` is faster for you, then ignore NIO. – skaffman Jan 28 '12 at 16:29
  • NIO w/o direct ByteBuffers is useless (or at least transferTo on Linux, on Windows it's emulated, hence useless) – bestsss Jan 29 '12 at 13:10
  • @skaffman, NIO is (strictly) faster when used properly; it avoids buffer copying compared to regular IO. It's not very easy to use for rookies, though. – bestsss Jan 29 '12 at 13:17
  • 5
    I would only point out that your two impls are doing different things. Your NIO code in the example is reading bytes into your ByteBuffer, then you are reading them *again*, one-by-one, from the backing byte[] in the ByteBuffer and doing nothing with them in the while-loop. In the IO code you are reading the bytes into the byte[] and doing no other work. Your NIO code is doing 2x the reads plus the billions of calls to get() to grab individual byte values. – Riyad Kalla Feb 13 '12 at 16:27
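As the last comment points out, the per-byte `get()` calls dominate the NIO version's cost. A minimal sketch of draining each chunk with a single bulk `get(byte[])` instead; the commented-out `process` call is a hypothetical placeholder for whatever per-chunk work you need:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BulkGetExample {
    // Reads the whole file in capacity-sized chunks, draining each chunk
    // with one bulk get() instead of one get() call per byte.
    static long readInChunks(Path file, int capacity) throws IOException {
        long total = 0;
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(capacity);
            byte[] chunk = new byte[capacity];
            while (in.read(buffer) > 0) {
                buffer.flip();
                int len = buffer.remaining();
                buffer.get(chunk, 0, len); // one bulk copy replaces len get() calls
                // process(chunk, 0, len); // hypothetical per-chunk processing
                total += len;
                buffer.clear();
            }
        }
        return total;
    }
}
```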

2 Answers


If you want to make your first example faster:

FileChannel inChannel = new FileInputStream(fileName).getChannel();
ByteBuffer buffer = ByteBuffer.allocateDirect(CAPACITY);

while(inChannel.read(buffer) > 0)
    buffer.clear(); // do something with the data and clear/compact it.

inChannel.close();
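The `// do something with the data and clear/compact it` comment above can be made concrete. A sketch, assuming 4-byte records purely for illustration, that uses `compact()` so a record split across two reads survives into the next pass:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class CompactExample {
    // Sums the file's contents as big-endian 4-byte ints. compact() moves
    // any unconsumed partial record to the front of the buffer so the next
    // read() appends after it.
    static long sumInts(Path file) throws IOException {
        long sum = 0;
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);
            while (in.read(buffer) > 0) {
                buffer.flip();
                while (buffer.remaining() >= 4) { // consume only complete records
                    sum += buffer.getInt();
                }
                buffer.compact(); // carry the partial record into the next read
            }
            // a trailing partial record, if the file length is not a
            // multiple of 4, is left unconsumed
        }
        return sum;
    }
}
```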

If you want it to be even faster:

FileChannel inChannel = new RandomAccessFile(fileName, "r").getChannel();
MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
// access the buffer as you wish.
inChannel.close();

This can take 10 to 20 microseconds for files up to 2 GB in size.
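As one illustration of "access the buffer as you wish", a sketch that maps a file and sums its unsigned byte values; the summing is an arbitrary stand-in for real processing:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedSumExample {
    // Maps the whole file and sums its unsigned byte values; the OS pages
    // the file in lazily, so no explicit read loop is needed.
    static long sumBytes(Path file) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = in.map(FileChannel.MapMode.READ_ONLY, 0, in.size());
            long sum = 0;
            while (buffer.hasRemaining()) {
                sum += buffer.get() & 0xFF; // mask to treat the byte as unsigned
            }
            return sum;
        }
    }
}
```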

bestsss
Peter Lawrey
  • Don't forget to close RandomAccessFile as it is a resource leak. – crush Sep 01 '13 at 22:15
  • @crush True, closing the file channel closes the random access file – Peter Lawrey Sep 02 '13 at 07:13
  • How can I read a full line with `MappedByteBuffer`, instead of reading char by char? see http://howtodoinjava.com/2013/05/01/3-ways-to-read-files-using-java-nio/ – Lucas Jota Jan 15 '14 at 01:33
  • just looked the source code and found that closing the file channel doesn't close the random access file, but the other way around, so be sure to close the RandomAccessFile :) – Kin Cheung Mar 28 '17 at 03:18

Assuming that you need to read the entire file into memory at once (as you're currently doing), neither reading smaller chunks nor NIO are going to help you here.

In fact, you'd probably be best reading larger chunks - which your regular IO code is automatically doing for you.

Your NIO code is currently slower, because you're only reading one byte at a time (using `buffer.get()`).

If you want to process in chunks - for example, transferring between streams - here is a standard way of doing it without NIO:

InputStream is = ...;
OutputStream os = ...;

byte[] buffer = new byte[1024];
int read;
while((read = is.read(buffer)) != -1){
    os.write(buffer, 0, read);
}

This uses a buffer size of only 1 KB, but can transfer an unlimited amount of data.
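The same loop can be wrapped in a small helper with a larger buffer; 64 KB is an illustrative size, and the structure is otherwise unchanged:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyExample {
    // Chunked transfer between streams: reads up to one buffer at a time,
    // writes exactly what was read, and returns the total bytes moved.
    static long copy(InputStream is, OutputStream os) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // larger buffer -> fewer syscalls
        long total = 0;
        int read;
        while ((read = is.read(buffer)) != -1) {
            os.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```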

(If you extend your answer with details of what you're actually looking to do at a functional level, I could further improve this to a better answer.)

ziesemer
  • Processing data in chunks for it to later be transferred between streams is exactly what I aim to use this code for. To do something like this, would you even bother with NIO? – James Raitsev Jan 28 '12 at 16:46
  • @JAM - No, not unless working with other API that already made use of NIO for the proper features - e.g. if working with many concurrent files, and needing to avoid significant multi-threading. – ziesemer Jan 28 '12 at 16:48
  • Finally, can you recommend how one can read `n` bytes at a time using NIO? Just wondering – James Raitsev Jan 28 '12 at 16:52
  • @JAM - Similar to my non-NIO example in my answer, in your NIO example, at `inChannel.read(buffer);` - just use a buffer of an appropriate size. You're not looking to read the entire file, only a chunk. Just be aware that a read may be short, so you may get fewer bytes than you asked for. – ziesemer Jan 28 '12 at 16:57
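The short-read caveat in the last comment can be handled by looping until the buffer is full or the stream ends. A sketch (`readFully` is a hypothetical helper name, not a library method):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class ReadFullyExample {
    // Fills the buffer completely unless end-of-stream arrives first,
    // guarding against short reads. Returns the number of bytes actually
    // read; the buffer is flipped and ready for consumption on return.
    static int readFully(ReadableByteChannel ch, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            if (ch.read(buffer) == -1) {
                break; // end of stream: return a final, possibly short, chunk
            }
        }
        buffer.flip();
        return buffer.remaining();
    }
}
```

Each call then yields exactly `n` bytes (the buffer's capacity) except for the final chunk at end-of-stream.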