
What's probably the fastest way of reading relatively huge files with Java's I/O methods? My current solution uses a BufferedInputStream reading into a byte array with 1024 bytes allocated to it. Each buffer is then saved in an ArrayList for later use. The whole process is called via a separate thread (Callable interface).

Not very fast though.

    ArrayList<byte[]> outputArr = new ArrayList<byte[]>();
    try {
        BufferedInputStream reader = new BufferedInputStream(new FileInputStream(dir + filename));

        byte[] buffer = new byte[LIMIT]; // LIMIT == 1024
        int i = 0;
        while (reader.available() != 0) {
            reader.read(buffer);
            i++;
            if (i <= LIMIT) {
                outputArr.add(buffer);
                i = 0;
                buffer = null;
                buffer = new byte[LIMIT];
            }
            else continue;
        }

        System.out.println("FileReader-Elements: " + outputArr.size() + " w. " + buffer.length + " byte each.");
chollinger
  • Have a look at the Apache Commons libraries for more options. And for determining the speed have a look at the Java Performance Tuning book by O'Reilly. – therobyouknow Feb 01 '12 at 10:03
  • 6
    Currently you're ignoring the value returned by your `read()` call. *Don't do that.* – Jon Skeet Feb 01 '12 at 10:06

3 Answers


I would use a memory-mapped file, which is fast enough to do in the same thread.

final FileChannel channel = new FileInputStream(fileName).getChannel();
// map the whole file into memory (read-only); the OS pages it in on demand
MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

// when finished
channel.close();

This assumes the file is smaller than 2 GB and will take 10 milliseconds or less.
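A comment below notes that files larger than 2 GB need more than one mapping. As a rough sketch of that idea (the chunk size and the processing loop are my own assumptions, not part of the answer):

// Sketch: map a file larger than 2 GB in chunks, since a single
// MappedByteBuffer is limited to Integer.MAX_VALUE bytes.
try (RandomAccessFile file = new RandomAccessFile(fileName, "r");
     FileChannel channel = file.getChannel()) {
    final long CHUNK = 1L << 30; // 1 GB per mapping (arbitrary choice)
    long size = channel.size();
    for (long pos = 0; pos < size; pos += CHUNK) {
        long length = Math.min(CHUNK, size - pos);
        MappedByteBuffer chunk = channel.map(FileChannel.MapMode.READ_ONLY, pos, length);
        // ... process chunk here ...
    }
}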

Peter Lawrey
  • 1
    Bloody hell! Why the heck is that thing so extremely fast? Thanks anyways, works perfectly. (edit: it gets the file from the memory, the java docs just told me. clever) – chollinger Feb 01 '12 at 15:42
  • 2
    If you need to access more than 2 GB you need to use more than one mapping. – Peter Lawrey Feb 01 '12 at 21:42
  • @PeterLawrey on the same lines is there an efficient way to convert a large input stream into a byte array? For instance, reading an input stream from a ContainerRequestContext? – Arnav Sengupta Jul 08 '20 at 04:53

Don't use available(): it's not reliable. And don't ignore the result of the read() method: it tells you how many bytes were actually read. If you want to read everything into memory, use a ByteArrayOutputStream rather than a List<byte[]>:

byte[] buffer = new byte[16384]; // e.g. a 16 KB buffer, as suggested below
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int read;
while ((read = reader.read(buffer)) >= 0) {
    baos.write(buffer, 0, read);
}
byte[] everything = baos.toByteArray();

I think 1024 is a bit small as a buffer size. I would use a larger buffer (something like 16 KB or 32 KB).

Note that Apache Commons IO and Guava have utility methods that do this for you, and have been optimized already.
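For instance (a sketch, not part of the answer; it assumes commons-io and Guava are on the classpath and reuses the dir and filename variables from the question):

// Apache Commons IO: read an entire InputStream into a byte array
byte[] viaCommons = org.apache.commons.io.IOUtils.toByteArray(new FileInputStream(dir + filename));

// Guava: read an entire File into a byte array
byte[] viaGuava = com.google.common.io.Files.toByteArray(new java.io.File(dir + filename));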

JB Nizet

Have a look at the Java NIO (New I/O) API. Also, this question might prove useful.

I don't have much experience with I/O, but I've heard that NIO is a much more efficient way of handling large data sets.
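As a minimal sketch of what reading through NIO can look like (the file name and buffer size are placeholders of my own choosing; FileChannel.open requires Java 7):

// Read the file through a FileChannel into one reusable direct buffer.
try (FileChannel channel = FileChannel.open(Paths.get("big.dat"), StandardOpenOption.READ)) {
    ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024); // 64 KB
    while (channel.read(buf) != -1) {
        buf.flip();   // switch to reading the bytes just filled
        // ... consume buf here ...
        buf.clear();  // make the buffer writable again
    }
}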

ŁukaszBachman