I'm working on some Java code that will eventually be used within an app server to access some really big files (over 1GB, under 20GB), possibly hosted on an NFS share. Servicing an individual request will involve doing this:

  1. Find the large file I need to read
  2. Navigate to a random point in that file
  3. Read bytes from that file (usually under 1MB)
  4. Return those bytes

I have some happily simple POC code at the moment that just opens a new read-only file and then closes it:

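// Requires: import java.io.RandomAccessFile;
// Open read-only, seek to the requested offset, and read exactly `size` bytes.
// try-with-resources (Java 7+) closes the file even if the read throws.
try (RandomAccessFile raf = new RandomAccessFile(myFileName, "r")) {
   byte[] buffer = new byte[size];
   raf.seek(position);
   raf.readFully(buffer);   // throws EOFException if the file ends early
   return buffer;
}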

I'm wondering whether this is an elegantly simple approach that should work really well, or a foolishly simplistic one that will have a lot of problems under heavy load (so that I'd need something like a thread-safe pool of readers instead). Obviously testing that assumption would be best, but I was wondering if there are any best practices or known issues with either approach. So far I haven't been able to find much by googling...

Thanks!

PS. It's not clear yet whether the final version of this would be hosted on Windows or *nix. It's also not clear how the large files would be shared.

PPS. The app servers are likely to be configured in a cluster, so two different app servers might need to read the same large shared file at the same time.

Dave
  • looks fine to me. you can't get any faster than that, unless you cache the file on local disk or in memory – irreputable Nov 29 '12 at 15:30
  • So the cost of opening and releasing file handles is negligible? Even across, say, an NFS share? – Dave Nov 29 '12 at 17:08
  • that's probably not negligible, even on local files. if it's a concern, you can keep a pool of handles. or, keep 1 `FileChannel` open, read it concurrently by `read(dst,position)` – irreputable Nov 29 '12 at 20:18
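
The last comment's suggestion (keep one `FileChannel` open and read it concurrently with `read(dst, position)`) could look roughly like the sketch below. The class name `SharedFileReader` and its method names are hypothetical, not from the thread. It relies on the documented behavior that the positional `read` does not modify the channel's own position, so several request threads can share one channel.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: one read-only channel shared by all request threads. */
public final class SharedFileReader implements AutoCloseable {
    private final FileChannel channel;

    public SharedFileReader(Path file) throws IOException {
        // Opened once and kept open for the lifetime of this reader.
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    /** Reads exactly {@code size} bytes starting at {@code position}. */
    public byte[] read(long position, int size) throws IOException {
        ByteBuffer dst = ByteBuffer.allocate(size);
        // A single read may return fewer bytes than requested, so loop until full.
        while (dst.hasRemaining()) {
            int n = channel.read(dst, position + dst.position());
            if (n < 0) {
                throw new EOFException("Reached end of file before reading " + size + " bytes");
            }
        }
        return dst.array();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

One caveat with sharing a channel: `FileChannel` is interruptible, so interrupting a thread that is blocked in `read` closes the channel for every other thread using it. Whether this actually beats open-per-request, especially over NFS, would still need measuring.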

1 Answer

Another option is Java NIO, namely FileChannel. A FileChannel is also seekable, and it may be faster than RandomAccessFile because it can work with so-called direct buffers. It also has some other interesting features, e.g. it is interruptible.
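
As a rough illustration of that idea (not a benchmarked implementation), here is a minimal sketch of the same seek-and-read done through a `FileChannel` and a direct buffer. The class name `ChannelRead` and the method `readRange` are made up for this example. It allocates a direct buffer per call only to stay short; direct buffers are comparatively costly to allocate, so real code would more likely reuse them.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class ChannelRead {
    /** Hypothetical helper: reads {@code size} bytes at {@code position} via a direct buffer. */
    static byte[] readRange(Path file, long position, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Direct buffer: memory outside the Java heap that the OS can fill directly.
            ByteBuffer buf = ByteBuffer.allocateDirect(size);
            while (buf.hasRemaining()) {
                // Positional read; may return fewer bytes than requested, hence the loop.
                if (ch.read(buf, position + buf.position()) < 0) {
                    throw new EOFException("Reached end of file before reading " + size + " bytes");
                }
            }
            buf.flip();
            byte[] out = new byte[size];
            buf.get(out); // copy into a heap array, since the caller wants byte[]
            return out;
        }
    }
}
```

Note that copying into a `byte[]` at the end gives back part of what the direct buffer saves, which may be one reason the difference measured in practice is small.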

Evgeniy Dorofeev
  • Good call. Yeah, I've tested with those. It does seem that it is negligibly faster, but not enough faster to necessarily warrant the complexity in *this* particular use case. I actually got burned by nio recently due to a physical windows memory leak in the JVM on another app, so I've been a bit hesitant to use it since then. Honestly, if the Random Access approach performs under load as well as it does on single threaded tests, it's perfect for me. – Dave Nov 29 '12 at 17:13
  • Right. Still, check this out if you haven't yet: http://stackoverflow.com/questions/1605332/java-nio-filechannel-versus-fileoutputstream-performance-usefulness – Evgeniy Dorofeev Nov 29 '12 at 17:35