I have an application where a lot of file I/O (both reads and writes) takes place. I understand that doing file I/O from multiple threads is not a good solution, as it can degrade performance (I have no control over the kind of disk that is used), so I ended up dedicating one thread to all file I/O. Can MappedByteBuffer be of any use in my case? I know that a MappedByteBuffer is a memory area which the OS maps to a file. Can I leverage multiple threads to do I/O operations on different memory-mapped buffers efficiently? Do disk head seek times still matter when multiple threads map different files to different memory buffers? Is consistency guaranteed in such cases? Are there any benchmark results available for such cases? Thank you all in advance.
-
"I know that using multiple threads to do File I/O is not a good solution as it can degrade the performance" What's that based on? – weston Nov 05 '14 at 12:54
-
The disk's head needs to keep seeking to the next position to read, and when multiple threads do that the head bounces between different disk areas inefficiently. – themanwhosoldtheworld Nov 05 '14 at 13:08
-
http://stackoverflow.com/a/10397184/1878313 http://stackoverflow.com/a/1034860/1878313 are some relevant posts to support that multiple threads to do File I/O is not a good solution when one has no control over the type of disk used. – themanwhosoldtheworld Nov 05 '14 at 13:11
-
Out of curiosity, the way you mentioned MemoryMappedBuffer makes it look like a specific class. However, searching the default JDK docs does not yield anything for MemoryMappedBuffer. What class/library are you planning to use in particular? – skytreader Nov 05 '14 at 13:14
-
@skytreader thanks for pointing that out, I should have used 'MappedByteBuffer' in the description. – themanwhosoldtheworld Nov 05 '14 at 13:20
-
I can probably find as many that say it is better, such as this http://stackoverflow.com/a/1239987/360211 The key is to take no one's word for it and profile. My personal experience has been that it improves overall throughput; it's not all about head seek times, there are other delays involved which are minimised when done in parallel. – weston Nov 05 '14 at 14:11
-
The only thing one can say for sure is "it depends". Memory-mapped files have advantages in some cases and will cost you time in others. Also somewhat relevant: http://www.ibm.com/developerworks/java/library/j-zerocopy/ (the part about less context switching should apply to memory-mapped files as well). If you want a proper answer for your use case, implement all the approaches, then profile and optimize until you have the best one. – zapl Nov 05 '14 at 15:30
-
@zapl thanks for the reply. The only option I see now is to profile and benchmark the results myself and put them in the public domain for anyone interested. – themanwhosoldtheworld Nov 07 '14 at 11:15
3 Answers
Can MappedByteBuffer be of any use in my case?
Referring to the JavaDoc, a MappedByteBuffer should give you no performance advantage over a plain ByteBuffer. You could even end up with some unexpected changes at runtime:
The content of a mapped byte buffer can change at any time, for example if the content of the corresponding region of the mapped file is changed by this program or another.
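A minimal sketch of that caveat (class name and file layout are hypothetical, not from the thread): two mappings of the same file region alias the same pages on typical operating systems, so a write through one mapping becomes visible through the other without any copy in your code.

```java
// Hypothetical sketch: two MappedByteBuffers over the same file region
// observe each other's writes, illustrating the JavaDoc caveat quoted above.
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedVisibility {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped", ".bin");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer a = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8);
            MappedByteBuffer b = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8);
            a.putLong(0, 42L);                 // write through mapping A...
            System.out.println(b.getLong(0));  // ...shows up through mapping B
        } finally {
            Files.delete(file);
        }
    }
}
```

The same effect occurs when another process changes the file, which is exactly why the JavaDoc warns that mapped content "can change at any time".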
I know that MappedByteBuffer is a memory area which is mapped to a file by the OS, can I leverage multiple threads to do I/O operations on different memory mapped buffers efficiently?
Not unless you know better than your OS or the VM how to read and write your data efficiently, which is rarely the case.
Does disk head seek times still matter when multiple threads are mapping different files to different memory buffers?
The head still has to seek to its position. Unless you have different disks, and you do nothing but disk I/O, it is useless to have more than one thread. If there is redundancy in the data you read, multithreading may help, because your OS will cache the "hot" data.
Is consistency guaranteed in such cases?
Not really sure what you mean, but you have to make sure that access to your ByteBuffer is somehow synchronized, because it is not a thread-safe data structure.
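One hedged way around explicit locking (the class name and region split below are illustrative, not from the answer) is to give each thread its own duplicate() of the buffer: duplicates share the content but have independent position and limit, so threads can safely fill disjoint regions without synchronizing every access.

```java
// Hypothetical sketch: ByteBuffer is not thread-safe, but each thread can
// work on its own duplicate() (private position/limit, shared content),
// provided the threads write to disjoint regions of the buffer.
import java.nio.ByteBuffer;

public class DisjointRegions {
    public static void main(String[] args) throws InterruptedException {
        ByteBuffer shared = ByteBuffer.allocate(1024);
        int half = shared.capacity() / 2;

        Runnable low = () -> {
            ByteBuffer view = shared.duplicate();   // private cursor
            view.position(0).limit(half);
            while (view.hasRemaining()) view.put((byte) 1);
        };
        Runnable high = () -> {
            ByteBuffer view = shared.duplicate();
            view.position(half).limit(shared.capacity());
            while (view.hasRemaining()) view.put((byte) 2);
        };

        Thread t1 = new Thread(low);
        Thread t2 = new Thread(high);
        t1.start(); t2.start();
        t1.join();  t2.join();    // join() gives the happens-before edge

        System.out.println(shared.get(0) + " " + shared.get(half));
    }
}
```

If the regions overlap, or if threads read each other's regions while writing, you are back to needing explicit synchronization.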
Are there any benchmark results available for such cases?
Last year I did some benchmarking, working with multiple buffers. Long story short, it really depends on the use case, the operating system and your hardware. Depending on how important this is, I recommend you do your own benchmarks. The only constant I remember is that you get the best performance writing data in blocks the size of your disk's segment size... which is somewhat obvious ;-)
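The block-size observation can be sketched like this (class name and the 4096-byte figure are assumptions; 4096 is a common filesystem block size, but the right value depends on your hardware): buffer the stream so each flush hands the OS a full block.

```java
// Hypothetical sketch of "write in block-sized chunks": the stream buffer
// and each write() are sized to one assumed filesystem block (4096 bytes),
// so no syscall carries a partial block. Measure the real block size on
// your own system before relying on this number.
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlockSizedWrites {
    static final int BLOCK = 4096; // assumed block size; hardware-dependent

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("blocks", ".bin");
        byte[] chunk = new byte[BLOCK];
        try (OutputStream out = new BufferedOutputStream(
                Files.newOutputStream(file), BLOCK)) {
            for (int i = 0; i < 256; i++) {
                out.write(chunk);   // each write fills exactly one block
            }
        }
        System.out.println(Files.size(file)); // 256 blocks of 4096 bytes
        Files.delete(file);
    }
}
```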

-
Thanks for the answer. There is still a lot of confusion on the topic of accessing the file system from multiple threads, and some conflicting answers here didn't help me either. As far as memory-mapped buffers are concerned, I will do some benchmarking myself; it would be helpful if you could share your results as well. – themanwhosoldtheworld Nov 12 '14 at 13:09
As long as you're not attempting to have more than one thread write to the same file at a given time, there's no problem with doing file I/O from different threads. Using NIO, the FileSystem implementation is far better at managing disk writes and resources than you could ever hope to be. Disk writes are buffered and asynchronous by default in Java, so there's no need for something as convoluted as making a single thread do all your I/O and writing into memory buffers: this is almost exactly what OutputStreams writing to disk do already, and the native JVM will do it more efficiently than you could.
In fact, file I/O operations can benefit substantially from multithreading. Different threads can be processing read information while other threads are reading, and it can even sometimes be faster to read or write a few files in parallel than sequentially.
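A sketch of the one-file-per-thread pattern this answer describes (class name, pool size, and file layout are all illustrative): each task writes its own file, so no two threads ever touch the same file, and an ExecutorService fans the work out. Whether this actually beats a single thread depends on the disk, as the comments above stress, so profile it.

```java
// Hypothetical sketch: parallel writes, one file per task, no shared files.
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelWrites {
    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("parallel-io");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        byte[] payload = new byte[64 * 1024];   // 64 KiB per file

        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Path f = dir.resolve("part-" + i + ".bin");
            files.add(f);
            pool.submit(() -> {
                try {
                    Files.write(f, payload);    // one file per task
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        long written = 0;
        for (Path f : files) written += Files.size(f);
        System.out.println(written);            // 8 files of 64 KiB each
        for (Path f : files) Files.delete(f);
        Files.delete(dir);
    }
}
```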

-
"Different threads can be processing read information while other threads are reading" I was concerned about different threads reading/writing to disk performing worse than a single thread doing this. It would be helpful if you could share any benchmarked results to support this. – themanwhosoldtheworld Nov 12 '14 at 13:14
If you're suggesting that you want to map separate regions of the same file to different MappedByteBuffers, and want to compare writing the file that way to single-threaded, blocking, unbuffered writes to the same file, I'm pretty sure that you'll be very happy with the results from a performance perspective.
You should remember that when writing to MappedByteBuffers, you are not necessarily writing to the disk when you request a write. The OS decides which MappedByteBuffers correspond to RAM and when that RAM is written back to disk. Typically that means that while you are writing, the file (or that portion of it) is kept in RAM, and it is flushed to disk at the OS's discretion: it may be kept in memory until it looks like you're done writing it and then moved to disk, or kept in RAM until that RAM is needed for something else, unless you force() it to be written out to disk.
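The region-per-thread idea from this answer can be sketched as follows (class name, region count, and sizes are assumptions): the file is covered by one MappedByteBuffer per region, each thread fills only its own region, and force() is called per region only because this sketch wants the pages on disk before exiting; normally you would leave that to the OS, as described above.

```java
// Hypothetical sketch: separate regions of one file mapped to separate
// MappedByteBuffers, each written by its own thread.
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RegionPerThread {
    public static void main(String[] args) throws Exception {
        final int regions = 4;
        final int regionSize = 1 << 16;         // 64 KiB per region
        Path file = Files.createTempFile("regions", ".bin");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            Thread[] workers = new Thread[regions];
            for (int r = 0; r < regions; r++) {
                // mapping in READ_WRITE mode grows the file to cover the region
                final MappedByteBuffer region = ch.map(
                        FileChannel.MapMode.READ_WRITE,
                        (long) r * regionSize, regionSize);
                final byte marker = (byte) r;
                workers[r] = new Thread(() -> {
                    while (region.hasRemaining()) region.put(marker);
                    region.force();             // push this region to disk now
                });
                workers[r].start();
            }
            for (Thread t : workers) t.join();
            System.out.println(Files.size(file)); // regions * regionSize bytes
        } finally {
            Files.delete(file);
        }
    }
}
```

Because each thread owns a distinct mapped region, no synchronization is needed between the writers themselves; the join() at the end makes the writes visible to the main thread.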
From a performance perspective, I think it depends a lot on what your goal is. Do you want the algorithm that does the writing to finish faster? Then memory-mapped regions may well be a good option, since the algorithm can finish before the file finishes writing to disk. Or do you want the file copied to disk faster? That is harder to say: if you can break the file into large chunks that can be written to disk efficiently, and if the OS recognizes when you're done with a region and writes each region back to disk only once during the process, it may be more efficient.
On the other hand, suppose your current implementation already writes to disk very efficiently: you arrange the writes so that little seeking is necessary (if using hard disks), and the writes are buffered appropriately, so you aren't forcing the OS to write small bits of the file all the way to disk before permitting it to have the next bit, or writing bytes at random (which even solid-state drives dislike, since they must write a region of a certain size and cannot write single bytes individually). In that case it's entirely possible that your current strategy would finish writing the file to disk faster, assuming that getting the file onto the physical disk as fast as possible is the goal.
If you want to know how much room for improvement there is, compare your speed with a hard-drive performance test on your system; that should benchmark the limit on your throughput to the disk. If that limit is significantly faster than your current implementation, then either there's room for improvement in your writing strategy, or it's generating the data, rather than writing it, that's taking the time.
To test the latter, you could try having your algorithm write to ByteBuffers that are not memory mapped; with no file I/O, you can benchmark the speed of your algorithms independently of the disk.
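That last suggestion might look like this (class name and the stand-in "algorithm" are hypothetical): run the same work against a plain heap ByteBuffer so the timing excludes all disk I/O. A real benchmark needs warm-up, many iterations, and ideally a harness such as JMH before the numbers mean anything.

```java
// Hypothetical micro-benchmark sketch: time the data-generating "algorithm"
// against a heap ByteBuffer with no file behind it, isolating it from disk I/O.
import java.nio.ByteBuffer;

public class NoDiskTiming {
    // Stand-in for the real algorithm: fill the buffer with ints.
    static void fill(ByteBuffer buf) {
        buf.clear();
        while (buf.hasRemaining()) buf.putInt(buf.position());
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(1 << 20); // 1 MiB, purely in-memory
        fill(buf);                                     // crude warm-up
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) fill(buf);
        long elapsed = System.nanoTime() - start;
        // The elapsed nanos are what you compare against the disk-backed run.
        System.out.println(elapsed > 0);
    }
}
```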
