2

This does not look trivial, especially for a read/write buffered FileChannel. Is there an open-source implementation somewhere that I can base mine on?


To be clear for those who did not understand:

FileChannel does buffering at the OS level and I want to do buffering at the Java level. Read here to understand: FileChannel#force and buffering


@Peter I want to write a huge file to disk from a fast message stream. Buffering and batching are the way to go. So I want to batch in Java and then call FileChannel.write.

chrisapotek

3 Answers

5

I recommend using a BufferedOutputStream wrapping a FileOutputStream. I do not believe you will see any performance improvement by mucking with ByteBuffer and FileChannel, and you'll be left with a lot of hard-to-maintain code if you go that route.

The reasoning is quite simple: regardless of the approach you take, the steps involved are the same:

  1. Generate bytes. You don't say how you plan to do this, and it could introduce an additional level of temporary buffering into the equation. But regardless, the Java data has to be turned into bytes.
  2. Accumulate bytes into a buffer. You want to buffer your data before writing it, so that you're not making lots of small writes. That's a given. But where that buffer lives is immaterial.
  3. Move bytes from Java heap to C heap, across JNI barrier. Writing a file is a native operation, and it doesn't read directly from the Java heap. So whether you buffer on the Java heap and then move the buffered bytes, or buffer in a direct ByteBuffer (and yes, you want a direct buffer), you're still moving the bytes. You will make more JNI calls with the ByteBuffer, but that's a marginal cost.
  4. Invoke write, a system call that copies bytes from the C heap into a kernel-maintained disk buffer.
  5. Write the kernel buffer to disk. This will outweigh all the other steps combined, because disks are slow.

There may be a few microseconds gained or lost depending on exactly how you implement these steps, but you can't change the basic steps.

The FileChannel does give you the option to call force(), to ensure that step #5 actually happens. This is likely to actually decrease your overall performance, as the underlying fsync call will not return until the bytes are written. And if you really want to do it, you can always get the channel from the underlying stream.
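A minimal sketch of this recommendation, assuming newline-delimited text messages. The class name `BufferedFileWriter`, the method `writeMessages`, and the 64 KB buffer size are hypothetical choices for illustration, not anything prescribed above:

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

class BufferedFileWriter {
    // Assumed buffer size; tune it for your own workload.
    static final int BUFFER_SIZE = 64 * 1024;

    /** Writes each message followed by a newline and returns the number of bytes written. */
    static long writeMessages(Path file, List<String> messages, boolean sync) throws IOException {
        long total = 0;
        try (FileOutputStream fos = new FileOutputStream(file.toFile());
             BufferedOutputStream out = new BufferedOutputStream(fos, BUFFER_SIZE)) {
            for (String message : messages) {
                byte[] bytes = (message + "\n").getBytes(StandardCharsets.UTF_8);
                out.write(bytes);          // accumulates in the Java-level buffer
                total += bytes.length;
            }
            out.flush();                   // hand any buffered bytes to the OS
            if (sync) {
                // Optional: block until the OS has written the file content to the device.
                fos.getChannel().force(false);
            }
        }
        return total;
    }
}
```

Passing `sync = false` skips the `force` call and leaves the final write to disk to the OS.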

Bottom line: I'm willing to bet that you're actually IO-bound, and there's no cure for that save better hardware.

parsifal
  • Are both FileOutputStream and FileChannel non-blocking? I guess so... So the only advantage of FileChannel is that it can take a ByteBuffer directly. And it has force(), which we don't care about. But as people have mentioned before, FileChannel is faster than FileOutputStream: http://stackoverflow.com/questions/1605332/java-nio-filechannel-versus-fileoutputstream-performance-usefulness – chrisapotek Sep 25 '12 at 18:45
  • I take all such posts with an extremely large grain of salt. *Stu Thompson* gave a lot of good pointers on how to tune an IO-heavy application, but his benchmark results may (will) have very little to do with your application. – parsifal Sep 25 '12 at 19:11
  • I think the point i forgot to emphasize is that i will *also need to read* at random points of the big file. So some kind of swapping will be needed. – chrisapotek Sep 25 '12 at 19:14
  • @chrisapotek - I think you will be far better off describing the actual needs of your application -- *all of them* -- and asking for implementation suggestions. As it is, you've just given little bits and pieces of your implementation, and asked how to add another solution that you've already decided upon. – parsifal Sep 25 '12 at 19:23
  • @chrisapotek *Neither* of them is non-blocking. The only non-blocking channels in Java are those that extend `SelectableChannel`. – user207421 Sep 28 '12 at 00:56
4

FileChannel only works with ByteBuffers, so it is naturally buffered. If you need additional buffering, you can copy data from ByteBuffer to ByteBuffer, but I am not sure why you would want to.

FileChannel does buffering at the OS level

FileChannel does tell the OS what to do. The OS usually has a disk cache but FileChannel has no idea whether this is the case or not.

I want to do buffering at the Java level

You are in luck, because you don't have a choice. ;) This is the only option.
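A minimal sketch of what Java-level batching in front of a FileChannel might look like: messages accumulate in one ByteBuffer and `FileChannel.write` is called only when the next message would overflow it. The class name `BatchingChannelWriter` and its methods are hypothetical, and the buffer capacity is left to the caller:

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class BatchingChannelWriter implements Closeable {
    private final FileChannel channel;
    private final ByteBuffer buffer;

    BatchingChannelWriter(Path file, int capacity) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        buffer = ByteBuffer.allocateDirect(capacity);
    }

    /** Buffers the message; writes to the channel only when the buffer would overflow. */
    void append(byte[] message) throws IOException {
        if (message.length > buffer.remaining()) {
            flush();                                // make room, preserving order
        }
        if (message.length > buffer.capacity()) {
            writeFully(ByteBuffer.wrap(message));   // too large to batch: write directly
        } else {
            buffer.put(message);
        }
    }

    /** Writes everything buffered so far and resets the buffer for reuse. */
    void flush() throws IOException {
        buffer.flip();
        writeFully(buffer);
        buffer.clear();
    }

    private void writeFully(ByteBuffer src) throws IOException {
        while (src.hasRemaining()) {
            channel.write(src);                     // may write fewer bytes than requested
        }
    }

    @Override
    public void close() throws IOException {
        try {
            flush();
        } finally {
            channel.close();
        }
    }
}
```

Anything still sitting in the buffer is lost if the process dies before `flush()` or `close()` runs, which is the usual trade-off of batching in user space.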

Peter Lawrey
  • I want to write a huge file to disk from a fast message stream. Buffering and batching are the way to go. So I want to batch in Java and then call FileChannel.write. – chrisapotek Sep 25 '12 at 16:33
  • If you use NIO, you don't have any other options so I am not sure what you are looking for. – Peter Lawrey Sep 25 '12 at 16:36
  • Let's say I have to write 1000 messages that came in a batch. I don't want to call FileChannel.write 1000 times. I want to buffer this in Java and then call FileChannel.write just once. I guess I want a batching FileChannel. – chrisapotek Sep 25 '12 at 16:37
  • That seems fine although you only need to write in batches of 16-64KB to get the best bandwidth to disk. Larger batches can harm performance. – Peter Lawrey Sep 25 '12 at 16:39
  • So even if I have a batch greater than 64kb I should call FileChannel.write in multiple chunks of 16k-64kb? – chrisapotek Sep 25 '12 at 16:41
  • That is what I would do. If you don't need to write to disk regularly you can use a single ByteBuffer of 64KB and when the arriving message would overflow this you can write it to disk and clear the buffer. – Peter Lawrey Sep 25 '12 at 16:42
  • BTW If you use memory mapped files, you can avoid calling write altogether, but this is only useful if you want every message to be visible immediately and don't want to risk losing a message if the program crashes. (And you have a 64-bit JVM) – Peter Lawrey Sep 25 '12 at 16:44
  • Tried that, Peter. It works great and fast but has a MAJOR drawback. If you ever have to expand the memory mapped byte buffer you hit a major bottleneck because you have to call force and re-map to a bigger file. And eventually you will run out of RAM space to map the file. I guess I would have to implement something similar to a virtual memory with page swapping... – chrisapotek Sep 25 '12 at 16:47
  • I have a library `Java Chronicle` which only adds more memory mapped ByteBuffers in blocks of 16 MB to 1 GB. It doesn't use force or remap the file, it just keeps adding to the end. If you have a 64-bit JVM you can write over 8 TB this way. – Peter Lawrey Sep 25 '12 at 16:52
  • I will take a look. You only lose data if the JVM crashes, right? Calling force kills performance, right? I like to add a shutdown hook with a force, but I don't even think it is necessary. The JVM probably has this shutdown hook already hidden in there. – chrisapotek Sep 25 '12 at 16:57
  • If you just use FileChannel, you only lose the data you didn't write. Force can kill performance so only use it if you need to, e.g. if you can't lose data when the OS crashes. You will need a shutdown hook to perform the write, but not the force. – Peter Lawrey Sep 25 '12 at 17:01
  • Last question: Is there a difference in using a single FileChannel instead of a bunch of MemoryMapped byte buffers? I am wondering if the FileChannel is smart enough to PAGE and SWAP the same way you are doing with a sequence of memory byte buffers. – chrisapotek Sep 27 '12 at 05:37
  • A FileChannel copies the data to/from the ByteBuffer provided. It doesn't do any paging or swapping. The OS's disk cache does page in and out as normal however. When you use memory mapped buffers you are using the OS's disk cache directly without any additional reading/writing. – Peter Lawrey Sep 27 '12 at 07:01
  • This is not yet very clear to me, Peter so I opened up a new question about it: http://stackoverflow.com/questions/12634163/is-there-a-performance-advantage-in-writing-a-long-file-sequentially-using-mappe – chrisapotek Sep 28 '12 at 05:37
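The idea raised in the comments above, appending through a sequence of fixed-size memory-mapped regions rather than remapping one growing region, can be sketched roughly as follows. This is not the Java Chronicle implementation; the class name `ChunkedMappedAppender` and the caller-supplied chunk size are assumptions made for illustration:

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChunkedMappedAppender implements Closeable {
    private final FileChannel channel;
    private final int chunkSize;
    private MappedByteBuffer current;   // the chunk currently being filled
    private long nextChunkStart = 0;    // file offset at which the next chunk will be mapped

    ChunkedMappedAppender(Path file, int chunkSize) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        this.chunkSize = chunkSize;
    }

    /** Copies the message into mapped memory, mapping a fresh chunk whenever one fills up. */
    void append(byte[] message) throws IOException {
        int offset = 0;
        while (offset < message.length) {
            if (current == null || !current.hasRemaining()) {
                // Mapping past the current end of the file extends the file.
                current = channel.map(FileChannel.MapMode.READ_WRITE, nextChunkStart, chunkSize);
                nextChunkStart += chunkSize;
            }
            int n = Math.min(current.remaining(), message.length - offset);
            current.put(message, offset, n);
            offset += n;
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();   // existing mappings stay valid until they are garbage collected
    }
}
```

Note that the file length ends up as a multiple of the chunk size, so a reader needs some other way (a length header or an end marker, for example) to know where the real data stops.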
-2

I would have two threads: the producer thread produces ByteBuffers and appends them to the tail of a queue; the consumer thread removes some ByteBuffers from the head of the queue each time and calls fileChannel.write(ByteBuffer[]).
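A rough sketch of the consumer side, assuming a `BlockingQueue` shared with the producer and a sentinel buffer to signal shutdown. The class name `QueueDrainingWriter` and the `POISON` sentinel are hypothetical; producers are assumed to enqueue buffers that are ready for reading (already flipped):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

class QueueDrainingWriter implements Runnable {
    // Hypothetical sentinel: enqueue this exact instance to stop the consumer.
    static final ByteBuffer POISON = ByteBuffer.allocate(0);

    private final BlockingQueue<ByteBuffer> queue;
    private final FileChannel channel;

    QueueDrainingWriter(BlockingQueue<ByteBuffer> queue, FileChannel channel) {
        this.queue = queue;
        this.channel = channel;
    }

    @Override
    public void run() {
        List<ByteBuffer> batch = new ArrayList<>();
        try {
            boolean running = true;
            while (running) {
                batch.clear();
                batch.add(queue.take());    // block until at least one buffer arrives
                queue.drainTo(batch);       // then grab whatever else is already queued
                // Compare by identity: ByteBuffer.equals compares contents, not instances.
                running = !batch.removeIf(b -> b == POISON);

                ByteBuffer[] srcs = batch.toArray(new ByteBuffer[0]);
                long remaining = 0;
                for (ByteBuffer b : srcs) {
                    remaining += b.remaining();
                }
                while (remaining > 0) {
                    remaining -= channel.write(srcs);   // one gathering write per batch, repeated if partial
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The batch size adapts to load: when the producer is slow each write carries one buffer, and when it is fast each write carries everything that queued up in the meantime.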

irreputable