
My application uses O_DIRECT to flush 2 MB of data at a time directly to 3-way striped storage (mounted as an LVM volume).

I am getting very poor write speed on this storage. iostat shows that the large requests are being broken up into smaller ones.

avgrq-sz is < 20, and there is hardly any read activity on that drive.

It takes around 2 seconds to flush 2 MB of contiguous memory (locked with mlock to keep it resident), sector-aligned using posix_memalign, whereas tests with dd and iozone rate the storage as capable of > 20 Mbps write speed.
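For reference, the write path looks roughly like the following minimal sketch (the file path, the single write() call, and the error handling are simplified placeholders rather than the exact code):

```c
/* Minimal sketch of the write path described above: a 2 MB buffer,
 * sector-aligned with posix_memalign, locked with mlock, written with
 * a single write() on an O_DIRECT file descriptor.
 * The path and the single-write assumption are placeholders. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_SIZE (2 * 1024 * 1024)  /* 2 MB payload */
#define ALIGN    4096               /* sector/page alignment for O_DIRECT */

int main(void)
{
    void *buf;
    if (posix_memalign(&buf, ALIGN, BUF_SIZE) != 0) {
        perror("posix_memalign");
        return 1;
    }
    if (mlock(buf, BUF_SIZE) != 0) {   /* keep the pages resident */
        perror("mlock");
        return 1;
    }
    memset(buf, 0xab, BUF_SIZE);

    /* placeholder path on the ext4 filesystem backed by the striped LV */
    int fd = open("/mnt/stripe/testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    ssize_t n = write(fd, buf, BUF_SIZE);  /* one 2 MB request */
    if (n != BUF_SIZE)
        fprintf(stderr, "short/failed write: %zd\n", n);

    close(fd);
    return 0;
}
```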

I would appreciate any clues on how to investigate this issue further.

PS: If this is not the right forum for this query, I would appreciate pointers to one that would be more suitable.

Thanks.

Hemanshu
  • superuser.stackexchange.com; welcome to SO; I reformatted the question. It helps to formulate and present the question well – sehe Apr 22 '11 at 21:44
  • What is the filesystem driver? (is it FUSE? what version?) 20 Mbps on a 3-way stripe... Is it writing in pencil? What kind of media are we talking about - that doesn't seem like SATA HDDs – sehe Apr 22 '11 at 21:49
  • I disagree with the folks who are voting to close this question. The guy's got some C code and he wants to improve the write performance. Superuser? Please. Power users don't call `posix_memalign` or use `O_DIRECT`, this is very clearly a programming question. – asveikau Apr 23 '11 at 01:34
  • @sehe It's one of the latest Ubuntu Server distros with ext4, and the HDD is indeed SATA. I will post the exact stats on Monday. @asveikau yes, it's C code which needs to do lots of sequential writes to the drive. If any particular stats would be helpful in debugging this, please let me know and I'll post them here – Hemanshu Apr 23 '11 at 16:27

1 Answer


Write I/O breakups on Linux?

The disk itself may have a maximum request size, there is a tradeoff between request size and latency (the bigger the request sent to the disk, the longer it will likely take to be consumed), and there can be constraints on how much vectored I/O a driver can consume in a single request. Given all the above, the kernel is going to "break up" single requests that are too large when submitting further down the stack.
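One way to see the limits involved is to read the queue attributes the kernel exports in sysfs and compare them against your 2 MB request. A small sketch (the device name "sda" is just a placeholder; check each disk backing the LVM stripe):

```c
/* Sketch: print the queue limits the kernel advertises for a block
 * device. Requests larger than max_sectors_kb will be split before
 * they reach the device. "sda" is a placeholder device name. */
#include <stdio.h>

static void print_limit(const char *dev, const char *attr)
{
    char path[256];
    snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);

    FILE *f = fopen(path, "r");
    if (!f) {
        perror(path);
        return;
    }
    long val;
    if (fscanf(f, "%ld", &val) == 1)
        printf("%-20s %ld\n", attr, val);
    fclose(f);
}

int main(void)
{
    const char *dev = "sda";                /* placeholder: a disk backing the LV */
    print_limit(dev, "max_sectors_kb");     /* current per-request limit (KB) */
    print_limit(dev, "max_hw_sectors_kb");  /* hardware limit (KB) */
    print_limit(dev, "max_segments");       /* scatter/gather segment limit */
    return 0;
}
```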

I would appreciate any clues on how to investigate this issue further.

Unfortunately it's hard to say why the avgrq-sz is so small (if it's in sectors, that's about 10 KB per I/O) without seeing the code actually submitting the I/O (maybe your program is submitting 10 KB buffers?). We also don't know whether iozone and dd were using O_DIRECT during the questioner's tests. If they weren't, their I/O would have been going into the writeback cache and streamed out later, which the kernel can do in a more optimal fashion.

Note: Using O_DIRECT is NOT a go-faster stripe. In the right circumstances O_DIRECT can lower overhead, BUT writing O_DIRECTly to the disk increases the pressure on you to submit I/O in parallel (e.g. via AIO/io_uring or via multiple processes/threads) if you want to reach the highest possible throughput, because you have robbed the kernel of its best way of creating parallel submission to the device for you.
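As an illustration only, a sketch of keeping several O_DIRECT writes in flight with Linux AIO (libaio) might look like the following; the queue depth, chunk size, and file path are arbitrary assumptions, not a tuned recommendation:

```c
/* Sketch only: keep several O_DIRECT writes in flight using Linux AIO
 * (libaio).  Build with: gcc sketch.c -laio
 * Queue depth, chunk size, and the file path are illustrative. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define QUEUE_DEPTH 4
#define CHUNK       (512 * 1024)   /* 4 x 512 KB in flight = 2 MB total */

int main(void)
{
    int fd = open("/mnt/stripe/testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    io_context_t ctx = 0;
    if (io_setup(QUEUE_DEPTH, &ctx) < 0) {    /* returns a negative errno */
        fprintf(stderr, "io_setup failed\n");
        return 1;
    }

    struct iocb cbs[QUEUE_DEPTH], *cbp[QUEUE_DEPTH];
    void *bufs[QUEUE_DEPTH];

    for (int i = 0; i < QUEUE_DEPTH; i++) {
        if (posix_memalign(&bufs[i], 4096, CHUNK) != 0) {
            perror("posix_memalign");
            return 1;
        }
        memset(bufs[i], 0xab, CHUNK);
        /* each request targets its own offset so all can proceed in parallel */
        io_prep_pwrite(&cbs[i], fd, bufs[i], CHUNK, (long long)i * CHUNK);
        cbp[i] = &cbs[i];
    }

    if (io_submit(ctx, QUEUE_DEPTH, cbp) != QUEUE_DEPTH) {
        fprintf(stderr, "io_submit failed\n");
        return 1;
    }

    struct io_event events[QUEUE_DEPTH];
    int done = io_getevents(ctx, QUEUE_DEPTH, QUEUE_DEPTH, events, NULL);
    printf("%d writes completed\n", done);

    io_destroy(ctx);
    close(fd);
    return 0;
}
```

The same idea applies with plain pwrite() from multiple threads, or with io_uring on newer kernels; the point is simply to have more than one request outstanding so the stripe members can all be kept busy.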

Anon