
Background:

I'm developing a database-related program, and I need to flush dirty metadata from memory to disk sequentially. /dev/sda1 is a raw volume with no file system on it, so data on /dev/sda1 is accessed block by block, and the blocks are physically adjacent when accessed sequentially. And I use direct I/O, so the I/O bypasses the file system's caching and accesses the blocks on the disk directly.

Problems:

After opening /dev/sda1, I read one block, update it, and write it back to the same offset from the beginning of /dev/sda1, iteratively.

The code is like below:

// block_size = 256 KB; the buffer must be suitably aligned for O_DIRECT
int file = open("/dev/sda1", O_RDWR|O_LARGEFILE|O_DIRECT);
for (int i = 0; i < N; i++) {
    // cast to off_t so the offset doesn't overflow int past 2 GB
    pread(file, buffer, block_size, (off_t)i * block_size);
    // Update the buffer
    pwrite(file, buffer, block_size, (off_t)i * block_size);
}
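For reference, here is a minimal, self-contained sketch of the same loop with the alignment and error handling that O_DIRECT typically requires (buffer, offset, and length aligned to the device's logical block size, per open(2)). N, the 4096-byte alignment, and the error handling are illustrative assumptions, not part of the original program:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    const size_t block_size = 256 * 1024;   /* 256 KB, as in the question */
    const int N = 1024;                     /* number of blocks; assumed */
    void *buffer;

    /* O_DIRECT needs an aligned buffer; 4096 covers common sector sizes */
    if (posix_memalign(&buffer, 4096, block_size) != 0)
        return 1;

    int file = open("/dev/sda1", O_RDWR | O_LARGEFILE | O_DIRECT);
    if (file < 0) { perror("open"); return 1; }

    for (int i = 0; i < N; i++) {
        off_t off = (off_t)i * block_size;  /* off_t avoids 32-bit overflow */
        if (pread(file, buffer, block_size, off) != (ssize_t)block_size) {
            perror("pread");
            break;
        }
        /* ... update the buffer ... */
        if (pwrite(file, buffer, block_size, off) != (ssize_t)block_size) {
            perror("pwrite");
            break;
        }
    }
    close(file);
    free(buffer);
    return 0;
}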

I found that if I don't do pwrite, read throughput is 125 MB/s.

If I do pwrite, read throughput will be 21 MB/s, and write throughput is 169 MB/s.

If I do pread after pwrite, write throughput is 115 MB/s, and read throughput is 208 MB/s.

I also tried read()/write() and aio_read()/aio_write(), but the problem remains. I don't know why a write after a read at the same position of a file makes the read throughput so low.

If I access more blocks at a time, like this:

pread(file, buffer, num_blocks * block_size, (off_t)i * block_size);

The problem is mitigated; please see the chart. A sketch of the full batched loop follows.
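For concreteness, the batched loop could look like this, reusing the setup from the sketch above; num_blocks = 16 is an illustrative value, and big_buffer is assumed to be an aligned buffer of num_blocks * block_size bytes:

/* Each pread/pwrite now covers num_blocks consecutive blocks, so the
 * drive sees longer sequential runs between the seeks back for writing. */
const int num_blocks = 16;                   /* illustrative; tune by testing */
const size_t len = (size_t)num_blocks * block_size;
for (int i = 0; i + num_blocks <= N; i += num_blocks) {
    off_t off = (off_t)i * block_size;
    pread(file, big_buffer, len, off);
    /* update all num_blocks blocks in big_buffer */
    pwrite(file, big_buffer, len, off);
}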

Chia
    What's your block size? There's a good chance you're seeing the effects of hardware caching and read-ahead on the disk(s) you're accessing. The `pwrite()` fills the cache, and if the next `pread()` is for different data, none of it is cached. Doing the `pread()` after the `pwrite()` allows data to be read directly from the disk's hardware cache. – Andrew Henle Sep 23 '15 at 09:49
  • I don't know the physical block size, and I set it to 256 KB in the program. Thanks for your comment; now I think it's very likely caused by the disk's buffer. – Chia Sep 23 '15 at 15:46

1 Answer


And I use direct I/O, so the I/O bypasses the file system's caching and accesses the blocks on the disk directly.

If there is no file system on the device and you use the device directly for reads and writes, then no file system cache comes into the picture.

The behavior you observed is typical of disk I/O.

I found that if I don't do pwrite, read throughput is 125 MB/s

Reason: The disk just reads data; it doesn't have to seek back to the offset and write data, so there is one less operation per block.

If I do pwrite, read throughput will be 21 MB/s, and write throughput is 169 MB/s.

Reason: Your disk probably has better write speed because the disk's buffer caches writes rather than hitting the media directly. The interleaved writes also force the head to seek back to each block just read, which defeats sequential read-ahead and explains the drop in read throughput.

If I do pread after pwrite, write throughput is 115 MB/s, and read throughput is 208 MB/s.

Reason: Most likely the data written is being cached at the disk level, so the read gets the data from the cache instead of the media.

To get optimal performance, you should use asynchronous I/O and access a number of blocks at a time. However, you have to use a reasonable number of blocks, not a very large one; find out what is optimal by trial and error.
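For example, one way to overlap the write-back of block i with the read of block i+1 is POSIX AIO (aio.h; link with -lrt on glibc). This is only a sketch under the question's setup, not a tuned implementation: file, block_size, N, and the aligned buffers read_buf and write_buf are assumed to exist as in the question's code.

#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

struct aiocb wr;
memset(&wr, 0, sizeof(wr));
wr.aio_fildes = file;
wr.aio_sigevent.sigev_notify = SIGEV_NONE;   /* poll with aio_error() */

for (int i = 0; i < N; i++) {
    /* read block i while the write of block i-1 may still be in flight */
    pread(file, read_buf, block_size, (off_t)i * block_size);

    if (i > 0) {                             /* wait for the previous write */
        const struct aiocb *list[1] = { &wr };
        while (aio_error(&wr) == EINPROGRESS)
            aio_suspend(list, 1, NULL);
        aio_return(&wr);
    }

    /* update read_buf, then queue the write from a separate buffer */
    memcpy(write_buf, read_buf, block_size);
    wr.aio_buf    = write_buf;
    wr.aio_nbytes = block_size;
    wr.aio_offset = (off_t)i * block_size;
    aio_write(&wr);
}
/* wait for the final write the same way before closing the file */

Whether this helps depends on how deep the drive's own write cache is; the batch size and queue depth are best found by measurement, as noted above.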

Rohan
  • Thanks for your answer; now I think it's very likely caused by the disk's buffer. But I still can't imagine that just seeking back to the previous position would make read throughput drop from 125 MB/s to 21 MB/s... – Chia Sep 23 '15 at 15:47
  • @leo, yes, seeks are expensive. Look at the I/O wait times, which increase when throughput decreases. – Rohan Sep 23 '15 at 16:31